mirror of
https://github.com/Hopiu/linkchecker.git
synced 2026-03-23 01:10:27 +00:00
99 lines
5.3 KiB
ReStructuredText
:github_url: https://github.com/linkchecker/linkchecker/blob/master/doc/src/code/install.rst

Code
====

LinkChecker comprises the linkchecker executable and linkcheck package.

.. autosummary::
   :recursive:
   :toctree: linkcheck

   linkcheck

.. rubric:: Running

linkchecker provides the command-line arguments and reads a list of URLs from
standard input, reads configuration files, drops privileges if run as root,
initialises the chosen logger and collects an optional password.

Uses :meth:`linkcheck.director.get_aggregate` to obtain an *aggregate* object
:class:`linkcheck.director.aggregator.Aggregate`
that includes :class:`linkcheck.cache.urlqueue.UrlQueue`,
:class:`linkcheck.plugins.PluginManager` and
:class:`linkcheck.cache.results.ResultCache` objects.

Adds URLs in the form of *url_data* objects to the aggregate's *urlqueue* with
:meth:`linkcheck.cmdline.aggregate_url` which uses
:meth:`linkcheck.checker.get_url_from` to return a *url_data* object that is an instance
of one of the :mod:`linkcheck.checker` classes derived from :class:`linkcheck.checker.urlbase.UrlBase`,
according to the URL scheme.

.. graphviz::
   :alt: linkcheck.checker classes

   digraph "linkcheck.checker classes" {
     charset="utf-8"
     rankdir=BT
     "1" [label="DnsUrl", shape="record", href="../code/linkcheck/linkcheck.checker.dnsurl.html", target="_blank"];
     "2" [label="FileUrl", shape="record", href="../code/linkcheck/linkcheck.checker.fileurl.html", target="_blank"];
     "3" [label="FtpUrl", shape="record", href="../code/linkcheck/linkcheck.checker.ftpurl.html", target="_blank"];
     "4" [label="HttpUrl", shape="record", href="../code/linkcheck/linkcheck.checker.httpurl.html", target="_blank"];
     "5" [label="IgnoreUrl", shape="record", href="../code/linkcheck/linkcheck.checker.ignoreurl.html", target="_blank"];
     "6" [label="InternPatternUrl", shape="record", href="../code/linkcheck/linkcheck.checker.internpaturl.html", target="_blank"];
     "7" [label="ItmsServicesUrl", shape="record", href="../code/linkcheck/linkcheck.checker.itmsservicesurl.html", target="_blank"];
     "8" [label="MailtoUrl", shape="record", href="../code/linkcheck/linkcheck.checker.mailtourl.html", target="_blank"];
     "9" [label="NntpUrl", shape="record", href="../code/linkcheck/linkcheck.checker.nntpurl.html", target="_blank"];
     "10" [label="ProxySupport", shape="record", href="../code/linkcheck/linkcheck.checker.proxysupport.html", target="_blank"];
     "11" [label="TelnetUrl", shape="record", href="../code/linkcheck/linkcheck.checker.telneturl.html", target="_blank"];
     "12" [label="UnknownUrl", shape="record", href="../code/linkcheck/linkcheck.checker.unknownurl.html", target="_blank"];
     "13" [label="UrlBase", shape="record", href="../code/linkcheck/linkcheck.checker.urlbase.html", target="_blank"];
     "1" -> "13" [arrowhead="empty", arrowtail="none"];
     "2" -> "13" [arrowhead="empty", arrowtail="none"];
     "3" -> "6" [arrowhead="empty", arrowtail="none"];
     "3" -> "10" [arrowhead="empty", arrowtail="none"];
     "4" -> "6" [arrowhead="empty", arrowtail="none"];
     "4" -> "10" [arrowhead="empty", arrowtail="none"];
     "5" -> "12" [arrowhead="empty", arrowtail="none"];
     "6" -> "13" [arrowhead="empty", arrowtail="none"];
     "7" -> "13" [arrowhead="empty", arrowtail="none"];
     "8" -> "13" [arrowhead="empty", arrowtail="none"];
     "9" -> "13" [arrowhead="empty", arrowtail="none"];
     "11" -> "13" [arrowhead="empty", arrowtail="none"];
     "12" -> "13" [arrowhead="empty", arrowtail="none"];
   }
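
The idea of choosing a checker class by URL scheme can be pictured with a small standalone sketch. It does not import linkcheck: the class names echo a few of those in the diagram, while the ``SCHEME_TO_CLASS`` table and the ``pick_class`` and ``make_url_data`` helpers are hypothetical and are not a description of how :meth:`linkcheck.checker.get_url_from` is implemented.

```python
from urllib.parse import urlsplit


class UrlBase:
    """Stand-in for a common base class holding the URL to check."""

    def __init__(self, url):
        self.url = url


class HttpUrl(UrlBase):
    pass


class FileUrl(UrlBase):
    pass


class MailtoUrl(UrlBase):
    pass


class UnknownUrl(UrlBase):
    pass


# Hypothetical lookup table, invented for this sketch only.
SCHEME_TO_CLASS = {
    "http": HttpUrl,
    "https": HttpUrl,
    "file": FileUrl,
    "mailto": MailtoUrl,
}


def pick_class(url):
    """Return the class registered for the URL's scheme, else UnknownUrl."""
    scheme = urlsplit(url).scheme.lower()
    return SCHEME_TO_CLASS.get(scheme, UnknownUrl)


def make_url_data(url):
    """Build a url_data-like object of the class chosen by scheme."""
    return pick_class(url)(url)
```

Because every class derives from the same base, code that consumes the queue can treat all *url_data* objects uniformly, whichever scheme produced them.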

Optionally initialises profiling.

Starts the checking with :meth:`linkcheck.director.check_urls`, passing the *aggregate*.

Finally, it counts any errors and exits with the appropriate code.

.. rubric:: Checking & Parsing

That is:

- Checking that a link is valid
- Parsing the document the link points to for new links

:meth:`linkcheck.director.check_urls` authenticates with a login form, if one is configured,
via :meth:`linkcheck.director.aggregator.Aggregate.visit_loginurl`, starts logging
with :meth:`linkcheck.director.aggregator.Aggregate.logger.start_log_output`
and calls :meth:`linkcheck.director.aggregator.Aggregate.start_threads`.
If at least one thread is configured, this instantiates a
:class:`linkcheck.director.checker.Checker` object with the *urlqueue*;
otherwise it calls :meth:`linkcheck.director.checker.check_urls`,
which loops through the entries in the *urlqueue*.
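
The choice between worker threads and a plain loop over the queue can be sketched with the standard library alone. This is a minimal illustration, not linkcheck's code: it uses ``queue.Queue`` and ``threading.Thread`` in place of ``UrlQueue`` and ``Checker``, the names ``check_one``, ``drain`` and ``run_checks`` are hypothetical, and starting one worker per configured thread is an assumption.

```python
import queue
import threading


def check_one(url, results, lock):
    """Placeholder check: record the URL as visited."""
    with lock:
        results.append(url)


def drain(url_queue, results, lock):
    """Check entries until the queue is empty."""
    while True:
        try:
            url = url_queue.get_nowait()
        except queue.Empty:
            return
        check_one(url, results, lock)
        url_queue.task_done()


def run_checks(urls, num_threads):
    """Check all urls, using worker threads only if num_threads >= 1."""
    url_queue = queue.Queue()
    for url in urls:
        url_queue.put(url)
    results, lock = [], threading.Lock()
    if num_threads >= 1:
        # Assumption: one worker per configured thread, all sharing one queue.
        threads = [
            threading.Thread(target=drain, args=(url_queue, results, lock))
            for _ in range(num_threads)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    else:
        # No threads configured: loop through the queue in the caller.
        drain(url_queue, results, lock)
    return results
```

Both branches run the same per-entry routine, so the checking logic does not need to know whether it runs in a worker thread. The sketch stops as soon as the queue is empty; a real crawler also has to cope with entries that are added while checks are still in progress.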

Either way, :meth:`linkcheck.director.checker.check_url` tests whether *url_data* already has a result and
whether the cache already has a result for that key.
If not, it calls *url_data.check()*, which calls *url_data.check_content()*.
That method runs the content plugins and returns *do_parse*
according to *url_data.do_check_content* and :meth:`linkcheck.checker.urlbase.UrlBase.allows_recursion`,
which includes :meth:`linkcheck.checker.urlbase.UrlBase.allows_simple_recursion`, a check on the recursion level
(:attr:`linkcheck.checker.urlbase.UrlBase.recursion_level`).
If *do_parse* is True, the *url_data* object is passed to :meth:`linkcheck.parser.parse_url`, which calls a
``linkcheck.parser.parse_`` method according to the document type,
e.g. :meth:`linkcheck.parser.parse_html` for HTML, which in turn calls :meth:`linkcheck.htmlutil.linkparse.find_links`,
passing *url_data.get_soup()* and *url_data.add_url*.
*url_data.add_url* puts the new *url_data* object on the *urlqueue*.
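
The interplay of result cache, recursion limit and parsing can be illustrated with a toy crawler. This is a sketch under stated assumptions rather than linkcheck's behaviour: ``PAGES`` is a made-up in-memory site, ``crawl`` and ``max_depth`` are hypothetical names, and the exact semantics of linkcheck's recursion level are not modelled.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links its document contains.
PAGES = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": [],
    "d": ["e"],
    "e": [],
}


def crawl(start, max_depth):
    """Check URLs breadth-first, skipping cached ones and limiting recursion."""
    url_queue = deque([(start, 0)])
    result_cache = {}
    while url_queue:
        url, level = url_queue.popleft()
        if url in result_cache:
            continue  # a result for this key already exists
        # "Check" the link: here it is valid if the page is known.
        result_cache[url] = url in PAGES
        # Parse only valid documents that are still within the depth limit.
        do_parse = result_cache[url] and level < max_depth
        if do_parse:
            for link in PAGES[url]:
                # Plays the role of add_url: queue each newly found link.
                url_queue.append((link, level + 1))
    return result_cache
```

With ``max_depth`` set to 0 only the start URL is checked; raising it lets links found in parsed documents flow back onto the queue, while the cache ensures that a URL reached by several paths is checked once.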