linkchecker/doc/src/code/index.rst
Chris Mayo d1834f0619 Generate html documentation with Sphinx
Create code documentation:
make -C doc code

Create html:
make -C doc html
2020-08-15 17:02:38 +01:00

99 lines
5.3 KiB
ReStructuredText

:github_url: https://github.com/linkchecker/linkchecker/blob/master/doc/src/code/install.rst
Code
====
LinkChecker comprises the linkchecker executable and linkcheck package.
.. autosummary::
:recursive:
:toctree: linkcheck
linkcheck
.. rubric:: Running
linkchecker provides the command-line arguments and reads a list of URLs from
standard input, reads configuration files, drops privileges if run as root,
initialises the chosen logger and collects an optional password.
Uses :meth:`linkcheck.director.get_aggregate` to obtain an *aggregate* object
:class:`linkcheck.director.aggregator.Aggregate`
that includes :class:`linkcheck.cache.urlqueue.UrlQueue`,
:class:`linkcheck.plugins.PluginManager` and
:class:`linkcheck.cache.results.ResultCache` objects.
Adds URLs in the form of *url_data* objects to the aggregate's *urlqueue* with
:meth:`linkcheck.cmdline.aggregate_url` which uses
:meth:`linkcheck.checker.get_url_from` to return a *url_data* object that is an instance
of one of the :mod:`linkcheck.checker` classes derived from :class:`linkcheck.checker.urlbase.UrlBase`,
according to the URL scheme.
.. graphviz::
:alt: linkcheck.checker classes
digraph "linkcheck.checker classes" {
charset="utf-8"
rankdir=BT
"1" [label="DnsUrl", shape="record", href="../code/linkcheck/linkcheck.checker.dnsurl.html", target="_blank"];
"2" [label="FileUrl", shape="record", href="../code/linkcheck/linkcheck.checker.fileurl.html", target="_blank"];
"3" [label="FtpUrl", shape="record", href="../code/linkcheck/linkcheck.checker.ftpurl.html", target="_blank"];
"4" [label="HttpUrl", shape="record", href="../code/linkcheck/linkcheck.checker.httpurl.html", target="_blank"];
"5" [label="IgnoreUrl", shape="record", href="../code/linkcheck/linkcheck.checker.ignoreurl.html", target="_blank"];
"6" [label="InternPatternUrl", shape="record", href="../code/linkcheck/linkcheck.checker.internpaturl.html", target="_blank"];
"7" [label="ItmsServicesUrl", shape="record", href="../code/linkcheck/linkcheck.checker.itmsservicesurl.html", target="_blank"];
"8" [label="MailtoUrl", shape="record", href="../code/linkcheck/linkcheck.checker.mailtourl.html", target="_blank"];
"9" [label="NntpUrl", shape="record", href="../code/linkcheck/linkcheck.checker.nntpurl.html", target="_blank"];
"10" [label="ProxySupport", shape="record", href="../code/linkcheck/linkcheck.checker.proxysupport.html", target="_blank"];
"11" [label="TelnetUrl", shape="record", href="../code/linkcheck/linkcheck.checker.telneturl.html", target="_blank"];
"12" [label="UnknownUrl", shape="record", href="../code/linkcheck/linkcheck.checker.unknownurl.html", target="_blank"];
"13" [label="UrlBase", shape="record", href="../code/linkcheck/linkcheck.checker.urlbase.html", target="_blank"];
"1" -> "13" [arrowhead="empty", arrowtail="none"];
"2" -> "13" [arrowhead="empty", arrowtail="none"];
"3" -> "6" [arrowhead="empty", arrowtail="none"];
"3" -> "10" [arrowhead="empty", arrowtail="none"];
"4" -> "6" [arrowhead="empty", arrowtail="none"];
"4" -> "10" [arrowhead="empty", arrowtail="none"];
"5" -> "12" [arrowhead="empty", arrowtail="none"];
"6" -> "13" [arrowhead="empty", arrowtail="none"];
"7" -> "13" [arrowhead="empty", arrowtail="none"];
"8" -> "13" [arrowhead="empty", arrowtail="none"];
"9" -> "13" [arrowhead="empty", arrowtail="none"];
"11" -> "13" [arrowhead="empty", arrowtail="none"];
"12" -> "13" [arrowhead="empty", arrowtail="none"];
}
Optionally initialises profiling.
Starts the checking with :meth:`linkcheck.director.check_urls`, passing the *aggregate*.
Finally it counts any errors and exits with the appropriate code.
.. rubric:: Checking & Parsing
That is:
- Checking a link is valid
- Parsing the document the link points to for new links
:meth:`linkcheck.director.check_urls` authenticates with a login form if one is configured
via :meth:`linkcheck.director.aggregator.Aggregate.visit_loginurl`, starts logging
with :meth:`linkcheck.director.aggregator.Aggregate.logger.start_log_output`
and calls :meth:`linkcheck.director.aggregator.Aggregate.start_threads` which instantiates a
:class:`linkcheck.director.checker.Checker` object with the urlqueue if there is at
least one thread configured, else it calls
:meth:`linkcheck.director.checker.check_urls` which loops through the entries in the *urlqueue*.
Either way :meth:`linkcheck.director.checker.check_url` tests to see if *url_data* already has a result and
whether the cache already has a result for that key.
If not it calls *url_data.check()*,
which calls *url_data.check_content()* that runs content plugins and returns *do_parse*
according to *url_data.do_check_content* and :meth:`linkcheck.checker.urlbase.UrlBase.allows_recursion` which
includes :meth:`linkcheck.checker.urlbase.UrlBase.allows_simple_recursion` that is monitoring the recursion level
(with :attr:`linkcheck.checker.urlbase.UrlBase.recursion_level`).
If *do_parse* is True, passes the *url_data* object to :meth:`linkcheck.parser.parse_url` to call a
`linkcheck.parser.parse_` method according to the document type
e.g. :meth:`linkcheck.parser.parse_html` for HTML which calls :meth:`linkcheck.htmlutil.linkparse.find_links`
passing *url_data.get_soup()* and *url_data.add_url*.
`url_data.add_url` puts the new *url_data* object on the *urlqueue*.