Update man pages to optimise for both html and man

- Use "LinkChecker User Manual" as the source for both pages.
- .UR/.UE for external links to allow mandoc to create links in html.
- Use Linux man-pages format for cross references e.g.
  .BR linkcheckerrc (5) which are replaced in the html by the Makefile.
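
The Makefile rule itself is not part of this view, so the following is only a sketch of how such a cross-reference replacement could work. It assumes mandoc renders `.BR linkcheckerrc (5)` as `<b>linkcheckerrc</b>(5)`; the function name `link_cross_references` is hypothetical, and the link layout is copied from the generated HTML in this commit.

```python
import re

# Assumption: a man-style cross reference appears in the HTML as
# "<b>name</b>(N)", e.g. "<b>linkcheckerrc</b>(5)".
XREF = re.compile(r"<b>([A-Za-z0-9_.-]+)</b>\((\d)\)")


def link_cross_references(html: str) -> str:
    """Turn bold man-page cross references into relative links."""

    def to_link(match: re.Match) -> str:
        name, section = match.group(1), match.group(2)
        # Link layout as seen in the generated pages:
        # ../man5/linkcheckerrc.5.html with class "Xr".
        return (
            f'<a href="../man{section}/{name}.{section}.html" '
            f'class="Xr">{name}({section})</a>'
        )

    return XREF.sub(to_link, html)


if __name__ == "__main__":
    print(link_cross_references("See <b>linkcheckerrc</b>(5) for more info."))
```

Bold text that is not directly followed by a section number in parentheses, such as `<b>-r</b>`, is left untouched.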
Chris Mayo 2020-04-24 19:46:30 +01:00
parent 441cda5e15
commit a205a3722b
4 changed files with 355 additions and 268 deletions

View file

@ -1,12 +1,12 @@
.TH LINKCHECKER 1 2010-07-01 "LinkChecker" "LinkChecker commandline usage"
.TH LINKCHECKER 1 2020-04-24 "LinkChecker" "LinkChecker User Manual"
.SH NAME
linkchecker - command line client to check HTML documents and websites for broken links
.
linkchecker \- command line client to check HTML documents and websites for broken links
.SH SYNOPSIS
\fBlinkchecker\fP [\fIoptions\fP] [\fIfile-or-url\fP]...
.
.B linkchecker
.RI [ options ]
.RI [ file-or-url ]...
.SH DESCRIPTION
.LP
.TP 2
LinkChecker features
.IP \(bu
recursive and multithreaded checking,
@ -33,30 +33,30 @@ Antivirus check
.IP \(bu
a command line and web interface
.SH EXAMPLES
.TP 2
The most common use checks the given domain recursively:
\fBlinkchecker http://www.example.com/\fP
.B linkchecker http://www.example.com/
.br
Beware that this checks the whole site which can have thousands of URLs.
Use the \fB\-r\fP option to restrict the recursion depth.
.br
.TP
Don't check URLs with \fB/secret\fP in their name. All other links are checked as usual:
\fBlinkchecker \-\-ignore\-url=/secret mysite.example.com\fP
.br
.B linkchecker \-\-ignore\-url=/secret mysite.example.com
.TP
Checking a local HTML file on Unix:
\fBlinkchecker ../bla.html\fP
.br
.B linkchecker ../bla.html
.TP
Checking a local HTML file on Windows:
\fBlinkchecker c:\\temp\\test.html\fP
.br
.B linkchecker c:\etemp\etest.html
.TP
You can skip the \fBhttp://\fP url part if the domain starts with \fBwww.\fP:
\fBlinkchecker www.example.com\fP
.br
.B linkchecker www.example.com
.TP
You can skip the \fBftp://\fP url part if the domain starts with \fBftp.\fP:
\fBlinkchecker \-r0 ftp.example.com\fP
.br
.B linkchecker \-r0 ftp.example.com
.TP
Generate a sitemap graph and convert it with the graphviz dot utility:
\fBlinkchecker \-odot \-v www.example.com | dot \-Tps > sitemap.ps\fP
.
.B linkchecker \-odot \-v www.example.com | dot \-Tps > sitemap.ps
.SH OPTIONS
.SS General options
.TP
@ -99,7 +99,8 @@ Output to a file \fBlinkchecker\-out.\fP\fITYPE\fP,
The \fIENCODING\fP specifies the output encoding, the default is
that of your locale.
Valid encodings are listed at
\fBhttp://docs.python.org/library/\:codecs.html#standard-encodings\fP.
.UR http://docs.python.org/library/codecs.html#standard-encodings
.UE .
.br
The \fIFILENAME\fP and \fIENCODING\fP parts of the \fBnone\fP output type
will be ignored, else if the file already exists, it will be overwritten.
@ -126,7 +127,8 @@ below.
.br
The \fIENCODING\fP specifies the output encoding, the default is
that of your locale. Valid encodings are listed at
\fBhttp://docs.python.org/library/\:codecs.html#standard-encodings\fP.
.UR http://docs.python.org/library/codecs.html#standard-encodings
.UE .
.TP
\fB\-q\fP, \fB\-\-quiet\fP
Quiet operation, an alias for \fB\-o none\fP.
@ -203,7 +205,9 @@ version of LinkChecker.
.SH "CONFIGURATION FILES"
Configuration files can specify all options above. They can also
specify some options that cannot be set on the command line.
See \fBlinkcheckerrc\fP(5) for more info.
See
.BR linkcheckerrc (5)
for more info.
.SH OUTPUT TYPES
Note that by default only errors and warnings are logged.
@ -236,7 +240,8 @@ Log check result as machine-readable XML.
.TP
\fBsitemap\fP
Log check result as an XML sitemap whose protocol is documented at
\fBhttp://www.sitemaps.org/protocol.html\fP.
.UR http://www.sitemaps.org/protocol.html
.UE .
.TP
\fBsql\fP
Log check result as SQL script with INSERT commands. An example
@ -252,7 +257,10 @@ Logs nothing. Suitable for debugging or checking the exit code.
.
.SH REGULAR EXPRESSIONS
LinkChecker accepts Python regular expressions.
See \fBhttp://docs.python.org/\:howto/regex.html\fP for an introduction.
See
.UR http://docs.python.org/howto/regex.html
.UE
for an introduction.
An addition is that a leading exclamation mark negates the regular
expression.
@ -276,15 +284,15 @@ Multiple entries are separated by a blank line.
The example below will send two cookies to all URLs starting with
\fBhttp://example.com/hello/\fP and one to all URLs starting
with \fBhttps://example.org/\fP:
Host: example.com
Path: /hello
Set-cookie: ID="smee"
Set-cookie: spam="egg"
Host: example.org
Set-cookie: baggage="elitist"; comment="hologram"
.EX
Host: example.com
Path: /hello
Set-cookie: ID="smee"
Set-cookie: spam="egg"
.PP
Host: example.org
Set-cookie: baggage="elitist"; comment="hologram"
.EE
.SH PROXY SUPPORT
To use a proxy on Unix or Windows set the $http_proxy, $https_proxy or $ftp_proxy
environment variables to the proxy URL. The URL should be of the form
@ -292,29 +300,27 @@ environment variables to the proxy URL. The URL should be of the form
LinkChecker also detects manual proxy settings of Internet Explorer under
Windows systems, and gconf or KDE on Linux systems.
On a Mac use the Internet Config to select a proxy.
.
.PP
You can also set a comma-separated domain list in the $no_proxy environment
variable to ignore any proxy settings for these domains.
.
.TP
Setting an HTTP proxy on Unix for example looks like this:
export http_proxy="http://proxy.example.com:8080"
.B
export http_proxy="http://proxy.example.com:8080"
.TP
Proxy authentication is also supported:
export http_proxy="http://user1:mypass@proxy.example.org:8081"
.B
export http_proxy="http://user1:mypass@proxy.example.org:8081"
.TP
Setting a proxy on the Windows command prompt:
set http_proxy=http://proxy.example.com:8080
.B
set http_proxy=http://proxy.example.com:8080
.SH PERFORMED CHECKS
All URLs have to pass a preliminary syntax test. Minor quoting
mistakes will issue a warning, all other invalid syntax issues
are errors.
After the syntax check passes, the URL is queued for connection
checking. All connection check types are described below.
.
.TP
HTTP links (\fBhttp:\fP, \fBhttps:\fP)
After connecting to the given HTTP server the given path
@ -322,75 +328,74 @@ or query is requested. All redirections are followed, and
if user/password is given it will be used as authorization
when necessary.
All final HTTP status codes other than 2xx are errors.
.
.IP
HTML page contents are checked for recursion.
.TP
Local files (\fBfile:\fP)
A regular, readable file that can be opened is valid. A readable
directory is also valid. All other files, for example device files,
unreadable or non-existing files are errors.
.
.IP
HTML or other parseable file contents are checked for recursion.
.TP
Mail links (\fBmailto:\fP)
A mailto: link eventually resolves to a list of email addresses.
If one address fails, the whole list will fail.
For each mail address we check the following things:
.
1) Check the adress syntax, both of the part before and after
the @ sign.
2) Look up the MX DNS records. If we found no MX record,
print an error.
3) Check if one of the mail hosts accept an SMTP connection.
Check hosts with higher priority first.
If no host accepts SMTP, we print a warning.
4) Try to verify the address with the VRFY command. If we got
an answer, print the verified address as an info.
.br
1) Check the address syntax, both of the part before and after the @ sign.
.br
2) Look up the MX DNS records. If we found no MX record, print an error.
.br
3) Check if one of the mail hosts accepts an SMTP connection.
Check hosts with higher priority first.
If no host accepts SMTP, we print a warning.
.br
4) Try to verify the address with the VRFY command. If we got an answer,
print the verified address as an info.
.TP
FTP links (\fBftp:\fP)
For FTP links we do:
1) connect to the specified host
2) try to login with the given user and password. The default
user is ``anonymous``, the default password is ``anonymous@``.
3) try to change to the given directory
4) list the file with the NLST command
For FTP links we do:
.br
1) connect to the specified host
.br
2) try to login with the given user and password. The default
user is \fBanonymous\fP, the default password is \fBanonymous@\fP.
.br
3) try to change to the given directory
.br
4) list the file with the NLST command
.TP
Telnet links (\fBtelnet:\fP)
We try to connect and if user/password are given, login to the
given telnet server.
We try to connect and if user/password are given, login to the
given telnet server.
.TP
NNTP links (\fBnews:\fP, \fBsnews:\fP, \fBnntp:\fP)
We try to connect to the given NNTP server. If a news group or
article is specified, try to request it from the server.
We try to connect to the given NNTP server. If a news group or
article is specified, try to request it from the server.
.TP
Unsupported links (\fBjavascript:\fP, etc.)
An unsupported link will only print a warning. No further checking
will be made.
The complete list of recognized, but unsupported links can be found
in the \fBlinkcheck/checker/unknownurl.py\fP source file.
The most prominent of them should be JavaScript links.
An unsupported link will only print a warning. No further checking
will be made.
.IP
The complete list of recognized, but unsupported links can be found
in the \fBlinkcheck/checker/unknownurl.py\fP source file.
The most prominent of them should be JavaScript links.
.SH PLUGINS
There are two plugin types: connection and content plugins.
.
Connection plugins are run after a successful connection to the
URL host.
.
Content plugins are run if the URL type has content
(mailto: URLs have no content for example) and if the check is not
forbidden (e.g. by HTTP robots.txt).
.
.PP
See \fBlinkchecker \-\-list\-plugins\fP for a list of plugins and
their documentation. All plugins are enabled via the \fBlinkcheckerrc\fP(5)
their documentation. All plugins are enabled via the
.BR linkcheckerrc (5)
configuration file.
.SH RECURSION
@ -455,11 +460,11 @@ same as the host of the user browsing your pages.
.
.SH RETURN VALUE
The return value is 2 when
.IP \(bu
.IP \(bu 2
a program error occurred.
.PP
The return value is 1 when
.IP \(bu
.IP \(bu 2
invalid links were found or
.IP \(bu
link warnings were found and warnings are enabled
@ -478,12 +483,16 @@ might slow down the program or even the whole system.
.br
\fBlinkchecker\-out.\fP\fITYPE\fP - default logger file output name
.br
\fBhttp://docs.python.org/library/codecs.html#standard-encodings\fP - valid output encodings
.UR http://docs.python.org/library/codecs.html#standard-encodings
.UE
\- valid output encodings
.br
\fBhttp://docs.python.org/howto/regex.html\fP - regular expression documentation
.UR http://docs.python.org/howto/regex.html
.UE
\- regular expression documentation
.SH "SEE ALSO"
\fBlinkcheckerrc\fP(5)
.BR linkcheckerrc (5)
.
.SH AUTHOR
Bastian Kleineidam <bastian.kleineidam@web.de>

View file

@ -1,4 +1,4 @@
.TH linkcheckerrc 5 2007-11-30 "LinkChecker"
.TH LINKCHECKERRC 5 2020-04-24 "LinkChecker" "LinkChecker User Manual"
.SH NAME
linkcheckerrc - configuration file for LinkChecker
.
@ -13,7 +13,8 @@ The default file location is \fB~/.linkchecker/linkcheckerrc\fP on Unix,
.TP
\fBcookiefile=\fP\fIfilename\fP
Read a file with initial cookie data. The cookie data
format is explained in linkchecker(1).
format is explained in
.BR linkchecker (1).
.br
Command line option: \fB\-\-cookiefile\fP
.TP
@ -188,7 +189,8 @@ below.
.br
The \fIENCODING\fP specifies the output encoding, the default is
that of your locale. Valid encodings are listed at
\fBhttp://docs.python.org/library/codecs.html#standard-encodings\fP.
.UR http://docs.python.org/library/codecs.html#standard-encodings
.UE .
.br
Command line option: \fB\-\-output\fP
.TP
@ -228,7 +230,8 @@ Command line option: none
.TP
\fBencoding=\fP\fISTRING\fP
Valid encodings are listed in
\fBhttp://docs.python.org/library/codecs.html#standard-encodings\fP.
.UR http://docs.python.org/library/codecs.html#standard-encodings
.UE .
.br
Default encoding is \fBiso\-8859\-15\fP.
.TP
@ -404,42 +407,47 @@ priority for the first URL is 1.0, for all child URLs 0.5.
How frequently pages are changing.
.
.SH "LOGGER PARTS"
\fBall\fP (for all parts)
\fBid\fP (a unique ID for each logentry)
\fBrealurl\fP (the full url link)
\fBresult\fP (valid or invalid, with messages)
\fBextern\fP (1 or 0, only in some logger types reported)
\fBbase\fP (base href=...)
\fBname\fP (<a href=...>name</a> and <img alt="name">)
\fBparenturl\fP (if any)
\fBinfo\fP (some additional info, e.g. FTP welcome messages)
\fBwarning\fP (warnings)
\fBdltime\fP (download time)
\fBchecktime\fP (check time)
\fBurl\fP (the original url name, can be relative)
\fBintro\fP (the blurb at the beginning, "starting at ...")
\fBoutro\fP (the blurb at the end, "found x errors ...")
.TS
nokeep, tab(@);
ll.
\fBall\fP@(for all parts)
\fBid\fP@(a unique ID for each logentry)
\fBrealurl\fP@(the full url link)
\fBresult\fP@(valid or invalid, with messages)
\fBextern\fP@(1 or 0, only in some logger types reported)
\fBbase\fP@(base href=...)
\fBname\fP@(<a href=...>name</a> and <img alt="name">)
\fBparenturl\fP@(if any)
\fBinfo\fP@(some additional info, e.g. FTP welcome messages)
\fBwarning\fP@(warnings)
\fBdltime\fP@(download time)
\fBchecktime\fP@(check time)
\fBurl\fP@(the original url name, can be relative)
\fBintro\fP@(the blurb at the beginning, "starting at ...")
\fBoutro\fP@(the blurb at the end, "found x errors ...")
.TE
.SH MULTILINE
Some option values can span multiple lines. Each line has to be indented
for that to work. Lines starting with a hash (\fB#\fP) will be ignored,
though they must still be indented.
ignore=
lconline
bookmark
# a comment
^mailto:
.
.EX
ignore=
lconline
bookmark
# a comment
^mailto:
.EE
.SH EXAMPLE
[output]
log=html
[checking]
threads=5
[filtering]
ignorewarnings=http-moved-permanent
.EX
[output]
log=html
.PP
[checking]
threads=5
.PP
[filtering]
ignorewarnings=http-moved-permanent
.EE
.SH PLUGINS
All plugins have a separate section. If the section
appears in the configuration file the plugin is enabled.
@ -475,7 +483,9 @@ Configures the expiration warning time in days.
.SS \fB[HtmlSyntaxCheck]\fP
Check the syntax of HTML pages with the online W3C HTML validator.
See http://validator.w3.org/docs/api.html.
See
.UR http://validator.w3.org/docs/api.html
.UE .
.SS \fB[HttpHeaderInfo]\fP
Print HTTP headers in URL info.
@ -486,7 +496,9 @@ to display all HTTP headers that start with "X-".
.SS \fB[CssSyntaxCheck]\fP
Check the syntax of HTML pages with the online W3C CSS validator.
See http://jigsaw.w3.org/css-validator/manual.html#expert.
See
.UR http://jigsaw.w3.org/css-validator/manual.html#expert
.UE .
.SS \fB[VirusCheck]\fP
Checks the page content for virus infections with clamav.
@ -551,7 +563,7 @@ The IP is obfuscated.
The URL contains leading or trailing whitespace.
.SH "SEE ALSO"
linkchecker(1)
.BR linkchecker (1)
.
.SH AUTHOR
Bastian Kleineidam <bastian.kleineidam@web.de>

View file

@ -20,7 +20,7 @@
<table class="head">
<tr>
<td class="head-ltitle">LINKCHECKER(1)</td>
<td class="head-vol">LinkChecker commandline usage</td>
<td class="head-vol">LinkChecker User Manual</td>
<td class="head-rtitle">LINKCHECKER(1)</td>
</tr>
</table>
@ -36,7 +36,10 @@ linkchecker - command line client to check HTML documents and websites for
</section>
<section class="Sh">
<h1 class="Sh" id="DESCRIPTION"><a class="permalink" href="#DESCRIPTION">DESCRIPTION</a></h1>
LinkChecker features
<dl class="Bl-tag">
<dt>LinkChecker features</dt>
<dd></dd>
</dl>
<ul class="Bl-bullet">
<li>recursive and multithreaded checking,</li>
<li>output in colored or normal text, HTML, SQL, CSV, XML or a sitemap graph
@ -56,30 +59,30 @@ LinkChecker features
</section>
<section class="Sh">
<h1 class="Sh" id="EXAMPLES"><a class="permalink" href="#EXAMPLES">EXAMPLES</a></h1>
The most common use checks the given domain recursively:
<b>linkchecker http://www.example.com/</b>
<br/>
Beware that this checks the whole site which can have thousands of URLs. Use the
<b>-r</b> option to restrict the recursion depth.
<br/>
Don't check URLs with <b>/secret</b> in its name. All other links are checked as
usual:
<b>linkchecker --ignore-url=/secret mysite.example.com</b>
<br/>
Checking a local HTML file on Unix:
<b>linkchecker ../bla.html</b>
<br/>
Checking a local HTML file on Windows:
<b>linkchecker c:\temp\test.html</b>
<br/>
You can skip the <b>http://</b> url part if the domain starts with <b>www.</b>:
<b>linkchecker www.example.com</b>
<br/>
You can skip the <b>ftp://</b> url part if the domain starts with <b>ftp.</b>:
<b>linkchecker -r0 ftp.example.com</b>
<br/>
Generate a sitemap graph and convert it with the graphviz dot utility:
<b>linkchecker -odot -v www.example.com | dot -Tps &gt; sitemap.ps</b>
<dl class="Bl-tag">
<dt>The most common use checks the given domain recursively:</dt>
<dd><b>linkchecker http://www.example.com/</b>
<br/>
Beware that this checks the whole site which can have thousands of URLs. Use
the <b>-r</b> option to restrict the recursion depth.</dd>
<dt>Don't check URLs with <b>/secret</b> in their name. All other links are
checked as usual:</dt>
<dd><b>linkchecker --ignore-url=/secret mysite.example.com</b></dd>
<dt>Checking a local HTML file on Unix:</dt>
<dd><b>linkchecker ../bla.html</b></dd>
<dt>Checking a local HTML file on Windows:</dt>
<dd><b>linkchecker c:\temp\test.html</b></dd>
<dt>You can skip the <b>http://</b> url part if the domain starts with
<b>www.</b>:</dt>
<dd><b>linkchecker www.example.com</b></dd>
<dt>You can skip the <b>ftp://</b> url part if the domain starts with
<b>ftp.</b>:</dt>
<dd><b>linkchecker -r0 ftp.example.com</b></dd>
<dt>Generate a sitemap graph and convert it with the graphviz dot
utility:</dt>
<dd><b>linkchecker -odot -v www.example.com | dot -Tps &gt;
sitemap.ps</b></dd>
</dl>
</section>
<section class="Sh">
<h1 class="Sh" id="OPTIONS"><a class="permalink" href="#OPTIONS">OPTIONS</a></h1>
@ -120,7 +123,8 @@ Generate a sitemap graph and convert it with the graphviz dot utility:
<b>$HOME/.linkchecker/blacklist</b> for <b>blacklist</b> output, or
<i>FILENAME</i> if specified. The <i>ENCODING</i> specifies the output
encoding, the default is that of your locale. Valid encodings are listed
at <b>http://docs.python.org/library/codecs.html#standard-encodings</b>.
at
<a class="Lk" href="http://docs.python.org/library/codecs.html#standard-encodings">http://docs.python.org/library/codecs.html#standard-encodings</a>.
<br/>
The <i>FILENAME</i> and <i>ENCODING</i> parts of the <b>none</b> output type
will be ignored, else if the file already exists, it will be overwritten.
@ -142,7 +146,7 @@ Generate a sitemap graph and convert it with the graphviz dot utility:
<br/>
The <i>ENCODING</i> specifies the output encoding, the default is that of
your locale. Valid encodings are listed at
<b>http://docs.python.org/library/codecs.html#standard-encodings</b>.</dd>
<a class="Lk" href="http://docs.python.org/library/codecs.html#standard-encodings">http://docs.python.org/library/codecs.html#standard-encodings</a>.</dd>
<dt><b>-q</b>, <b>--quiet</b></dt>
<dd>Quiet operation, an alias for <b>-o none</b>. This is only useful with
<b>-F</b>.</dd>
@ -247,7 +251,7 @@ Note that by default only errors and warnings are logged. You should use the
<dd>Log check result as machine-readable XML.</dd>
<dt><b>sitemap</b></dt>
<dd>Log check result as an XML sitemap whose protocol is documented at
<b>http://www.sitemaps.org/protocol.html</b>.</dd>
<a class="Lk" href="http://www.sitemaps.org/protocol.html">http://www.sitemaps.org/protocol.html</a>.</dd>
<dt><b>sql</b></dt>
<dd>Log check result as SQL script with INSERT commands. An example script to
create the initial SQL table is included as create.sql.</dd>
@ -263,7 +267,8 @@ Note that by default only errors and warnings are logged. You should use the
<h1 class="Sh" id="REGULAR_EXPRESSIONS"><a class="permalink" href="#REGULAR_EXPRESSIONS">REGULAR
EXPRESSIONS</a></h1>
LinkChecker accepts Python regular expressions. See
<b>http://docs.python.org/howto/regex.html</b> for an introduction.
<a class="Lk" href="http://docs.python.org/howto/regex.html">http://docs.python.org/howto/regex.html</a>
for an introduction.
<p class="Pp">An addition is that a leading exclamation mark negates the regular
expression.</p>
</section>
@ -284,15 +289,16 @@ A cookie file contains standard HTTP header (RFC 2616) data with the following
will send two cookies to all URLs starting with
<b>http://example.com/hello/</b> and one to all URLs starting with
<b>https://example.org/</b>:</p>
<p class="Pp">
Host: example.com
Path: /hello
Set-cookie: ID=&quot;smee&quot;
Set-cookie: spam=&quot;egg&quot;</p>
<p class="Pp">
Host: example.org
Set-cookie: baggage=&quot;elitist&quot;; comment=&quot;hologram&quot;</p>
<p class="Pp"></p>
<pre>
Host: example.com
Path: /hello
Set-cookie: ID=&quot;smee&quot;
Set-cookie: spam=&quot;egg&quot;
</pre>
<pre>
Host: example.org
Set-cookie: baggage=&quot;elitist&quot;; comment=&quot;hologram&quot;
</pre>
</section>
<section class="Sh">
<h1 class="Sh" id="PROXY_SUPPORT"><a class="permalink" href="#PROXY_SUPPORT">PROXY
@ -303,18 +309,18 @@ To use a proxy on Unix or Windows set the $http_proxy, $https_proxy or
<b>http://</b>[<i>user</i><b>:</b><i>pass</i><b>@</b>]<i>host</i>[<b>:</b><i>port</i>].
LinkChecker also detects manual proxy settings of Internet Explorer under
Windows systems, and gconf or KDE on Linux systems. On a Mac use the Internet
Config to select a proxy. You can also set a comma-separated domain list in
the $no_proxy environment variables to ignore any proxy settings for these
domains. Setting a HTTP proxy on Unix for example looks like this:
<p class="Pp">
export http_proxy=&quot;http://proxy.example.com:8080&quot;</p>
<p class="Pp">Proxy authentication is also supported:</p>
<p class="Pp">
export http_proxy=&quot;http://user1:mypass@proxy.example.org:8081&quot;</p>
<p class="Pp">Setting a proxy on the Windows command prompt:</p>
<p class="Pp">
set http_proxy=http://proxy.example.com:8080</p>
<p class="Pp"></p>
Config to select a proxy.
<p class="Pp">You can also set a comma-separated domain list in the $no_proxy
environment variable to ignore any proxy settings for these domains.</p>
<dl class="Bl-tag">
<dt>Setting an HTTP proxy on Unix for example looks like this:</dt>
<dd><b>export http_proxy=&quot;http://proxy.example.com:8080&quot;</b></dd>
<dt>Proxy authentication is also supported:</dt>
<dd><b>export
http_proxy=&quot;http://user1:mypass@proxy.example.org:8081&quot;</b></dd>
<dt>Setting a proxy on the Windows command prompt:</dt>
<dd><b>set http_proxy=http://proxy.example.com:8080</b></dd>
</dl>
</section>
<section class="Sh">
<h1 class="Sh" id="PERFORMED_CHECKS"><a class="permalink" href="#PERFORMED_CHECKS">PERFORMED
@ -328,63 +334,71 @@ All URLs have to pass a preliminary syntax test. Minor quoting mistakes will
<dd>After connecting to the given HTTP server the given path or query is
requested. All redirections are followed, and if user/password is given it
will be used as authorization when necessary. All final HTTP status codes
other than 2xx are errors. HTML page contents are checked for
recursion.</dd>
other than 2xx are errors.</dd>
</dl>
<dl class="Bl-tag">
<dt></dt>
<dd>HTML page contents are checked for recursion.</dd>
</dl>
<dl class="Bl-tag">
<dt>Local files (<b>file:</b>)</dt>
<dd>A regular, readable file that can be opened is valid. A readable directory
is also valid. All other files, for example device files, unreadable or
non-existing files are errors. HTML or other parseable file contents are
checked for recursion.</dd>
non-existing files are errors.</dd>
</dl>
<dl class="Bl-tag">
<dt></dt>
<dd>HTML or other parseable file contents are checked for recursion.</dd>
</dl>
<dl class="Bl-tag">
<dt>Mail links (<b>mailto:</b>)</dt>
<dd>A mailto: link eventually resolves to a list of email addresses. If one
address fails, the whole list will fail. For each mail address we check
the following things:
1) Check the adress syntax, both of the part before and after
the @ sign.
2) Look up the MX DNS records. If we found no MX record,
print an error.
3) Check if one of the mail hosts accept an SMTP connection.
Check hosts with higher priority first.
If no host accepts SMTP, we print a warning.
4) Try to verify the address with the VRFY command. If we got
an answer, print the verified address as an info.</dd>
<br/>
1) Check the address syntax, both of the part before and after the @ sign.
<br/>
2) Look up the MX DNS records. If we found no MX record, print an error.
<br/>
3) Check if one of the mail hosts accepts an SMTP connection. Check hosts
with higher priority first. If no host accepts SMTP, we print a warning.
<br/>
4) Try to verify the address with the VRFY command. If we got an answer,
print the verified address as an info.
<p class="Pp"></p>
</dd>
<dt>FTP links (<b>ftp:</b>)</dt>
<dd>
<p class="Pp">
For FTP links we do:</p>
<p class="Pp">
1) connect to the specified host
2) try to login with the given user and password. The default
user is ``anonymous``, the default password is ``anonymous@``.
3) try to change to the given directory
4) list the file with the NLST command</p>
<dd>For FTP links we do:
<br/>
1) connect to the specified host
<br/>
2) try to login with the given user and password. The default user is
<b>anonymous</b>, the default password is <b>anonymous@</b>.
<br/>
3) try to change to the given directory
<br/>
4) list the file with the NLST command
<p class="Pp"></p>
</dd>
<dt>Telnet links (<b>telnet:</b>)</dt>
<dd>
<p class="Pp">
We try to connect and if user/password are given, login to the
given telnet server.</p>
<dd>We try to connect and if user/password are given, login to the given
telnet server.
<p class="Pp"></p>
</dd>
<dt>NNTP links (<b>news:</b>, <b>snews:</b>, <b>nntp:</b>)</dt>
<dd>
<p class="Pp">
We try to connect to the given NNTP server. If a news group or
article is specified, try to request it from the server.</p>
<dd>We try to connect to the given NNTP server. If a news group or article is
specified, try to request it from the server.
<p class="Pp"></p>
</dd>
<dt>Unsupported links (<b>javascript:</b>, etc.)</dt>
<dd>
<p class="Pp">
An unsupported link will only print a warning. No further checking
will be made.</p>
<p class="Pp">
The complete list of recognized, but unsupported links can be found
in the <b>linkcheck/checker/unknownurl.py</b> source file.
The most prominent of them should be JavaScript links.</p>
<p class="Pp"></p>
</dd>
<dd>An unsupported link will only print a warning. No further checking will be
made.</dd>
</dl>
<dl class="Bl-tag">
<dt></dt>
<dd>The complete list of recognized, but unsupported links can be found in the
<b>linkcheck/checker/unknownurl.py</b> source file. The most prominent of
them should be JavaScript links.</dd>
</dl>
</section>
<section class="Sh">
@ -392,9 +406,10 @@ All URLs have to pass a preliminary syntax test. Minor quoting mistakes will
There are two plugin types: connection and content plugins. Connection plugins
are run after a successful connection to the URL host. Content plugins are run
if the URL type has content (mailto: URLs have no content for example) and if
the check is not forbidden (ie. by HTTP robots.txt). See <b>linkchecker
--list-plugins</b> for a list of plugins and their documentation. All plugins
are enabled via the <a href="../man5/linkcheckerrc.5.html" class="Xr">linkcheckerrc(5)</a> configuration file.
the check is not forbidden (e.g. by HTTP robots.txt).
<p class="Pp">See <b>linkchecker --list-plugins</b> for a list of plugins and
their documentation. All plugins are enabled via the <a href="../man5/linkcheckerrc.5.html" class="Xr">linkcheckerrc(5)</a>
configuration file.</p>
<p class="Pp"></p>
</section>
<section class="Sh">
@ -480,11 +495,11 @@ LinkChecker consumes memory for each queued URL to check. With thousands of
<br/>
<b>linkchecker-out.</b><i>TYPE</i> - default logger file output name
<br/>
<b>http://docs.python.org/library/codecs.html#standard-encodings</b> - valid
output encodings
<a class="Lk" href="http://docs.python.org/library/codecs.html#standard-encodings">http://docs.python.org/library/codecs.html#standard-encodings</a>
- valid output encodings
<br/>
<b>http://docs.python.org/howto/regex.html</b> - regular expression
documentation
<a class="Lk" href="http://docs.python.org/howto/regex.html">http://docs.python.org/howto/regex.html</a>
- regular expression documentation
<p class="Pp"></p>
</section>
<section class="Sh">
@ -503,7 +518,7 @@ Copyright &#x00A9; 2000-2014 Bastian Kleineidam
</div>
<table class="foot">
<tr>
<td class="foot-date">2010-07-01</td>
<td class="foot-date">2020-04-24</td>
<td class="foot-os">LinkChecker</td>
</tr>
</table>

View file

@ -14,14 +14,14 @@
code.Nm, code.Fl, code.Cm, code.Ic, code.In, code.Fd, code.Fn,
code.Cd { font-weight: bold; font-family: inherit; }
</style>
<title>linkcheckerrc(5)</title>
<title>LINKCHECKERRC(5)</title>
</head>
<body>
<table class="head">
<tr>
<td class="head-ltitle">linkcheckerrc(5)</td>
<td class="head-vol">File Formats Manual</td>
<td class="head-rtitle">linkcheckerrc(5)</td>
<td class="head-ltitle">LINKCHECKERRC(5)</td>
<td class="head-vol">LinkChecker User Manual</td>
<td class="head-rtitle">LINKCHECKERRC(5)</td>
</tr>
</table>
<div class="manual-text">
@ -44,7 +44,7 @@ The default file location is <b>~/.linkchecker/linkcheckerrc</b> on Unix,
<dl class="Bl-tag">
<dt><b>cookiefile=</b><i>filename</i></dt>
<dd>Read a file with initial cookie data. The cookie data format is explained
in linkchecker(1).
in <a href="../man1/linkchecker.1.html" class="Xr">linkchecker(1)</a>.
<br/>
Command line option: <b>--cookiefile</b></dd>
<dt><b>localwebroot=</b><i>STRING</i></dt>
@ -201,7 +201,7 @@ The default file location is <b>~/.linkchecker/linkcheckerrc</b> on Unix,
<br/>
The <i>ENCODING</i> specifies the output encoding, the default is that of
your locale. Valid encodings are listed at
<b>http://docs.python.org/library/codecs.html#standard-encodings</b>.
<a class="Lk" href="http://docs.python.org/library/codecs.html#standard-encodings">http://docs.python.org/library/codecs.html#standard-encodings</a>.
<br/>
Command line option: <b>--output</b></dd>
<dt><b>quiet=</b>[<b>0</b>|<b>1</b>]</dt>
@ -239,7 +239,7 @@ The default file location is <b>~/.linkchecker/linkcheckerrc</b> on Unix,
Command line option: none</dd>
<dt><b>encoding=</b><i>STRING</i></dt>
<dd>Valid encodings are listed in
<b>http://docs.python.org/library/codecs.html#standard-encodings</b>.
<a class="Lk" href="http://docs.python.org/library/codecs.html#standard-encodings">http://docs.python.org/library/codecs.html#standard-encodings</a>.
<br/>
Default encoding is <b>iso-8859-15</b>.</dd>
<dt><i>color*</i></dt>
@ -405,46 +405,97 @@ The default file location is <b>~/.linkchecker/linkcheckerrc</b> on Unix,
<section class="Sh">
<h1 class="Sh" id="LOGGER_PARTS"><a class="permalink" href="#LOGGER_PARTS">LOGGER
PARTS</a></h1>
<b>all</b> (for all parts)
<b>id</b> (a unique ID for each logentry)
<b>realurl</b> (the full url link)
<b>result</b> (valid or invalid, with messages)
<b>extern</b> (1 or 0, only in some logger types reported)
<b>base</b> (base href=...)
<b>name</b> (&lt;a href=...&gt;name&lt;/a&gt; and &lt;img
alt=&quot;name&quot;&gt;)
<b>parenturl</b> (if any)
<b>info</b> (some additional info, e.g. FTP welcome messages)
<b>warning</b> (warnings)
<b>dltime</b> (download time)
<b>checktime</b> (check time)
<b>url</b> (the original url name, can be relative)
<b>intro</b> (the blurb at the beginning, &quot;starting at ...&quot;)
<b>outro</b> (the blurb at the end, &quot;found x errors ...&quot;)
<table class="tbl">
<tr>
<td><b>all</b></td>
<td>(for all parts)</td>
</tr>
<tr>
<td><b>id</b></td>
<td>(a unique ID for each logentry)</td>
</tr>
<tr>
<td><b>realurl</b></td>
<td>(the full url link)</td>
</tr>
<tr>
<td><b>result</b></td>
<td>(valid or invalid, with messages)</td>
</tr>
<tr>
<td><b>extern</b></td>
<td>(1 or 0, only in some logger types reported)</td>
</tr>
<tr>
<td><b>base</b></td>
<td>(base href=...)</td>
</tr>
<tr>
<td><b>name</b></td>
<td>(&lt;a href=...&gt;name&lt;/a&gt; and &lt;img
alt=&quot;name&quot;&gt;)</td>
</tr>
<tr>
<td><b>parenturl</b></td>
<td>(if any)</td>
</tr>
<tr>
<td><b>info</b></td>
<td>(some additional info, e.g. FTP welcome messages)</td>
</tr>
<tr>
<td><b>warning</b></td>
<td>(warnings)</td>
</tr>
<tr>
<td><b>dltime</b></td>
<td>(download time)</td>
</tr>
<tr>
<td><b>checktime</b></td>
<td>(check time)</td>
</tr>
<tr>
<td><b>url</b></td>
<td>(the original url name, can be relative)</td>
</tr>
<tr>
<td><b>intro</b></td>
<td>(the blurb at the beginning, &quot;starting at ...&quot;)</td>
</tr>
<tr>
<td><b>outro</b></td>
<td>(the blurb at the end, &quot;found x errors ...&quot;)</td>
</tr>
</table>
</section>
<section class="Sh">
<h1 class="Sh" id="MULTILINE"><a class="permalink" href="#MULTILINE">MULTILINE</a></h1>
Some option values can span multiple lines. Each line has to be indented for
that to work. Lines starting with a hash (<b>#</b>) will be ignored, though
they must still be indented.
<p class="Pp">
ignore=
lconline
bookmark
# a comment
^mailto:</p>
<pre>
ignore=
lconline
bookmark
# a comment
^mailto:
</pre>
</section>
<section class="Sh">
<h1 class="Sh" id="EXAMPLE"><a class="permalink" href="#EXAMPLE">EXAMPLE</a></h1>
[output]
log=html
<p class="Pp">
[checking]
threads=5</p>
<p class="Pp">
[filtering]
ignorewarnings=http-moved-permanent</p>
<p class="Pp"></p>
<pre>
[output]
log=html
</pre>
<pre>
[checking]
threads=5
</pre>
<pre>
[filtering]
ignorewarnings=http-moved-permanent
</pre>
</section>
<section class="Sh">
<h1 class="Sh" id="PLUGINS"><a class="permalink" href="#PLUGINS">PLUGINS</a></h1>
@ -493,7 +544,7 @@ Check SSL certificate expiration date. Only internal https: links will be
<section class="Ss">
<h2 class="Ss" id="_fB_HtmlSyntaxCheck__fP"><a class="permalink" href="#_fB_HtmlSyntaxCheck__fP"><b>[HtmlSyntaxCheck]</b></a></h2>
Check the syntax of HTML pages with the online W3C HTML validator. See
http://validator.w3.org/docs/api.html.
<a class="Lk" href="http://validator.w3.org/docs/api.html">http://validator.w3.org/docs/api.html</a>.
<p class="Pp"></p>
</section>
<section class="Ss">
@ -510,7 +561,7 @@ Print HTTP headers in URL info.
<section class="Ss">
<h2 class="Ss" id="_fB_CssSyntaxCheck__fP"><a class="permalink" href="#_fB_CssSyntaxCheck__fP"><b>[CssSyntaxCheck]</b></a></h2>
Check the syntax of HTML pages with the online W3C CSS validator. See
http://jigsaw.w3.org/css-validator/manual.html#expert.
<a class="Lk" href="http://jigsaw.w3.org/css-validator/manual.html#expert">http://jigsaw.w3.org/css-validator/manual.html#expert</a>.
<p class="Pp"></p>
</section>
<section class="Ss">
@ -575,7 +626,7 @@ The following warnings are recognized in the 'ignorewarnings' config file entry:
<section class="Sh">
<h1 class="Sh" id="SEE_ALSO"><a class="permalink" href="#SEE_ALSO">SEE
ALSO</a></h1>
linkchecker(1)
<a href="../man1/linkchecker.1.html" class="Xr">linkchecker(1)</a>
</section>
<section class="Sh">
<h1 class="Sh" id="AUTHOR"><a class="permalink" href="#AUTHOR">AUTHOR</a></h1>
@ -588,7 +639,7 @@ Copyright &#x00A9; 2000-2014 Bastian Kleineidam
</div>
<table class="foot">
<tr>
<td class="foot-date">2007-11-30</td>
<td class="foot-date">2020-04-24</td>
<td class="foot-os">LinkChecker</td>
</tr>
</table>