Updated documentation.

Bastian Kleineidam 2011-10-19 20:08:27 +02:00
parent 2be106bf3b
commit 5d837ae7be
2 changed files with 34 additions and 26 deletions

@ -27,7 +27,7 @@ img { border: 0; }
<h2>Basic usage</h2>
<p>To check a URL like <code>http://www.example.org/</code> it is enough to
type <code>linkchecker www.example.org/</code> on the command line or
type <code>linkchecker www.example.org</code> on the command line or
type <code>www.example.org</code> in the GUI application. This will check the
complete domain of <code>http://www.example.org</code> recursively. All links
pointing outside of the domain are also checked for validity.</p>
@ -50,24 +50,26 @@ All final HTTP status codes other than 2xx are errors.</p></li>
<li><p>Local files (<code>file:</code>)</p>
<p>A regular, readable file that can be opened is valid. A readable
directory is also valid. All other files, for example device files,
unreadable or non-existing files are errors.</p>
directory is also valid. All other files, for example unreadable,
non-existing or device files are errors.</p>
<p>File contents are checked for recursion.</p></li>
<p>File contents are checked for recursion. If they are parseable
files (for example HTML files), all links in that file will be
checked.</p></li>
<li><p>Mail links (<code>mailto:</code>)</p>
<p>A mailto: link eventually resolves to a list of email addresses.
If one address fails, the whole list will fail.
<p>A mailto: link resolves to a list of email addresses.
If one address fails the whole list will fail.
For each mail address the following things are checked:</p>
<ol>
<li>Check the address syntax, both of the part before and after
<li>Check the address syntax, both the part before and after
the @ sign.</li>
<li>Look up the MX DNS records. If no MX record is found,
print an error.</li>
<li>Check if one of the mail hosts accepts an SMTP connection.
<li>Check if one of the MX mail hosts accepts an SMTP connection.
Check hosts with higher priority first.
If no host accepts SMTP, a warning is printed.</li>
If none of the hosts accept SMTP, a warning is printed.</li>
<li>Try to verify the address with the VRFY command. If there is
an answer, the verified address is printed as an info.</li>
</ol></li>
@ -145,7 +147,8 @@ conditions. The conditions are checked in this order:</p>
<ol>
<li>The URL must be valid.</li>
<li>The URL must be parseable. This currently includes HTML files,
Opera bookmarks files, and directories. If a file type cannot
Opera bookmarks files, directories and, on Windows systems, MS Word
files if Word is installed on your system. If a file type cannot
be determined (for example it does not have a common HTML file
extension, and the content does not look like HTML), it is assumed
to be non-parseable.</li>
@ -159,8 +162,9 @@ The recursion level is unlimited by default.</li>
the <code>--ignore-url</code> command line option or through the
configuration file.</li>
<li>The Robots Exclusion Protocol must allow links in the URL to be
followed recursively. This is checked by searching for a
"nofollow" directive in the HTML header data.</li>
followed recursively. This is checked by evaluating the server's
robots.txt file and searching for a "nofollow" directive in the
HTML header data.</li>
</ol>
<p>Note that the local and FTP directory recursion reads all files in that
@ -171,7 +175,7 @@ directory, not just a subset like <code>index.htm*</code>.</p>
<p>Each user can edit a configuration with advanced options for
checking or filtering.</p>
<p>On Unix systems the user configuration file is at</p>
<p>On Unix or OS X systems the user configuration file is at</p>
<ul>
<li><code>~/.linkchecker/linkcheckerrc</code></li>

@ -3,7 +3,7 @@
## Basic usage
To check a URL like ``http://www.example.org/`` it is enough to
type ``linkchecker www.example.org/`` on the command line or
type ``linkchecker www.example.org`` on the command line or
type ``www.example.org`` in the GUI application. This will check the
complete domain of ``http://www.example.org`` recursively. All links
pointing outside of the domain are also checked for validity.
@ -26,24 +26,26 @@ checking. All connection check types are described below.
- Local files (``file:``)
A regular, readable file that can be opened is valid. A readable
directory is also valid. All other files, for example device files,
unreadable or non-existing files are errors.
directory is also valid. All other files, for example unreadable,
non-existing or device files are errors.
File contents are checked for recursion.
File contents are checked for recursion. If they are parseable
files (for example HTML files), all links in that file will be
checked.
- Mail links (``mailto:``)
A mailto: link eventually resolves to a list of email addresses.
If one address fails, the whole list will fail.
A mailto: link resolves to a list of email addresses.
If one address fails the whole list will fail.
For each mail address the following things are checked:
1. Check the address syntax, both of the part before and after
1. Check the address syntax, both the part before and after
the @ sign.
2. Look up the MX DNS records. If no MX record is found,
print an error.
3. Check if one of the mail hosts accepts an SMTP connection.
3. Check if one of the MX mail hosts accepts an SMTP connection.
Check hosts with higher priority first.
If no host accepts SMTP, a warning is printed.
If none of the hosts accept SMTP, a warning is printed.
4. Try to verify the address with the VRFY command. If there is
an answer, the verified address is printed as an info.
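
The offline parts of the mail checks listed above can be sketched roughly as follows. This is a minimal illustration, not LinkChecker's actual code: the function names `check_syntax` and `order_mx_hosts` are hypothetical, the syntax patterns are deliberately simplified, and the MX records are assumed to be already available as `(preference, host)` pairs. The DNS lookup, the SMTP connection and the VRFY command need network access and are left out; in Python, VRFY could be sent with `smtplib.SMTP.verify`.

```python
import re

# Simplified patterns (an assumption for this sketch); real mail address
# syntax is broader than what these expressions accept.
_LOCAL_RE = re.compile(r"^[A-Za-z0-9.!#$%&'*+/=?^_`{|}~-]+$")
_DOMAIN_RE = re.compile(r"^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")


def check_syntax(address):
    """Step 1: check the parts before and after the @ sign."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        # No @ sign at all.
        return False
    return bool(_LOCAL_RE.match(local)) and bool(_DOMAIN_RE.match(domain))


def order_mx_hosts(records):
    """Step 3 ordering: return hosts with the highest priority first.

    In MX records the lowest preference number has the highest priority,
    so sorting the (preference, host) pairs ascending gives the order
    in which the hosts would be tried.
    """
    return [host for preference, host in sorted(records)]
```

For example, `order_mx_hosts([(20, "mx2.example.org"), (10, "mx1.example.org")])` puts `mx1.example.org` first, since it has the lower preference number.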
@ -119,7 +121,8 @@ conditions. The conditions are checked in this order:
1. The URL must be valid.
2. The URL must be parseable. This currently includes HTML files,
Opera bookmarks files, and directories. If a file type cannot
Opera bookmarks files, directories and, on Windows systems, MS Word
files if Word is installed on your system. If a file type cannot
be determined (for example it does not have a common HTML file
extension, and the content does not look like HTML), it is assumed
to be non-parseable.
@ -133,8 +136,9 @@ conditions. The conditions are checked in this order:
the ``--ignore-url`` command line option or through the
configuration file.
6. The Robots Exclusion Protocol must allow links in the URL to be
followed recursively. This is checked by searching for a
"nofollow" directive in the HTML header data.
followed recursively. This is checked by evaluating the server's
robots.txt file and searching for a "nofollow" directive in the
HTML header data.
Note that the local and FTP directory recursion reads all files in that
directory, not just a subset like ``index.htm*``.
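
The two Robots Exclusion Protocol checks from condition 6 could look roughly like the sketch below, which uses only the Python standard library. It is an illustration under stated assumptions, not LinkChecker's implementation: the helper names `robots_txt_allows` and `has_nofollow` and the user agent string `"LinkChecker"` are hypothetical, and the robots.txt text is passed in as a string instead of being fetched from the server.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def robots_txt_allows(robots_txt, url, user_agent="LinkChecker"):
    """Evaluate already downloaded robots.txt content for one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _MetaRobotsParser(HTMLParser):
    """Record whether a <meta name="robots"> tag contains "nofollow"."""

    def __init__(self):
        super().__init__()
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = (attr.get("content") or "").lower()
        directives = [part.strip() for part in content.split(",")]
        if "nofollow" in directives:
            self.nofollow = True


def has_nofollow(html):
    """Return True if the HTML header data carries a nofollow directive."""
    parser = _MetaRobotsParser()
    parser.feed(html)
    return parser.nofollow
```

Under these assumptions a page would be followed recursively only when `robots_txt_allows(...)` is true and `has_nofollow(...)` is false.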
@ -145,7 +149,7 @@ directory, not just a subset like ``index.htm*``.
Each user can edit a configuration with advanced options for
checking or filtering.
On Unix systems the user configuration file is at
On Unix or OS X systems the user configuration file is at
- ``~/.linkchecker/linkcheckerrc``