Updated HTML documentation

This commit is contained in:
Bastian Kleineidam 2009-08-19 20:41:17 +02:00
parent a92e13826b
commit fb091c377f
6 changed files with 280 additions and 160 deletions

@@ -6,7 +6,10 @@ all: $(HELPFILES)
clean:
-rm -f *.qhc *.qch
%.qhc: %.qhcp lcdoc.qhp
%.html: %.txt html.header html.footer
(cat html.header; markdown2 $<; cat html.footer) > index.html
%.qhc: %.qhcp lcdoc.qhp index.html
qcollectiongenerator $< -o $@
favicon.ico: favicon32x32.png favicon16x16.png

doc/html/html.footer Normal file

@@ -0,0 +1,5 @@
<div class="footer">
&copy; Copyright 2009, Bastian Kleineidam.
</div>
</body>
</html>

doc/html/html.header Normal file

@@ -0,0 +1,24 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Check websites for broken links</title>
<link rel="stylesheet" href="sphinxdoc.css" type="text/css" />
<link rel="stylesheet" href="pygments.css" type="text/css" />
<style type="text/css">
img { border: 0; }
</style>
</head>
<body>
<div style="background-color: white; text-align: left; padding: 10px 10px 15px 15px">
<table border="0"><tr>
<td><img
src="logo64x64.png" border="0" alt="LinkChecker"/></td>
<td><h1>LinkChecker</h1></td>
</tr></table>
</div>

doc/html/index.html

@@ -8,7 +8,6 @@
<title>Check websites for broken links</title>
<link rel="stylesheet" href="sphinxdoc.css" type="text/css" />
<link rel="stylesheet" href="pygments.css" type="text/css" />
<link rel="top" title="LinkChecker" href="" />
<style type="text/css">
img { border: 0; }
</style>
@@ -16,206 +15,149 @@ img { border: 0; }
</head>
<body>
<div style="background-color: white; text-align: left; padding: 10px 10px 15px 15px">
<table border="0"><tr>
<td><a href=""><img
src="logo64x64.png" border="0" alt="LinkChecker"/></a></td>
<td><img
src="logo64x64.png" border="0" alt="LinkChecker"/></td>
<td><h1>LinkChecker</h1></td>
</tr></table>
</div>
<div class="document">
<div class="documentwrapper">
<div class="body">
<div class="section" id="check-websites-for-broken-links">
<p>To check a URL like <tt class="docutils literal"><span class="pre">http://www.myhomepage.org/</span></tt> it is enough to
execute <tt class="docutils literal"><span class="pre">linkchecker</span> <span class="pre">http://www.myhomepage.org/</span></tt>. This will check the
<h1>Documentation</h1>
<h2>Basic usage</h2>
<p>To check a URL like <code>http://www.myhomepage.org/</code> it is enough to
execute <code>linkchecker http://www.myhomepage.org/</code>. This will check the
complete domain of www.myhomepage.org recursively. All links pointing
outside of the domain are also checked for validity.</p>
</div>
<div class="section" id="performed-checks">
<h2>Performed checks</h2>
<p>All URLs have to pass a preliminary syntax test. Minor quoting
mistakes will issue a warning; all other invalid syntax issues
are errors.
After the syntax check passes, the URL is queued for connection
checking. All connection check types are described below.</p>
<ul>
<li><p class="first">HTTP links (<tt class="docutils literal"><span class="pre">http:</span></tt>, <tt class="docutils literal"><span class="pre">https:</span></tt>)</p>
<li><p>HTTP links (<code>http:</code>, <code>https:</code>)</p>
<p>After connecting to the given HTTP server the given path
or query is requested. All redirections are followed, and
if user/password is given it will be used as authorization
when necessary.
Permanently moved pages issue a warning.
All final HTTP status codes other than 2xx are errors.</p>
</li>
<li><p class="first">Local files (<tt class="docutils literal"><span class="pre">file:</span></tt>)</p>
All final HTTP status codes other than 2xx are errors.</p></li>
<li><p>Local files (<code>file:</code>)</p>
<p>A regular, readable file that can be opened is valid. A readable
directory is also valid. All other files, for example device files,
unreadable or non-existing files, are errors.</p>
<p>File contents are checked for recursion.</p>
</li>
<li><p class="first">Mail links (<tt class="docutils literal"><span class="pre">mailto:</span></tt>)</p>
<p>File contents are checked for recursion.</p></li>
<li><p>Mail links (<code>mailto:</code>)</p>
<p>A mailto: link eventually resolves to a list of email addresses.
If one address fails, the whole list will fail.
For each mail address we check the following things:</p>
<ol class="arabic simple">
<li>Check the adress syntax, both of the part before and after
the &#64; sign.</li>
<li>Look up the MX DNS records. If we found no MX record,
print an error.</li>
<li>Check if one of the mail hosts accept an SMTP connection.
Check hosts with higher priority first.
If no host accepts SMTP, we print a warning.</li>
<li>Try to verify the address with the VRFY command. If we got
an answer, print the verified address as an info.</li>
</ol>
</li>
<li><p class="first">FTP links (<tt class="docutils literal"><span class="pre">ftp:</span></tt>)</p>
<p>1) Check the adress syntax, both of the part before and after
the @ sign.
2) Look up the MX DNS records. If no MX record is found,
print an error.
3) Check if one of the mail hosts accepts an SMTP connection.
Check hosts with higher priority first.
If no host accepts SMTP, we print a warning.
4) Try to verify the address with the VRFY command. If we get
an answer, print the verified address as an info message.</p></li>
<li><p>FTP links (<code>ftp:</code>)</p>
<p>For FTP links we do:</p>
<ol class="arabic simple">
<li>connect to the specified host</li>
<li>try to login with the given user and password. The default
user is <tt class="docutils literal"><span class="pre">anonymous</span></tt>, the default password is <tt class="docutils literal"><span class="pre">anonymous&#64;</span></tt>.</li>
<li>try to change to the given directory</li>
<li>list the file with the NLST command</li>
</ol>
</li>
<li><p class="first">Telnet links (<tt class="docutils literal"><span class="pre">telnet:</span></tt>)</p>
<p>1) connect to the specified host
2) try to log in with the given user and password. The default
user is <code>anonymous</code>, the default password is <code>anonymous@</code>.
3) try to change to the given directory
4) list the file with the NLST command</p></li>
<li><p>Telnet links (<code>telnet:</code>)</p>
<p>We try to connect and if user/password are given, login to the
given telnet server.</p>
</li>
<li><p class="first">NNTP links (<tt class="docutils literal"><span class="pre">news:</span></tt>, <tt class="docutils literal"><span class="pre">snews:</span></tt>, <tt class="docutils literal"><span class="pre">nntp</span></tt>)</p>
given telnet server.</p></li>
<li><p>NNTP links (<code>news:</code>, <code>snews:</code>, <code>nntp:</code>)</p>
<p>We try to connect to the given NNTP server. If a news group or
article is specified, try to request it from the server.</p>
</li>
<li><p class="first">Ignored links (<tt class="docutils literal"><span class="pre">javascript:</span></tt>, etc.)</p>
article is specified, try to request it from the server.</p></li>
<li><p>Ignored links (<code>javascript:</code>, etc.)</p>
<p>An ignored link will only print a warning. No further checking
will be made.</p>
<p>Here is a complete list of recognized but ignored links. The
most prominent of these are JavaScript links.</p>
<ul class="simple">
<li><tt class="docutils literal"><span class="pre">acap:</span></tt> (application configuration access protocol)</li>
<li><tt class="docutils literal"><span class="pre">afs:</span></tt> (Andrew File System global file names)</li>
<li><tt class="docutils literal"><span class="pre">chrome:</span></tt> (Mozilla specific)</li>
<li><tt class="docutils literal"><span class="pre">cid:</span></tt> (content identifier)</li>
<li><tt class="docutils literal"><span class="pre">clsid:</span></tt> (Microsoft specific)</li>
<li><tt class="docutils literal"><span class="pre">data:</span></tt> (data)</li>
<li><tt class="docutils literal"><span class="pre">dav:</span></tt> (dav)</li>
<li><tt class="docutils literal"><span class="pre">fax:</span></tt> (fax)</li>
<li><tt class="docutils literal"><span class="pre">find:</span></tt> (Mozilla specific)</li>
<li><tt class="docutils literal"><span class="pre">gopher:</span></tt> (Gopher)</li>
<li><tt class="docutils literal"><span class="pre">imap:</span></tt> (internet message access protocol)</li>
<li><tt class="docutils literal"><span class="pre">isbn:</span></tt> (ISBN (int. book numbers))</li>
<li><tt class="docutils literal"><span class="pre">javascript:</span></tt> (JavaScript)</li>
<li><tt class="docutils literal"><span class="pre">ldap:</span></tt> (Lightweight Directory Access Protocol)</li>
<li><tt class="docutils literal"><span class="pre">mailserver:</span></tt> (Access to data available from mail servers)</li>
<li><tt class="docutils literal"><span class="pre">mid:</span></tt> (message identifier)</li>
<li><tt class="docutils literal"><span class="pre">mms:</span></tt> (multimedia stream)</li>
<li><tt class="docutils literal"><span class="pre">modem:</span></tt> (modem)</li>
<li><tt class="docutils literal"><span class="pre">nfs:</span></tt> (network file system protocol)</li>
<li><tt class="docutils literal"><span class="pre">opaquelocktoken:</span></tt> (opaquelocktoken)</li>
<li><tt class="docutils literal"><span class="pre">pop:</span></tt> (Post Office Protocol v3)</li>
<li><tt class="docutils literal"><span class="pre">prospero:</span></tt> (Prospero Directory Service)</li>
<li><tt class="docutils literal"><span class="pre">rsync:</span></tt> (rsync protocol)</li>
<li><tt class="docutils literal"><span class="pre">rtsp:</span></tt> (real time streaming protocol)</li>
<li><tt class="docutils literal"><span class="pre">service:</span></tt> (service location)</li>
<li><tt class="docutils literal"><span class="pre">shttp:</span></tt> (secure HTTP)</li>
<li><tt class="docutils literal"><span class="pre">sip:</span></tt> (session initiation protocol)</li>
<li><tt class="docutils literal"><span class="pre">tel:</span></tt> (telephone)</li>
<li><tt class="docutils literal"><span class="pre">tip:</span></tt> (Transaction Internet Protocol)</li>
<li><tt class="docutils literal"><span class="pre">tn3270:</span></tt> (Interactive 3270 emulation sessions)</li>
<li><tt class="docutils literal"><span class="pre">vemmi:</span></tt> (versatile multimedia interface)</li>
<li><tt class="docutils literal"><span class="pre">wais:</span></tt> (Wide Area Information Servers)</li>
<li><tt class="docutils literal"><span class="pre">z39.50r:</span></tt> (Z39.50 Retrieval)</li>
<li><tt class="docutils literal"><span class="pre">z39.50s:</span></tt> (Z39.50 Session)</li>
<ul>
<li><code>acap:</code> (application configuration access protocol)</li>
<li><code>afs:</code> (Andrew File System global file names)</li>
<li><code>chrome:</code> (Mozilla specific)</li>
<li><code>cid:</code> (content identifier)</li>
<li><code>clsid:</code> (Microsoft specific)</li>
<li><code>data:</code> (data)</li>
<li><code>dav:</code> (dav)</li>
<li><code>fax:</code> (fax)</li>
<li><code>find:</code> (Mozilla specific)</li>
<li><code>gopher:</code> (Gopher)</li>
<li><code>imap:</code> (internet message access protocol)</li>
<li><code>isbn:</code> (ISBN (int. book numbers))</li>
<li><code>javascript:</code> (JavaScript)</li>
<li><code>ldap:</code> (Lightweight Directory Access Protocol)</li>
<li><code>mailserver:</code> (Access to data available from mail servers)</li>
<li><code>mid:</code> (message identifier)</li>
<li><code>mms:</code> (multimedia stream)</li>
<li><code>modem:</code> (modem)</li>
<li><code>nfs:</code> (network file system protocol)</li>
<li><code>opaquelocktoken:</code> (opaquelocktoken)</li>
<li><code>pop:</code> (Post Office Protocol v3)</li>
<li><code>prospero:</code> (Prospero Directory Service)</li>
<li><code>rsync:</code> (rsync protocol)</li>
<li><code>rtsp:</code> (real time streaming protocol)</li>
<li><code>service:</code> (service location)</li>
<li><code>shttp:</code> (secure HTTP)</li>
<li><code>sip:</code> (session initiation protocol)</li>
<li><code>tel:</code> (telephone)</li>
<li><code>tip:</code> (Transaction Internet Protocol)</li>
<li><code>tn3270:</code> (Interactive 3270 emulation sessions)</li>
<li><code>vemmi:</code> (versatile multimedia interface)</li>
<li><code>wais:</code> (Wide Area Information Servers)</li>
<li><code>z39.50r:</code> (Z39.50 Retrieval)</li>
<li><code>z39.50s:</code> (Z39.50 Session)</li>
</ul></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="recursion">
<h2>Recursion</h2>
<p>Before descending recursively into a URL, it has to fulfill several
conditions. They are checked in this order:</p>
<ol class="arabic simple">
<li>A URL must be valid.</li>
<li>A URL must be parseable. This currently includes HTML files,
<ol>
<li><p>A URL must be valid.</p></li>
<li><p>A URL must be parseable. This currently includes HTML files,
Opera bookmarks files, and directories. If a file type cannot
be determined (for example it does not have a common HTML file
extension, and the content does not look like HTML), it is assumed
to be non-parseable.</li>
<li>The URL content must be retrievable. This is usually the case
except for example mailto: or unknown URL types.</li>
<li>The maximum recursion level must not be exceeded. It is configured
with the <tt class="docutils literal"><span class="pre">--recursion-level</span></tt> option and is unlimited per default.</li>
<li>It must not match the ignored URL list. This is controlled with
the <tt class="docutils literal"><span class="pre">--ignore-url</span></tt> option.</li>
<li>The Robots Exclusion Protocol must allow links in the URL to be
to be non-parseable.</p></li>
<li><p>The URL content must be retrievable. This is usually the case
except for example mailto: or unknown URL types.</p></li>
<li><p>The maximum recursion level must not be exceeded. It is configured
with the <code>--recursion-level</code> option and is unlimited by default.</p></li>
<li><p>It must not match the ignored URL list. This is controlled with
the <code>--ignore-url</code> option.</p></li>
<li><p>The Robots Exclusion Protocol must allow links in the URL to be
followed recursively. This is checked by searching for a
&#8220;nofollow&#8221; directive in the HTML header data.</li>
"nofollow" directive in the HTML header data.</p></li>
</ol>
<p>Note that the directory recursion reads all files in that
directory, not just a subset like <tt class="docutils literal"><span class="pre">index.htm*</span></tt>.</p>
</div>
<div class="section" id="frequently-asked-questions">
<h2>Frequently asked questions</h2>
<p><strong>Q: LinkChecker produced an error, but my web page is ok with
Mozilla/IE/Opera/...
Is this a bug in LinkChecker?</strong></p>
<p>A: Please check your web pages first. Are they really ok?
Use the <tt class="docutils literal"><span class="pre">--check-html</span></tt> option, or check if you are using a proxy
which produces the error.</p>
<p><strong>Q: I still get an error, but the page is definitely ok.</strong></p>
<p>A: Some servers deny access to automated tools (also called robots)
like LinkChecker. This is not a bug in LinkChecker but rather a
policy by the webmaster running the website you are checking. Look at
the <tt class="docutils literal"><span class="pre">/robots.txt</span></tt> file which follows the <a class="reference external" href="http://www.robotstxt.org/wc/norobots-rfc.html">robots.txt exclusion standard</a>.</p>
<p><strong>Q: How can I tell LinkChecker which proxy to use?</strong></p>
<p>A: LinkChecker works transparently with proxies. In a Unix or Windows
environment, set the http_proxy, https_proxy, ftp_proxy environment
variables to a URL that identifies the proxy server before starting
LinkChecker. For example</p>
<div class="highlight-python"><pre>$ http_proxy="http://www.someproxy.com:3128"
$ export http_proxy</pre>
</div>
<p><strong>Q: The link &#8220;mailto:john&#64;company.com?subject=Hello John&#8221; is reported
as an error.</strong></p>
<p>A: You have to quote special characters (e.g. spaces) in the subject field.
The correct link should be &#8220;mailto:...?subject=Hello%20John&#8221;.
Unfortunately browsers like IE and Netscape do not enforce this.</p>
<p><strong>Q: Does LinkChecker have JavaScript support?</strong></p>
<p>A: No, it never will. If your page is not working without JS, it is
better checked with a browser testing tool like <a class="reference external" href="http://seleniumhq.org/">Selenium</a>.</p>
<p><strong>Q: Is LinkChecker's cookie feature insecure?</strong></p>
<p>A: If a cookie file is specified, the information will be sent
to the specified hosts.
The following restrictions apply for LinkChecker cookies:</p>
<ul class="simple">
<li>Cookies will only be sent to the originating server.</li>
<li>Cookies are only stored in memory. After LinkChecker finishes, they
are lost.</li>
<li>The cookie feature is disabled by default.</li>
</ul>
<p><strong>Q: I see LinkChecker gets a /robots.txt file for every site it
checks. What is that about?</strong></p>
<p>A: LinkChecker follows the <a class="reference external" href="http://www.robotstxt.org/wc/norobots-rfc.html">robots.txt exclusion standard</a>. To avoid
misuse of LinkChecker, you cannot turn this feature off.
See the <a class="reference external" href="http://www.robotstxt.org/wc/robots.html">Web Robot pages</a> and the <a class="reference external" href="http://www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/Spidering.txt">Spidering report</a> for more info.</p>
<p><strong>Q: How do I print unreachable/dead documents of my website with
LinkChecker?</strong></p>
<p>A: No can do. This would require file system access to your web
repository and access to your web server configuration.</p>
<p><strong>Q: How do I check HTML/XML/CSS syntax with LinkChecker?</strong></p>
<p>A: Use the <tt class="docutils literal"><span class="pre">--check-html</span></tt> and <tt class="docutils literal"><span class="pre">--check-css</span></tt> options.</p>
</div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
directory, not just a subset like <code>index.htm*</code>.</p>
<div class="footer">
&copy; Copyright 2009, Bastian Kleineidam.
</div>

doc/html/index.txt Normal file

@@ -0,0 +1,146 @@
Documentation
=============
Basic usage
-----------
To check a URL like ``http://www.myhomepage.org/`` it is enough to
execute ``linkchecker http://www.myhomepage.org/``. This will check the
complete domain of www.myhomepage.org recursively. All links pointing
outside of the domain are also checked for validity.
Performed checks
----------------
All URLs have to pass a preliminary syntax test. Minor quoting
mistakes will issue a warning; all other invalid syntax issues
are errors.
After the syntax check passes, the URL is queued for connection
checking. All connection check types are described below.
- HTTP links (``http:``, ``https:``)
After connecting to the given HTTP server the given path
or query is requested. All redirections are followed, and
if user/password is given it will be used as authorization
when necessary.
Permanently moved pages issue a warning.
All final HTTP status codes other than 2xx are errors.
- Local files (``file:``)
A regular, readable file that can be opened is valid. A readable
directory is also valid. All other files, for example device files,
unreadable or non-existing files, are errors.
File contents are checked for recursion.
- Mail links (``mailto:``)
A mailto: link eventually resolves to a list of email addresses.
If one address fails, the whole list will fail.
For each mail address we check the following things:
1) Check the syntax of the address, both before and
after the @ sign.
2) Look up the MX DNS records. If no MX record is found,
print an error.
3) Check if one of the mail hosts accepts an SMTP connection.
Check hosts with higher priority first.
If no host accepts SMTP, we print a warning.
4) Try to verify the address with the VRFY command. If we get
an answer, print the verified address as an info message.
- FTP links (``ftp:``)
For FTP links we do:
1) connect to the specified host
2) try to log in with the given user and password. The default
user is ``anonymous``, the default password is ``anonymous@``.
3) try to change to the given directory
4) list the file with the NLST command
- Telnet links (``telnet:``)
We try to connect and, if user/password are given, log in to the
given telnet server.
- NNTP links (``news:``, ``snews:``, ``nntp:``)
We try to connect to the given NNTP server. If a news group or
article is specified, try to request it from the server.
- Ignored links (``javascript:``, etc.)
An ignored link will only print a warning. No further checking
will be made.
Here is a complete list of recognized but ignored links. The
most prominent of these are JavaScript links.
- ``acap:`` (application configuration access protocol)
- ``afs:`` (Andrew File System global file names)
- ``chrome:`` (Mozilla specific)
- ``cid:`` (content identifier)
- ``clsid:`` (Microsoft specific)
- ``data:`` (data)
- ``dav:`` (dav)
- ``fax:`` (fax)
- ``find:`` (Mozilla specific)
- ``gopher:`` (Gopher)
- ``imap:`` (internet message access protocol)
- ``isbn:`` (ISBN (int. book numbers))
- ``javascript:`` (JavaScript)
- ``ldap:`` (Lightweight Directory Access Protocol)
- ``mailserver:`` (Access to data available from mail servers)
- ``mid:`` (message identifier)
- ``mms:`` (multimedia stream)
- ``modem:`` (modem)
- ``nfs:`` (network file system protocol)
- ``opaquelocktoken:`` (opaquelocktoken)
- ``pop:`` (Post Office Protocol v3)
- ``prospero:`` (Prospero Directory Service)
- ``rsync:`` (rsync protocol)
- ``rtsp:`` (real time streaming protocol)
- ``service:`` (service location)
- ``shttp:`` (secure HTTP)
- ``sip:`` (session initiation protocol)
- ``tel:`` (telephone)
- ``tip:`` (Transaction Internet Protocol)
- ``tn3270:`` (Interactive 3270 emulation sessions)
- ``vemmi:`` (versatile multimedia interface)
- ``wais:`` (Wide Area Information Servers)
- ``z39.50r:`` (Z39.50 Retrieval)
- ``z39.50s:`` (Z39.50 Session)
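The HTTP rules above (redirections are followed, permanently moved pages
issue a warning, any final status outside 2xx is an error) boil down to a
small classifier. The following is only an illustrative sketch;
``classify_http_result`` and its return values are made up for this
example and are not LinkChecker's internal API:

```python
def classify_http_result(final_status, permanently_moved=False):
    # Any final HTTP status outside the 2xx range is an error.
    if not 200 <= final_status < 300:
        return "error"
    # A permanently moved page (a 301 seen while following redirects)
    # is still reachable, but issues a warning.
    if permanently_moved:
        return "warning"
    return "valid"
```

For example, a page that answers 200 after a 301 redirect would be
classified as a warning, while a final 404 is an error.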
Recursion
---------
Before descending recursively into a URL, it has to fulfill several
conditions. They are checked in this order:
1. A URL must be valid.
2. A URL must be parseable. This currently includes HTML files,
Opera bookmarks files, and directories. If a file type cannot
be determined (for example it does not have a common HTML file
extension, and the content does not look like HTML), it is assumed
to be non-parseable.
3. The URL content must be retrievable. This is usually the case
except for example mailto: or unknown URL types.
4. The maximum recursion level must not be exceeded. It is configured
with the ``--recursion-level`` option and is unlimited by default.
5. It must not match the ignored URL list. This is controlled with
the ``--ignore-url`` option.
6. The Robots Exclusion Protocol must allow links in the URL to be
followed recursively. This is checked by searching for a
"nofollow" directive in the HTML header data.
Note that the directory recursion reads all files in that
directory, not just a subset like ``index.htm*``.
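The six conditions can be read as a chain of guards that all have to
pass. The sketch below is hypothetical glue code, not LinkChecker's
implementation; the boolean arguments stand in for the real syntax,
content-type and robots.txt checks described above:

```python
import re

def should_recurse(url, level, max_level, ignore_patterns,
                   is_valid, is_parseable, is_retrievable, robots_allow):
    # 1. The URL must be valid.
    if not is_valid:
        return False
    # 2. The URL content must be parseable (HTML, bookmarks, directory).
    if not is_parseable:
        return False
    # 3. The content must be retrievable (not e.g. mailto:).
    if not is_retrievable:
        return False
    # 4. The maximum recursion level must not be exceeded
    #    (None mirrors the unlimited default of --recursion-level).
    if max_level is not None and level > max_level:
        return False
    # 5. The URL must not match any --ignore-url pattern.
    if any(re.search(pattern, url) for pattern in ignore_patterns):
        return False
    # 6. robots.txt and "nofollow" must allow recursion.
    return robots_allow
```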

doc/html/logo64x64.png Executable file → Normal file