Use direct HTML documentation for the GUI client; moved the homepage content to a separate package.

Author: Bastian Kleineidam
Date: 2009-07-20 18:32:54 +02:00
parent 9faa7d33d2
commit fd29a15af7
19 changed files with 227 additions and 923 deletions

doc/index.html (new file)

@@ -0,0 +1,227 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Check websites for broken links</title>
<link rel="stylesheet" href="_static/sphinxdoc.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="top" title="LinkChecker" href="" />
<style type="text/css">
img { border: 0; }
</style>
</head>
<body>
<div style="background-color: white; text-align: left; padding: 10px 10px 15px 15px">
<table border="0"><tr>
<td><a href=""><img
src="_static/logo64x64.png" border="0" alt="LinkChecker"/></a></td>
<td><h1>LinkChecker</h1></td>
</tr></table>
</div>
<div class="document">
<div class="documentwrapper">
<div class="body">
<div class="section" id="check-websites-for-broken-links">
<h1>Check websites for broken links</h1>
<p>LinkChecker is a free, <a class="reference external" href="http://www.gnu.org/licenses/gpl-2.0.html">GPL</a> licensed URL validator.</p>
<div class="section" id="basic-usage">
<h2>Basic usage</h2>
<p>To check a URL like <tt class="docutils literal"><span class="pre">http://www.myhomepage.org/</span></tt> it is enough to
execute <tt class="docutils literal"><span class="pre">linkchecker</span> <span class="pre">http://www.myhomepage.org/</span></tt>. This will check the
complete domain of www.myhomepage.org recursively. All links pointing
outside of the domain are also checked for validity.</p>
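<p>For scripted runs, the same invocation can be driven from Python with the standard library. The helper below is only a sketch; it assumes the <tt class="docutils literal"><span class="pre">linkchecker</span></tt> executable is on the PATH, and the function names are made up for illustration:</p>

```python
import subprocess

def linkchecker_command(url, recursion_level=None):
    # Build the argument list; a recursion limit maps to the
    # --recursion-level command line option described below.
    cmd = ["linkchecker"]
    if recursion_level is not None:
        cmd += ["--recursion-level", str(recursion_level)]
    cmd.append(url)
    return cmd

def run_linkchecker(url, **kwargs):
    # Return the exit code of the linkchecker process.
    return subprocess.call(linkchecker_command(url, **kwargs))
```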
</div>
<div class="section" id="performed-checks">
<h2>Performed checks</h2>
<p>All URLs have to pass a preliminary syntax test. Minor quoting
mistakes issue a warning; all other invalid syntax issues
are errors.
After the syntax check passes, the URL is queued for connection
checking. All connection check types are described below.</p>
<ul>
<li><p class="first">HTTP links (<tt class="docutils literal"><span class="pre">http:</span></tt>, <tt class="docutils literal"><span class="pre">https:</span></tt>)</p>
<p>After connecting to the given HTTP server, the given path
or query is requested. All redirections are followed, and
if a user/password is given, it is used for authorization
when necessary.
Permanently moved pages issue a warning.
All final HTTP status codes other than 2xx are errors.</p>
</li>
<li><p class="first">Local files (<tt class="docutils literal"><span class="pre">file:</span></tt>)</p>
<p>A regular, readable file that can be opened is valid. A readable
directory is also valid. All other files, for example device files
or unreadable or non-existent files, are errors.</p>
<p>File contents are checked for recursion.</p>
</li>
<li><p class="first">Mail links (<tt class="docutils literal"><span class="pre">mailto:</span></tt>)</p>
<p>A mailto: link eventually resolves to a list of email addresses.
If one address fails, the whole list fails.
For each mail address we check the following things:</p>
<ol class="arabic simple">
<li>Check the address syntax, both the part before and the part
after the &#64; sign.</li>
<li>Look up the MX DNS records. If we find no MX record,
we print an error.</li>
<li>Check if one of the mail hosts accepts an SMTP connection.
Hosts with higher priority are checked first.
If no host accepts SMTP, we print a warning.</li>
<li>Try to verify the address with the VRFY command. If we get
an answer, we print the verified address as an info.</li>
</ol>
</ol>
</li>
<li><p class="first">FTP links (<tt class="docutils literal"><span class="pre">ftp:</span></tt>)</p>
<p>For FTP links we do:</p>
<ol class="arabic simple">
<li>connect to the specified host</li>
<li>try to log in with the given user and password. The default
user is <tt class="docutils literal"><span class="pre">anonymous</span></tt>, the default password is <tt class="docutils literal"><span class="pre">anonymous&#64;</span></tt>.</li>
<li>try to change to the given directory</li>
<li>list the file with the NLST command</li>
</ol>
</li>
<li><p class="first">Telnet links (<tt class="docutils literal"><span class="pre">telnet:</span></tt>)</p>
<p>We try to connect and, if a user and password are given, log in to
the given telnet server.</p>
</li>
<li><p class="first">NNTP links (<tt class="docutils literal"><span class="pre">news:</span></tt>, <tt class="docutils literal"><span class="pre">snews:</span></tt>, <tt class="docutils literal"><span class="pre">nntp:</span></tt>)</p>
<p>We try to connect to the given NNTP server. If a news group or
article is specified, we try to request it from the server.</p>
</li>
<li><p class="first">Ignored links (<tt class="docutils literal"><span class="pre">javascript:</span></tt>, etc.)</p>
<p>An ignored link produces only a warning; no further checking
is done.</p>
<p>Here is a complete list of recognized, but ignored link types. The most
prominent of these are JavaScript links.</p>
<ul class="simple">
<li><tt class="docutils literal"><span class="pre">acap:</span></tt> (application configuration access protocol)</li>
<li><tt class="docutils literal"><span class="pre">afs:</span></tt> (Andrew File System global file names)</li>
<li><tt class="docutils literal"><span class="pre">chrome:</span></tt> (Mozilla specific)</li>
<li><tt class="docutils literal"><span class="pre">cid:</span></tt> (content identifier)</li>
<li><tt class="docutils literal"><span class="pre">clsid:</span></tt> (Microsoft specific)</li>
<li><tt class="docutils literal"><span class="pre">data:</span></tt> (data)</li>
<li><tt class="docutils literal"><span class="pre">dav:</span></tt> (dav)</li>
<li><tt class="docutils literal"><span class="pre">fax:</span></tt> (fax)</li>
<li><tt class="docutils literal"><span class="pre">find:</span></tt> (Mozilla specific)</li>
<li><tt class="docutils literal"><span class="pre">gopher:</span></tt> (Gopher)</li>
<li><tt class="docutils literal"><span class="pre">imap:</span></tt> (internet message access protocol)</li>
<li><tt class="docutils literal"><span class="pre">isbn:</span></tt> (ISBN (int. book numbers))</li>
<li><tt class="docutils literal"><span class="pre">javascript:</span></tt> (JavaScript)</li>
<li><tt class="docutils literal"><span class="pre">ldap:</span></tt> (Lightweight Directory Access Protocol)</li>
<li><tt class="docutils literal"><span class="pre">mailserver:</span></tt> (Access to data available from mail servers)</li>
<li><tt class="docutils literal"><span class="pre">mid:</span></tt> (message identifier)</li>
<li><tt class="docutils literal"><span class="pre">mms:</span></tt> (multimedia stream)</li>
<li><tt class="docutils literal"><span class="pre">modem:</span></tt> (modem)</li>
<li><tt class="docutils literal"><span class="pre">nfs:</span></tt> (network file system protocol)</li>
<li><tt class="docutils literal"><span class="pre">opaquelocktoken:</span></tt> (opaquelocktoken)</li>
<li><tt class="docutils literal"><span class="pre">pop:</span></tt> (Post Office Protocol v3)</li>
<li><tt class="docutils literal"><span class="pre">prospero:</span></tt> (Prospero Directory Service)</li>
<li><tt class="docutils literal"><span class="pre">rsync:</span></tt> (rsync protocol)</li>
<li><tt class="docutils literal"><span class="pre">rtsp:</span></tt> (real time streaming protocol)</li>
<li><tt class="docutils literal"><span class="pre">service:</span></tt> (service location)</li>
<li><tt class="docutils literal"><span class="pre">shttp:</span></tt> (secure HTTP)</li>
<li><tt class="docutils literal"><span class="pre">sip:</span></tt> (session initiation protocol)</li>
<li><tt class="docutils literal"><span class="pre">tel:</span></tt> (telephone)</li>
<li><tt class="docutils literal"><span class="pre">tip:</span></tt> (Transaction Internet Protocol)</li>
<li><tt class="docutils literal"><span class="pre">tn3270:</span></tt> (Interactive 3270 emulation sessions)</li>
<li><tt class="docutils literal"><span class="pre">vemmi:</span></tt> (versatile multimedia interface)</li>
<li><tt class="docutils literal"><span class="pre">wais:</span></tt> (Wide Area Information Servers)</li>
<li><tt class="docutils literal"><span class="pre">z39.50r:</span></tt> (Z39.50 Retrieval)</li>
<li><tt class="docutils literal"><span class="pre">z39.50s:</span></tt> (Z39.50 Session)</li>
</ul>
</li>
</ul>
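<p>The HTTP rules above can be condensed into a small classifier. This is an illustrative sketch only, not LinkChecker&#8217;s actual implementation:</p>

```python
def classify_http_result(final_status, redirects=()):
    # final_status: status code of the last response after all
    # redirections have been followed.
    # redirects: status codes of intermediate redirect responses.
    warnings = []
    if 301 in redirects:
        # permanently moved pages issue a warning
        warnings.append("permanently moved")
    # all final HTTP status codes other than 2xx are errors
    result = "valid" if 200 <= final_status < 300 else "error"
    return result, warnings
```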
</div>
<div class="section" id="recursion">
<h2>Recursion</h2>
<p>Before descending recursively into a URL, it has to fulfill several
conditions. They are checked in this order:</p>
<ol class="arabic simple">
<li>A URL must be valid.</li>
<li>A URL must be parseable. This currently includes HTML files,
Opera bookmarks files, and directories. If a file type cannot
be determined (for example it does not have a common HTML file
extension, and the content does not look like HTML), it is assumed
to be non-parseable.</li>
<li>The URL content must be retrievable. This is usually the case,
with exceptions such as mailto: links or unknown URL types.</li>
<li>The maximum recursion level must not be exceeded. It is configured
with the <tt class="docutils literal"><span class="pre">--recursion-level</span></tt> option and is unlimited by default.</li>
<li>It must not match the ignored URL list. This is controlled with
the <tt class="docutils literal"><span class="pre">--ignore-url</span></tt> option.</li>
<li>The Robots Exclusion Protocol must allow links in the URL to be
followed recursively. This is checked by searching for a
&#8220;nofollow&#8221; directive in the HTML header data.</li>
</ol>
<p>Note that the directory recursion reads all files in that
directory, not just a subset like <tt class="docutils literal"><span class="pre">index.htm*</span></tt>.</p>
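<p>The six conditions can be read as one predicate. The sketch below is a paraphrase of the rules, with a hypothetical <tt class="docutils literal"><span class="pre">url_info</span></tt> dict standing in for LinkChecker&#8217;s internal URL data:</p>

```python
import re

def should_recurse(url_info, max_level=None, ignore_patterns=()):
    # 1. the URL must be valid
    if not url_info["valid"]:
        return False
    # 2. the content must be parseable (e.g. HTML or a directory)
    if not url_info["parseable"]:
        return False
    # 3. the content must be retrievable
    if not url_info["retrievable"]:
        return False
    # 4. the --recursion-level limit must not be exceeded
    #    (unlimited by default, i.e. max_level is None)
    if max_level is not None and url_info["level"] >= max_level:
        return False
    # 5. the URL must not match an --ignore-url pattern
    if any(re.search(p, url_info["url"]) for p in ignore_patterns):
        return False
    # 6. robots "nofollow" directives must allow following links
    if "nofollow" in url_info.get("robots_directives", ()):
        return False
    return True
```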
</div>
<div class="section" id="frequently-asked-questions">
<h2>Frequently asked questions</h2>
<p><strong>Q: LinkChecker produced an error, but my web page is ok with
Mozilla/IE/Opera/...
Is this a bug in LinkChecker?</strong></p>
<p>A: Please check your web pages first. Are they really ok?
Use the <tt class="docutils literal"><span class="pre">--check-html</span></tt> option, or check if you are using a proxy
which produces the error.</p>
<p><strong>Q: I still get an error, but the page is definitely ok.</strong></p>
<p>A: Some servers deny access to automated tools (also called robots)
like LinkChecker. This is not a bug in LinkChecker but rather a
policy of the webmaster running the website you are checking. Look at
the <tt class="docutils literal"><span class="pre">/robots.txt</span></tt> file, which follows the <a class="reference external" href="http://www.robotstxt.org/wc/norobots-rfc.html">robots.txt exclusion standard</a>.</p>
<p><strong>Q: How can I tell LinkChecker which proxy to use?</strong></p>
<p>A: LinkChecker works transparently with proxies. In a Unix or Windows
environment, set the http_proxy, https_proxy or ftp_proxy environment
variables to a URL that identifies the proxy server before starting
LinkChecker. For example:</p>
<div class="highlight-python"><pre>$ http_proxy="http://www.someproxy.com:3128"
$ export http_proxy</pre>
</div>
<p><strong>Q: The link &#8220;mailto:john&#64;company.com?subject=Hello John&#8221; is reported
as an error.</strong></p>
<p>A: You have to quote special characters (e.g. spaces) in the subject field.
The correct link should be &#8220;mailto:...?subject=Hello%20John&#8221;.
Unfortunately, browsers like IE and Netscape do not enforce this.</p>
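<p>With Python 3&#8217;s standard library the quoting looks like this (a minimal illustration; the helper name is made up):</p>

```python
from urllib.parse import quote

def mailto_with_subject(address, subject):
    # Percent-encode the subject so that spaces and other special
    # characters are legal inside the URL.
    return "mailto:%s?subject=%s" % (address, quote(subject))
```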
<p><strong>Q: Does LinkChecker have JavaScript support?</strong></p>
<p>A: No, and it never will. If your page does not work without JavaScript,
it is better checked with a browser testing tool like <a class="reference external" href="http://seleniumhq.org/">Selenium</a>.</p>
<p><strong>Q: Is LinkChecker&#8217;s cookie feature insecure?</strong></p>
<p>A: If a cookie file is specified, the information will be sent
to the specified hosts.
The following restrictions apply for LinkChecker cookies:</p>
<ul class="simple">
<li>Cookies will only be sent to the originating server.</li>
<li>Cookies are only stored in memory. After LinkChecker finishes, they
are lost.</li>
<li>The cookie feature is disabled by default.</li>
</ul>
<p><strong>Q: I see LinkChecker gets a /robots.txt file for every site it
checks. What is that about?</strong></p>
<p>A: LinkChecker follows the <a class="reference external" href="http://www.robotstxt.org/wc/norobots-rfc.html">robots.txt exclusion standard</a>. To avoid
misuse of LinkChecker, you cannot turn this feature off.
See the <a class="reference external" href="http://www.robotstxt.org/wc/robots.html">Web Robot pages</a> and the <a class="reference external" href="http://www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/Spidering.txt">Spidering report</a> for more info.</p>
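<p>The same standard can be exercised with Python 3&#8217;s built-in <tt class="docutils literal"><span class="pre">urllib.robotparser</span></tt> module (shown here as an illustration; this is not the code LinkChecker uses internally):</p>

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Parse an example robots.txt; LinkChecker fetches the real one
# from the /robots.txt path of every site it checks.
rp.parse("""\
User-agent: *
Disallow: /private/
""".splitlines())

# rp.can_fetch(agent, url) now answers whether a robot named
# "LinkChecker" may fetch a given URL.
```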
<p><strong>Q: How do I print unreachable/dead documents of my website with
LinkChecker?</strong></p>
<p>A: This is not possible. It would require file system access to your web
repository and access to your web server configuration.</p>
<p><strong>Q: How do I check HTML/XML/CSS syntax with LinkChecker?</strong></p>
<p>A: Use the <tt class="docutils literal"><span class="pre">--check-html</span></tt> and <tt class="docutils literal"><span class="pre">--check-css</span></tt> options.</p>
</div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer">
&copy; Copyright 2009, Bastian Kleineidam.
</div>
</body>
</html>

@@ -1,291 +0,0 @@
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Copyright (C) 2007-2009 Bastian Kleineidam
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
"""
A script to losslessly compress media files to be used in production
deployments of web software. Used together with HTML compression
it decreases the size of transmitted data considerably.

Currently supported media files:

Type        Extension  Compressor(s)
========================================================
JavaScript  .js        YUI compressor (a Java program)
CSS         .css       YUI compressor (a Java program)
PNG         .png       pngcrush (a C program)
JPEG        .jpg       jpegtran (a C program)
GIF         .gif       giftrans (a C program)

It compresses all supported media files to new files. The original
files are not changed unless explicitly requested.
Compressed files are named <filebase>-min.<ext> where <filebase> is
everything up to the last dot and <ext> is everything after the last dot.
If requested, the original file is overwritten with the compressed one.

A directory will be recursively searched and all media files within
will be compressed.

Files are only compressed when the compressed file is missing or the
original file is newer than the compressed file.
"""
import sys
import os
import getopt
import stat
import shutil
from distutils.spawn import spawn, find_executable
from distutils.errors import DistutilsExecError
import distutils.log
distutils.log.set_verbosity(1)

# list of extensions for compressable files
COMPRESS_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".gif")


def log (*args):
    "Print the given arguments to sys.stderr."
    for arg in args:
        print >> sys.stderr, arg,
    print >> sys.stderr


def usage (msg=None):
    """
    Print usage information to sys.stderr and call sys.exit().
    The exit code is zero if msg is None, else one.
    """
    if msg is None:
        err = 0
    else:
        print >> sys.stderr, msg
        err = 1
    thisfile = os.path.basename(__file__)
    log("Usage:", thisfile, "[options]", "<file-or-directory>...")
    log("Options:")
    log(" --js-compressor - Specify the JavaScript compressor " \
        "(default: yuicompressor.jar)")
    log(" --exclude - Specify (part of) filenames to ignore")
    log(" --overwrite - Comma-separated list of file extensions to overwrite")
    log(" --help - Display help")
    sys.exit(err)


class DirectoryWalker:
    "Iterate over all files below a given directory."

    def __init__ (self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__ (self, index):
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack; an empty stack raises
                # IndexError and thus ends the iteration
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                if os.path.isdir(fullname) and not os.path.islink(fullname):
                    self.stack.append(fullname)
                return fullname


def is_compressable (settings, filename):
    "Check if given filename is compressable."
    # is it excluded?
    if [x for x in settings["exclude"] if x in filename]:
        return False
    # is it compressable?
    return os.path.splitext(filename)[1] in COMPRESS_EXTENSIONS


def get_files (settings, args):
    """
    Given a list of files and/or directories return all compressable
    files as an iterator.
    """
    for arg in args:
        if os.path.isdir(arg):
            for file in DirectoryWalker(arg):
                if is_compressable(settings, file):
                    yield file
        elif os.path.isfile(arg):
            if is_compressable(settings, arg):
                yield arg
        else:
            log("Warning: not a file or directory", repr(arg))


settings = {
    # default compress executables
    "compressor": {
        ".js": "yuicompressor.jar", # note: .jar files are run via "java -jar"
        ".css": "yuicompressor.jar",
        ".png": "pngcrush",
        ".jpg": "jpegtran",
        ".gif": "giftrans",
    },
    # set of filenames (or parts of them) to exclude
    "exclude": set(),
    # set of file extensions to overwrite
    "overwrite": set(),
}


def parse_options (args):
    """
    Parse command line arguments.
    @return: (settings, args)
    @rtype: tuple (dict, list)
    """
    long_opts = ["help", "js-compressor=", "exclude=", "overwrite="]
    try:
        opts, args = getopt.getopt(args, "", long_opts)
    except getopt.error:
        usage(msg=sys.exc_info()[1])
    for opt, arg in opts:
        if opt == "--help":
            usage()
        elif opt == "--js-compressor":
            for ext in (".js", ".css"):
                settings["compressor"][ext] = arg
        elif opt == "--exclude":
            settings["exclude"].add(arg)
        elif opt == "--overwrite":
            exts = [x.strip().lower() for x in arg.split(",") if x]
            settings["overwrite"].update(exts)
        else:
            usage(msg="Unknown option %r" % opt)
    return settings, args


def get_mtime (filename):
    "Return modification time of file."
    return os.stat(filename)[stat.ST_MTIME]


def get_fsize (filename):
    "Return file size in bytes."
    return os.stat(filename)[stat.ST_SIZE]


def needs_compression (infile, outfile):
    "Check if infile needs to be compressed to given outfile."
    if not os.path.exists(outfile):
        return True
    return get_mtime(infile) > get_mtime(outfile)


def compress_file (infile):
    "Compress given file if needed."
    base, ext = os.path.splitext(infile)
    if base.endswith("-min"):
        # already a compressed file
        return
    outfile = "%s-min%s" % (base, ext)
    if needs_compression(infile, outfile):
        cmd = compress_cmd(ext, infile, outfile)
        if not cmd:
            log("Skipping", repr(infile), "no compressor available")
            return
        try:
            log("Compressing", repr(infile), "...")
            run_cmd(cmd)
        except DistutilsExecError, msg:
            log("Error running %s: %s" % (cmd, msg))
        else:
            insize = get_fsize(infile)
            outsize = get_fsize(outfile)
            if outsize > insize:
                log("Warning: compressed file is bigger than original "
                    "(%dB > %dB); copying instead." % (outsize, insize))
                shutil.copyfile(infile, outfile)
            else:
                percentage = float(outsize * 100) / insize
                log(".. compressed to %.2f%% (%dB -> %dB)" % \
                    (percentage, insize, outsize))
            if ext[1:].lower() in settings["overwrite"]:
                shutil.move(outfile, infile)
    else:
        log("Skipping", repr(infile))


def compress_cmd (ext, infile, outfile):
    "Get list of command args for compression."
    cmd = []
    compressor = settings["compressor"][ext]
    if compressor.endswith(".jar"):
        if not find_executable("java"):
            return None
        cmd.insert(0, "java")
        cmd.insert(1, "-jar")
    elif not find_executable(compressor):
        return None
    cmd.append(compressor)
    cmd.extend(compressor_args(compressor, infile, outfile))
    return cmd


def compressor_args (compressor, infile, outfile):
    """
    Return list of command line arguments that compress infile to outfile
    with given compressor.
    """
    basename = os.path.basename(compressor).lower()
    if basename.startswith("yuicompressor"):
        args = compressor_args_yui(infile, outfile)
    elif basename.startswith("pngcrush"):
        args = compressor_args_pngcrush(infile, outfile)
    elif basename.startswith("jpegtran"):
        args = compressor_args_jpegtran(infile, outfile)
    elif basename.startswith("giftrans"):
        args = compressor_args_giftrans(infile, outfile)
    else:
        raise getopt.error("Unknown compressor %r" % compressor)
    return args


def compressor_args_yui (infile, outfile):
    "Arguments for the YUI compressor."
    return ["--charset", "utf8", "-o", outfile, infile]


def compressor_args_pngcrush (infile, outfile):
    "Arguments for pngcrush."
    return [infile, outfile]


def compressor_args_jpegtran (infile, outfile):
    "Arguments for jpegtran."
    return ["-optimize", "-perfect", "-copy", "none",
            "-outfile", outfile, infile]


def compressor_args_giftrans (infile, outfile):
    "Arguments for giftrans."
    return ["-C", "-o", outfile, infile]


def run_cmd (cmd):
    "Execute given command."
    return spawn(cmd)


def main (args):
    settings, args = parse_options(args)
    for file in get_files(settings, args):
        compress_file(file)


if __name__ == '__main__':
    main(sys.argv[1:])

@@ -1,193 +0,0 @@
# -*- coding: utf-8 -*-
#
# LinkChecker documentation build configuration file, created by
# sphinx-quickstart on Tue Jan 20 23:59:41 2009.
#
# This file is execfile()d with the current directory set to its containing dir.
#
# The contents of this file are pickled, so don't put values in the namespace
# that aren't pickleable (module imports are okay, they're removed automatically).
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
#import sys, os
# If your extensions are in another directory, add it here. If the directory
# is relative to the documentation root, use os.path.abspath to make it
# absolute, like shown here.
#sys.path.append(os.path.abspath('.'))
# General configuration
# ---------------------
# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = []
# Add any paths that contain templates here, relative to this directory.
templates_path = ['templates']
# The suffix of source filenames.
source_suffix = '.txt'
# The encoding of source files.
#source_encoding = 'utf-8'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'LinkChecker'
copyright = u'2009, Bastian Kleineidam'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '5.0.2'
# The full version, including alpha/beta/rc tags.
release = version
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'
# List of documents that shouldn't be included in the build.
#unused_docs = None
# List of directories, relative to source directory, that shouldn't be searched
# for source files.
exclude_trees = []
# The reST default role (used for this markup: `text`) to use for all documents.
#default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
#add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
add_module_names = False
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'friendly'
# Options for HTML output
# -----------------------
# The style sheet to use for HTML and HTML Help pages. A file of that name
# must exist either in Sphinx' static/ path, or in one of the custom paths
# given in html_static_path.
html_style = 'sphinxdoc.css'
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
html_title = project
# A shorter title for the navigation bar. Default is the same as html_title.
#html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
html_favicon = "favicon.ico"
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['static']
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}
# If false, no module index is generated.
html_use_modindex = False
# If false, no index is generated.
html_use_index = False
# If true, the index is split into individual pages for each letter.
#html_split_index = False
# If true, the reST sources are included in the HTML build as _sources/<name>.
html_copy_source = False
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
#html_use_opensearch = ''
# If nonempty, this is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = ''
# Output file base name for HTML help builder.
htmlhelp_basename = 'LinkCheckerdoc'
# Options for LaTeX output
# ------------------------
# The paper size ('letter' or 'a4').
latex_paper_size = 'a4'
# The font size ('10pt', '11pt' or '12pt').
#latex_font_size = '10pt'
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, document class [howto/manual]).
latex_documents = [
    ('index', 'LinkChecker.tex', ur'LinkChecker Documentation',
     ur'Bastian Kleineidam', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False
# Additional stuff for the LaTeX preamble.
#latex_preamble = ''
# Documents to append as an appendix to all manuals.
#latex_appendices = []
# If false, no module index is generated.
#latex_use_modindex = True
#def setup(app):
# app.add_config_value('foo', 'default', True)

@@ -1,270 +0,0 @@
Documentation
=============
Basic usage
-----------
To check a URL like ``http://www.myhomepage.org/`` it is enough to
execute ``linkchecker http://www.myhomepage.org/``. This will check the
complete domain of www.myhomepage.org recursively. All links pointing
outside of the domain are also checked for validity.
Performed checks
----------------
All URLs have to pass a preliminary syntax test. Minor quoting
mistakes will issue a warning, all other invalid syntax issues
are errors.
After the syntax check passes, the URL is queued for connection
checking. All connection check types are described below.
- HTTP links (``http:``, ``https:``)
After connecting to the given HTTP server the given path
or query is requested. All redirections are followed, and
if user/password is given it will be used as authorization
when necessary.
Permanently moved pages issue a warning.
All final HTTP status codes other than 2xx are errors.
- Local files (``file:``)
A regular, readable file that can be opened is valid. A readable
directory is also valid. All other files, for example device files,
unreadable or non-existing files are errors.
File contents are checked for recursion.
- Mail links (``mailto:``)
A mailto: link eventually resolves to a list of email addresses.
If one address fails, the whole list will fail.
For each mail address we check the following things:
1) Check the adress syntax, both of the part before and after
the @ sign.
2) Look up the MX DNS records. If we found no MX record,
print an error.
3) Check if one of the mail hosts accept an SMTP connection.
Check hosts with higher priority first.
If no host accepts SMTP, we print a warning.
4) Try to verify the address with the VRFY command. If we got
an answer, print the verified address as an info.
- FTP links (``ftp:``)
For FTP links we do:
1) connect to the specified host
2) try to login with the given user and password. The default
user is ``anonymous``, the default password is ``anonymous@``.
3) try to change to the given directory
4) list the file with the NLST command
- Telnet links (``telnet:``)
We try to connect and if user/password are given, login to the
given telnet server.
- NNTP links (``news:``, ``snews:``, ``nntp``)
We try to connect to the given NNTP server. If a news group or
article is specified, try to request it from the server.
- Ignored links (``javascript:``, etc.)
An ignored link will only print a warning. No further checking
will be made.
Here is a complete list of recognized, but ignored links. The most
prominent of them should be JavaScript links.
- ``acap:`` (application configuration access protocol)
- ``afs:`` (Andrew File System global file names)
- ``chrome:`` (Mozilla specific)
- ``cid:`` (content identifier)
- ``clsid:`` (Microsoft specific)
- ``data:`` (data)
- ``dav:`` (dav)
- ``fax:`` (fax)
- ``find:`` (Mozilla specific)
- ``gopher:`` (Gopher)
- ``imap:`` (internet message access protocol)
- ``isbn:`` (ISBN (int. book numbers))
- ``javascript:`` (JavaScript)
- ``ldap:`` (Lightweight Directory Access Protocol)
- ``mailserver:`` (Access to data available from mail servers)
- ``mid:`` (message identifier)
- ``mms:`` (multimedia stream)
- ``modem:`` (modem)
- ``nfs:`` (network file system protocol)
- ``opaquelocktoken:`` (opaquelocktoken)
- ``pop:`` (Post Office Protocol v3)
- ``prospero:`` (Prospero Directory Service)
- ``rsync:`` (rsync protocol)
- ``rtsp:`` (real time streaming protocol)
- ``service:`` (service location)
- ``shttp:`` (secure HTTP)
- ``sip:`` (session initiation protocol)
- ``tel:`` (telephone)
- ``tip:`` (Transaction Internet Protocol)
- ``tn3270:`` (Interactive 3270 emulation sessions)
- ``vemmi:`` (versatile multimedia interface)
- ``wais:`` (Wide Area Information Servers)
- ``z39.50r:`` (Z39.50 Retrieval)
- ``z39.50s:`` (Z39.50 Session)
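The address-syntax step (1) of the mail link checks above can be sketched in Python; steps 2–4 need DNS lookups and SMTP network access and are left out here. The regular expressions below are simplified assumptions for illustration, not LinkChecker's actual parser:

```python
import re

def check_mail_syntax(address):
    """Step 1: validate the parts before and after the @ sign.

    A simplified sketch; the real check is stricter.
    """
    local, sep, domain = address.partition("@")
    if not sep or not local or not domain:
        return False
    # Local part: printable characters without spaces (simplified).
    if not re.match(r"^[A-Za-z0-9.!#$%&'*+/=?^_`{|}~-]+$", local):
        return False
    # Domain: dot-separated labels, ending in a letters-only top level.
    return re.match(r"^([A-Za-z0-9-]+\.)+[A-Za-z]{2,}$", domain) is not None
```

Both halves must pass before the MX and SMTP steps are attempted.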
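The four FTP steps map closely onto Python's standard ``ftplib``. This sketch (with the assumed helper ``split_ftp_path``) is an illustration of the sequence, not LinkChecker's implementation:

```python
from ftplib import FTP
from urllib.parse import urlparse

def split_ftp_path(path):
    """Split a URL path into (directory, filename) for steps 3 and 4."""
    directory, _, filename = path.lstrip("/").rpartition("/")
    return directory, filename

def check_ftp_url(url, user="anonymous", passwd="anonymous@"):
    """Run the four FTP checking steps for a URL like ftp://host/dir/file."""
    parts = urlparse(url)
    directory, filename = split_ftp_path(parts.path)
    ftp = FTP(parts.hostname, timeout=30)  # 1) connect to the host
    try:
        ftp.login(user, passwd)            # 2) login, anonymous by default
        if directory:
            ftp.cwd(directory)             # 3) change to the directory
        return ftp.nlst(filename)          # 4) list the file (NLST)
    finally:
        ftp.quit()
```

Any step that fails raises an ``ftplib`` error, which corresponds to the link being reported as broken.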
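Deciding whether a link is ignored boils down to a scheme lookup. A minimal sketch (the set below is only a subset of the list above, and the function name is illustrative):

```python
# Schemes that are recognized but only produce a warning
# (a subset of the full list above).
IGNORED_SCHEMES = {
    "acap", "chrome", "cid", "clsid", "data", "javascript",
    "ldap", "mailserver", "mms", "nfs", "tel", "wais",
}

def is_ignored(url):
    """Return True if the URL's scheme is recognized but not checked."""
    scheme, sep, _ = url.partition(":")
    return bool(sep) and scheme.lower() in IGNORED_SCHEMES
```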
Recursion
---------
Before a URL is descended into recursively, it has to fulfill several
conditions. They are checked in this order:
1. A URL must be valid.
2. A URL must be parseable. This currently includes HTML files,
Opera bookmark files, and directories. If a file type cannot
be determined (for example, it does not have a common HTML file
extension and the content does not look like HTML), it is assumed
to be non-parseable.
3. The URL content must be retrievable. This is usually the case,
except for mailto: links or unknown URL types.
4. The maximum recursion level must not be exceeded. It is configured
with the ``--recursion-level`` option and is unlimited by default.
5. It must not match the ignored URL list. This is controlled with
the ``--ignore-url`` option.
6. The Robots Exclusion Protocol must allow links in the URL to be
followed recursively. This is checked by searching for a
"nofollow" directive in the HTML header data.
Note that the directory recursion reads all files in that
directory, not just a subset like ``index.htm*``.
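The six conditions above can be sketched as an ordered chain of checks that stops at the first failure. The ``url_data`` and ``config`` objects here are stand-ins for LinkChecker's internal state; all attribute names are assumptions for illustration, not the real API:

```python
from types import SimpleNamespace

def should_recurse(url_data, config):
    """Apply the six recursion conditions in order; stop at first failure."""
    if not url_data.valid:                      # 1. valid URL
        return False
    if not url_data.is_parseable:               # 2. parseable content type
        return False
    if not url_data.can_get_content:            # 3. retrievable content
        return False
    # 4. depth limit (a negative configured value means unlimited)
    if 0 <= config.recursion_level <= url_data.recursion_level:
        return False
    # 5. not matched by any --ignore-url pattern
    if any(pat.search(url_data.url) for pat in config.ignore_urls):
        return False
    # 6. robots exclusion: no "nofollow" directive seen
    return "nofollow" not in url_data.robots_directives
```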
Frequently asked questions
--------------------------
**Q: LinkChecker produced an error, but my web page is ok with
Mozilla/IE/Opera/...
Is this a bug in LinkChecker?**
A: Please check your web pages first. Are they really ok?
Use the ``--check-html`` option, or check whether you are using a proxy
that produces the error.
**Q: I still get an error, but the page is definitely ok.**
A: Some servers deny access to automated tools (also called robots)
like LinkChecker. This is not a bug in LinkChecker but rather a
policy of the webmaster running the website you are checking. Look at
the ``/robots.txt`` file, which follows the `robots.txt exclusion standard`_.
.. _`robots.txt exclusion standard`:
http://www.robotstxt.org/wc/norobots-rfc.html
**Q: How can I tell LinkChecker which proxy to use?**
A: LinkChecker works transparently with proxies. In a Unix or Windows
environment, set the http_proxy, https_proxy, ftp_proxy environment
variables to a URL that identifies the proxy server before starting
LinkChecker. For example
::
$ http_proxy="http://www.someproxy.com:3128"
$ export http_proxy
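A quick way to verify which proxy settings Python-based tools such as LinkChecker will see is Python's own ``urllib``, which reads the same environment variables:

```python
import os
import urllib.request

# Simulate the shell export above, then ask urllib what it picked up.
os.environ["http_proxy"] = "http://www.someproxy.com:3128"
proxies = urllib.request.getproxies()
print(proxies["http"])  # http://www.someproxy.com:3128
```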
**Q: The link "mailto:john@company.com?subject=Hello John" is reported
as an error.**
A: You have to quote special characters (e.g. spaces) in the subject field.
The correct link should be "mailto:...?subject=Hello%20John".
Unfortunately, browsers like IE and Netscape do not enforce this.
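When generating such links yourself, Python's ``urllib.parse.quote`` produces the correct quoting (shown here as an illustration, using the address from the question above):

```python
from urllib.parse import quote

# quote() percent-encodes characters that are unsafe in a URL,
# such as the space in the subject text.
subject = "Hello John"
link = "mailto:john@company.com?subject=" + quote(subject)
print(link)  # mailto:john@company.com?subject=Hello%20John
```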
**Q: Does LinkChecker have JavaScript support?**
A: No, and it never will. If your page does not work without JavaScript,
it is better checked with a browser testing tool like Selenium_.
.. _Selenium:
http://seleniumhq.org/
**Q: Is LinkChecker's cookie feature insecure?**
A: Cookies cannot store more information than is in the HTTP request itself,
so you are not giving away any additional system information.
After being stored, however, cookies are sent back to the server on
subsequent requests. Not to every server, but only to the one the cookie
originated from! This can be used to "track" requests to that server,
which is what annoys some people (including me).
Cookies are only stored in memory. After LinkChecker finishes, they
are lost, so the tracking is restricted to the checking time.
The cookie feature is disabled by default.
**Q: I want to have my own logging class. How can I use it in LinkChecker?**
A: Currently, only a Python API lets you define new logging classes.
Define your own logging class as a subclass of StandardLogger or any other
logging class in the log module.
Then call ``logger_add`` on a ``Configuration`` instance to register
your new logger class, and append a new logger instance to the file
output.
::
# MyLogger is your own module providing the MyLogger class.
import linkcheck
import MyLogger

log_format = 'mylog'
log_args = {'fileoutput': log_format, 'filename': 'foo.txt'}
cfg = linkcheck.configuration.Configuration()
# Register the logger class under its format name ...
cfg.logger_add(log_format, MyLogger.MyLogger)
# ... and append an instance to the file output loggers.
cfg['fileoutput'].append(cfg.logger_new(log_format, log_args))
**Q: LinkChecker does not ignore anchor references on caching.**
**Q: Some links with anchors are getting checked twice.**
A: This is not a bug.
It is not necessarily true that if a URL ``ABC#anchor1`` works, then
``ABC#anchor2`` works too. This is not specified anywhere, and there are
server-side scripts that fail on some anchors but not on others.
This is the reason for always checking URLs with different anchors.
If you really want to disable this, use the ``--no-anchor-caching``
option.
**Q: I see LinkChecker gets a /robots.txt file for every site it
checks. What is that about?**
A: LinkChecker follows the `robots.txt exclusion standard`_. To avoid
misuse of LinkChecker, you cannot turn this feature off.
See the `Web Robot pages`_ and the `Spidering report`_ for more info.
.. _`robots.txt exclusion standard`:
http://www.robotstxt.org/wc/norobots-rfc.html
.. _`Web Robot pages`:
http://www.robotstxt.org/wc/robots.html
.. _`Spidering report`:
http://www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/Spidering.txt
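You can check what the exclusion standard permits for a given robots.txt yourself with Python's ``urllib.robotparser``. This mirrors, but is not, LinkChecker's internal check:

```python
from urllib.robotparser import RobotFileParser

# Feed a robots.txt body directly instead of fetching it over HTTP.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])
print(rp.can_fetch("LinkChecker", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("LinkChecker", "http://example.com/public.html"))        # True
```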
**Q: How do I print unreachable/dead documents of my website with
LinkChecker?**
A: You cannot. This would require file system access to your web
repository and access to your web server configuration.
**Q: How do I check HTML/XML/CSS syntax with LinkChecker?**
A: Use the ``--check-html`` and ``--check-css`` options.
@ -1,49 +0,0 @@
.. meta::
:keywords: link, URL, validation, checking
===============================
Check websites for broken links
===============================
LinkChecker is a free, GPL_ licensed URL validator.
.. _GPL:
http://www.gnu.org/licenses/gpl-2.0.html
If you like LinkChecker, consider a donation_ to improve it even
more!
.. _donation:
http://sourceforge.net/project/project_donations.php?group_id=1913
Features
========
- recursive and multithreaded checking
- output in colored or normal text, HTML, SQL, CSV, XML or a sitemap
graph in different formats
- HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local file
links support
- restriction of link checking with regular expression filters for URLs
- proxy support
- username/password authorization for HTTP, FTP and Telnet
- honors robots.txt exclusion protocol
- Cookie support
- HTML and CSS syntax check
- Antivirus check
- a command line interface
- a GUI client interface
- a (Fast)CGI web interface (requires HTTP server)
Screenshots
===========
+------------------------------------+------------------------------------+------------------------------------+
| .. image:: shot1_thumb.jpg | .. image:: shot2_thumb.jpg | .. image:: shot3_thumb.jpg |
| :align: center | :align: center | :align: center |
| :target: _static/shot1.png | :target: _static/shot2.png | :target: _static/shot3.png |
+------------------------------------+------------------------------------+------------------------------------+
| Commandline interface | GUI client | Web interface |
+------------------------------------+------------------------------------+------------------------------------+
@ -1,54 +0,0 @@
Other link checkers
===================
If LinkChecker does not fit your requirements, you can check out the
competition. All of these programs also have an `Open Source license`_,
like LinkChecker.
.. _`Open Source license`:
http://www.opensource.org/licenses/
- `Checklinks`_ written in Perl
.. _Checklinks:
http://www.jmarshall.com/tools/cl/
- `Dead link check`_ written in Perl
.. _Dead link check:
http://dlc.sourceforge.net/
- `gURLChecker`_ written in C
.. _gURLChecker:
http://labs.libre-entreprise.org/projects/gurlchecker/
- `KLinkStatus`_ written in C++
.. _KLinkStatus:
http://klinkstatus.kdewebdev.org/
- `link-checker`_ written in C
.. _link-checker:
http://ymettier.free.fr/link-checker/link-checker.html
- `linklint`_ written in Perl
.. _linklint:
http://www.linklint.org/
- `W3C Link Checker`_ HTML interface only
.. _W3C Link Checker:
http://validator.w3.org/checklink/
- `webcheck`_ written in Python
.. _webcheck:
http://ch.tudelft.nl/~arthur/webcheck/
- `webgrep`_ written in Perl
.. _webgrep:
http://cgi.linuxfocus.org/~guido/index.html#webgrep
@ -1,3 +0,0 @@
favicon.ico: favicon32x32.png favicon16x16.png
png2ico favicon.ico favicon32x32.png favicon16x16.png
@ -1,63 +0,0 @@
{% extends "!layout.html" %}
{% block extrahead %}
<style type="text/css">
img { border: 0; }
</style>
{% endblock %}
{% block rootrellink %}
<li><a href="{{ pathto('index') }}">Home </a> |&nbsp;</li>
<li><a href="{{ pathto('documentation') }}">Documentation </a>|&nbsp;</li>
<li><a href="{{ pathto('other') }}">Other link checkers </a> </li>
{% endblock %}
{% block relbar1 %}
<div style="background-color: white; text-align: left; padding: 10px 10px 15px 15px">
{% if builder == 'html' %}
<div style="float:right;"><a
href="http://sourceforge.net/projects/linkchecker"><img
src="http://sflogo.sourceforge.net/sflogo.php?group_id=1913&type=13"
width="120" height="30" border="0"
alt="Get LinkChecker at SourceForge.net." /></a>
{# Piwik tag #}
<script type="text/javascript">
var pkBaseURL = (("https:" == document.location.protocol) ? "https:" : "http:") + "//apps.sourceforge.net/piwik/linkchecker/";
document.write(unescape("%3Cscript src='" + pkBaseURL + "piwik.js' type='text/javascript'%3E%3C/script%3E"));
</script><script type="text/javascript">
piwik_action_name = '';
piwik_idsite = 1;
piwik_url = pkBaseURL + "piwik.php";
piwik_log(piwik_action_name, piwik_idsite, piwik_url);
</script>
<object><noscript><p><img src="http://apps.sourceforge.net/piwik/linkchecker/piwik.php?idsite=1" alt=""/></p></noscript></object>
{# End Piwik tag #}
</div>
{% endif %}
<table border="0"><tr>
<td><a href="{{ pathto('index') }}"><img
src="{{ pathto("_static/logo64x64.png", 1) }}" border="0" alt="LinkChecker"/></a></td>
<td><h1>LinkChecker</h1></td>
</tr></table>
</div>
{{ super() }}
{% endblock %}
{% block relbar2 %}{% endblock %}
{# put the sidebar before the body #}
{% block sidebarsearch %}{{ super() }}{% endblock %}
{% block sidebar1 %}{{ sidebar() }}{% endblock %}
{% block sidebar2 %}{% endblock %}
{% block sidebarlogo %}{% if builder == 'html' %}
{% if pagename == 'index' %}
<h3>Download</h3>
<a href="http://prdownloads.sourceforge.net/linkchecker/LinkChecker-{{version}}.exe?download">LinkChecker&nbsp;{{version}}&nbsp;for&nbsp;Windows</a><br/>
<a href="http://prdownloads.sourceforge.net/linkchecker/LinkChecker-{{version}}.tar.gz?download">LinkChecker&nbsp;{{version}}&nbsp;source</a><br/>
<a href="http://linkchecker.git.sourceforge.net/git/gitweb.cgi?p=linkchecker;a=blob;f=ChangeLog.txt;hb=HEAD">Changelog</a><br/>
<h3>Support</h3>
<a href="http://sourceforge.net/tracker/?func=add&group_id=1913&atid=101913">Bug&nbsp;tracker</a><br/>
<a href="http://sourceforge.net/scm/?type=git&group_id=1913">Development&nbsp;repository</a><br/>
{% endif %}
{% endif %}
{% endblock %}
{% block sidebartoc %}{% endblock %}