diff --git a/doc/en/documentation.html b/doc/en/documentation.html new file mode 100644 index 00000000..3c10f0e3 --- /dev/null +++ b/doc/en/documentation.html @@ -0,0 +1,303 @@ + + + + + + +Documentation + + + + + + + + + + + + + +
+

Documentation

+
+

Contents

+ +
+
+

Basic usage

+

To check a URL like http://www.myhomepage.org/ it is enough to +execute linkchecker http://www.myhomepage.org/. This checks the +complete domain of www.myhomepage.org recursively. All links pointing +outside of the domain are also checked for validity.

+

For more options, read the man page linkchecker(1) or execute +linkchecker -h.

+
+
+

Performed checks

+

All URLs have to pass a preliminary syntax test. Minor quoting +mistakes issue a warning; all other syntax problems +are errors. +After the syntax check passes, the URL is queued for connection +checking. All connection check types are described below.
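
The two-stage idea (syntax test first, connection check later) can be sketched as follows. This is only an illustration, not LinkChecker's actual code: the helper name `check_syntax` and the exact rules for what counts as a "minor quoting mistake" are assumptions made for this example.

```python
from urllib.parse import urlsplit

# Characters that should be percent-encoded in a URL. Treating them as a
# "minor quoting mistake" is an assumption made for this sketch.
UNQUOTED = set(' <>"{}|\\^`')


def check_syntax(url):
    """Return 'error', 'warning' or 'ok' for a URL string (hypothetical helper)."""
    url = url.strip()
    parts = urlsplit(url)
    if not parts.scheme:
        return "error"      # no scheme at all: invalid syntax
    if parts.scheme in ("http", "https", "ftp") and not parts.netloc:
        return "error"      # host-based scheme without a host
    if any(ch in UNQUOTED for ch in url):
        return "warning"    # quoting mistake, the URL can still be checked
    return "ok"             # would now be queued for connection checking
```

Only URLs that come back as `ok` or `warning` would go on to the connection checks.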

+ +
+
+

Recursion

+

Recursion occurs on HTML files, Opera bookmark files and directories. +Note that the directory recursion reads all files in that +directory, not just a subset like index.htm*.
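
A minimal sketch of what "reads all files in that directory" means in practice. The helper `directory_links` is hypothetical and only illustrates the behaviour described above; it is not taken from the LinkChecker sources.

```python
import os
from pathlib import Path


def directory_links(directory):
    """Return a file: URL for every entry in `directory` (hypothetical helper)."""
    base = Path(directory).resolve()
    # Every entry is listed, not only files matching index.htm*.
    return sorted((base / name).as_uri() for name in os.listdir(base))
```

Each returned URL would then be checked like any other local file link.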

+
+
+

Frequently asked questions

+

Q: LinkChecker produced an error, but my web page is ok with +Netscape/IE/Opera/... +Is this a bug in LinkChecker?

+

A: Please check your web pages first. Are they really ok? Use +a syntax highlighting editor. Use HTML Tidy. +Check if you are using a proxy which produces the error.

+

Q: I still get an error, but the page is definitely ok.

+

A: Some servers deny access to automated tools (also called robots) +like LinkChecker. This is not a bug in LinkChecker but rather a +policy by the webmaster running the website you are checking. +It might even be possible for a website to send robots different +web pages than normal browsers.

+

Q: How can I tell LinkChecker which proxy to use?

+

A: LinkChecker works transparently with proxies. In a Unix or Windows +environment, set the http_proxy, https_proxy, ftp_proxy or gopher_proxy +environment variables to a URL that identifies the proxy server before +starting LinkChecker. For example

+
+$ http_proxy="http://www.someproxy.com:3128"
+$ export http_proxy
+
+

In a Macintosh environment, LinkChecker will retrieve proxy information +from Internet Config.
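
To see which proxies a Python program picks up from the environment, the standard library can be queried directly. This sketch only shows how Python's `urllib` reads the variables named above; that LinkChecker reads them in exactly this way is an assumption.

```python
import os
import urllib.request

# Same value as in the shell example; set here only for demonstration.
os.environ["http_proxy"] = "http://www.someproxy.com:3128"

# Maps a URL scheme ("http", "ftp", ...) to the proxy URL found in
# the corresponding <scheme>_proxy environment variable.
proxies = urllib.request.getproxies_environment()
print(proxies.get("http"))
```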

+

Q: The link "mailto:john@company.com?subject=Hello John" is reported +as an error.

+

A: You have to quote special characters (e.g. spaces) in the subject field. +The correct link should be "mailto:...?subject=Hello%20John". +Unfortunately browsers like IE and Netscape do not enforce this.
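
If the links are generated by a script, the quoting can be done with the Python standard library. A small sketch using the address from the question:

```python
from urllib.parse import quote

subject = "Hello John"
# quote() percent-encodes the space as %20.
link = "mailto:john@company.com?subject=" + quote(subject)
print(link)  # mailto:john@company.com?subject=Hello%20John
```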

+

Q: Does LinkChecker support JavaScript?

+

A: No, and it never will. If your page is not working without JS then your +web design is broken. +Use PHP or Zope or ASP for dynamic content, and use JavaScript just as +an add-on for your web pages.

+

Q: I don't get this --extern/--intern stuff.

+

A: When it comes to checking there are three types of URLs. Note +that local files are also represented as URLs (i.e. file://). So +local files can be external URLs.

+
    +
  1. strict external URLs: +We do only syntax checking. Internal URLs are never strict.
  +
  2. external URLs: +Like 1), but we additionally check if they are valid by connect()ing +to them.
  +
  3. internal URLs: +Like 2), but we additionally check if they are HTML pages and if so, +we descend recursively into this link and check all the links in the +HTML content. +The --recursion-level option restricts the number of such recursive +descents.
  +
+

LinkChecker provides four options that determine which +of those three categories a URL falls into: --intern, --extern, --extern-strict-all and +--denyallow. +By default all URLs are internal. With --extern you specify what URLs +are external. With --intern you specify what URLs are internal. +Now imagine you have both --extern and --intern. What happens +when a URL matches both patterns? Or when it matches none? In this +situation the --denyallow option specifies the order in which we match +the URL. By default it is internal/external, with --denyallow the order is +external/internal. Either way, the first match counts, and if none matches, +the last checked category is the category for the URL. +Finally, with --extern-strict-all all external URLs are strict.
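
The matching order described above can be sketched in a few lines of Python. This is a reading of the rules in this answer, not LinkChecker's implementation: the function names `matches` and `classify` are hypothetical, and treating "no patterns given at all" as internal is an assumption based on the sentence "By default all URLs are internal".

```python
import re


def matches(patterns, url):
    """True if any pattern matches; a leading '!' negates the expression."""
    for pattern in patterns:
        negate = pattern.startswith("!")
        found = re.search(pattern[1:] if negate else pattern, url) is not None
        if found != negate:
            return True
    return False


def classify(url, intern=(), extern=(), denyallow=False):
    """Return 'internal' or 'external' for `url` (hypothetical helper)."""
    if not intern and not extern:
        return "internal"             # no patterns: everything is internal
    order = [("internal", intern), ("external", extern)]
    if denyallow:
        order.reverse()               # external patterns are tried first
    for category, patterns in order:
        if matches(patterns, url):
            return category           # the first match counts
    return order[-1][0]               # no match: the last checked category
```

For example, `classify("http://hollowood.com/", intern=["www.mycompany.com"])` yields `"external"`, because the internal pattern does not match and external is the last category checked.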

+

Oh, and just to boggle your mind: you can have more than one external +regular expression in a config file and for each of those expressions +you can specify if those matched external URLs should be strict or not.

+

An example. We don't want to check mailto URLs. Then it's +-i'!^mailto:'. The '!' negates an expression. With --extern-strict-all, +we don't even connect to any mail hosts.

+

Another example. We check our site www.mycompany.com, don't recurse +into external links pointing outside our site and want to ignore links +to hollowood.com and hullabulla.com completely. +This can only be done with a configuration entry like

+
+[filtering]
+extern1=hollowood.com 1
+extern2=hullabulla.com 1
+# the 1 means strict external, i.e. don't even connect
+
+

and the command +linkchecker --intern=www.mycompany.com www.mycompany.com

+

Q: Is LinkChecker's cookie feature insecure?

+

A: Cookies cannot store more information than is in the HTTP request itself, +so you are not giving away any more system information. +Once stored, however, the cookies are sent back to the server on request. +Not to every server, but only to the one the cookie originated from! +This could be used to "track" subsequent requests to this server, +and this is what annoys some people (including me). +Cookies are only stored in memory. After LinkChecker finishes, they +are lost. So the tracking is restricted to the checking time. +The cookie feature is disabled by default.

+

Q: I want to have my own logging class. How can I use it in LinkChecker?

+

A: Currently, only a Python API lets you define new logging classes. +Define your own logging class as a subclass of StandardLogger or any other +logging class in the log module. +Then call the logger_add method of linkcheck.configuration.Configuration to register +your new logger. +After this, append a new logger instance to the fileoutput list.

+
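+import linkcheck.configuration  # import the submodule explicitly
+import MyLogger                 # your module containing the logger subclass
+log_format = 'mylog'
+log_args = {'fileoutput': log_format, 'filename': 'foo.txt'}
+cfg = linkcheck.configuration.Configuration()
+# register the new logger class under the name 'mylog'
+cfg.logger_add(log_format, MyLogger.MyLogger)
+# create an instance and add it to the file output loggers
+cfg['fileoutput'].append(cfg.logger_new(log_format, log_args))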
+
+

Q: LinkChecker does not ignore anchor references on caching.

+

Q: Some links with anchors are getting checked twice.

+

A: This is not a bug. +It is commonly assumed that if a URL ABC#anchor1 works then +ABC#anchor2 works too. That is not specified anywhere and I have seen +server-side scripts that fail on some anchors and not on others. +This is the reason for always checking URLs with different anchors. +If you really want to disable this, use the --no-anchor-caching +option.
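
The difference comes down to what is used as the cache key for an already checked URL. A sketch under that assumption (the helper `cache_key` and its parameter are hypothetical, not LinkChecker's internals):

```python
from urllib.parse import urldefrag


def cache_key(url, anchor_caching=True):
    """Key under which a checked URL is remembered (hypothetical helper)."""
    if anchor_caching:
        return url                 # ABC#anchor1 and ABC#anchor2 stay distinct
    return urldefrag(url).url      # anchor stripped: both share one entry
```

With anchors kept in the key, `ABC#anchor1` and `ABC#anchor2` are two separate entries and both get checked.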

+

Q: I see LinkChecker gets a /robots.txt file for every site it +checks. What is that about?

+

A: LinkChecker follows the robots.txt exclusion standard. To avoid +misuse of LinkChecker, you cannot turn this feature off. +See the Web Robot pages and the Spidering report for more info.
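
To find out whether a robots.txt file would block a robot, the rules can be tried out with Python's standard `urllib.robotparser` module. The robots.txt content and the user agent string "LinkChecker" below are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Example rules; a real file is fetched from http://<host>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# True if the named robot may fetch the URL, False if it is excluded.
allowed = parser.can_fetch("LinkChecker", "http://www.myhomepage.org/index.html")
blocked = parser.can_fetch("LinkChecker", "http://www.myhomepage.org/private/a.html")
print(allowed, blocked)
```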

+

Q: Ctrl-C does not stop LinkChecker immediately. Why is that so?

+

A: The Python interpreter has to wait for all threads to finish, and +this means waiting for all open sockets to close. The default timeout +for sockets is 30 seconds, hence the delay. +You can change the default socket timeout with the --timeout option.
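
In Python, a process-wide socket timeout is set with a single standard library call. That the --timeout option maps to exactly this call is an assumption; the sketch only shows the mechanism.

```python
import socket

# Sockets created after this call give up after 30 seconds
# instead of blocking indefinitely.
socket.setdefaulttimeout(30)
print(socket.getdefaulttimeout())
```

A smaller value shortens the wait after Ctrl-C, at the price of more timeouts on slow servers.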

+

Q: How do I print unreachable/dead documents of my website with +LinkChecker?

+

A: No can do. This would require file system access to your web +repository and access to your web server configuration.

+

You can instead store the linkchecker results in a database +and look for missing files.
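
One way to do this is sketched below, assuming you have the list of checked URLs and a listing of the files in your web repository. The table layout and the helper `unreferenced_files` are hypothetical and chosen only for this example.

```python
import sqlite3


def unreferenced_files(checked_urls, site_files, base_url):
    """Return the files that no checked URL points to (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE checked (url TEXT PRIMARY KEY)")
    conn.executemany("INSERT OR IGNORE INTO checked VALUES (?)",
                     [(url,) for url in checked_urls])
    missing = []
    for path in site_files:
        # Build the URL under which the file would be served.
        url = base_url.rstrip("/") + "/" + path.lstrip("/")
        row = conn.execute("SELECT 1 FROM checked WHERE url = ?",
                           (url,)).fetchone()
        if row is None:
            missing.append(path)
    conn.close()
    return missing
```

Files returned by this function exist on disk but were never reached by the link check.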

+

Q: How do I check HTML/XML syntax with LinkChecker?

+

A: No can do. Use the HTML Tidy program.

+
+
+ + + + diff --git a/doc/en/documentation.nav b/doc/en/documentation.nav new file mode 100644 index 00000000..2999cda6 --- /dev/null +++ b/doc/en/documentation.nav @@ -0,0 +1,5 @@ +# generated by htmlnav.py, do not edit +name = u'Documentation' +level = 0 +visible = True +order = 3 diff --git a/doc/en/documentation.txt b/doc/en/documentation.txt new file mode 100644 index 00000000..6cd9c12b --- /dev/null +++ b/doc/en/documentation.txt @@ -0,0 +1,336 @@ +.. meta:: + :navigation.order: 3 + :navigation.name: Documentation + +Documentation +============= + +.. contents:: + +Basic usage +----------- + +To check an URL like ``http://www.myhomepage.org/`` it is enough to +execute ``linkchecker http://www.myhomepage.org/``. This will check the +complete domain of www.myhomepage.org recursively. All links pointing +outside of the domain are also checked for validity. + +For more options, read the man page ``linkchecker(1)`` or execute +``linkchecker -h``. + + +Performed checks +---------------- + +All URLs have to pass a preliminary syntax test. Minor quoting +mistakes will issue a warning, all other invalid syntax issues +are errors. +After the syntax check passes, the URL is queued for connection +checking. All connection check types are described below. + +- HTTP links (``http:``, ``https:``) + + After connecting to the given HTTP server the given path + or query is requested. All redirections are followed, and + if user/password is given it will be used as authorization + when necessary. + Permanently moved pages issue a warning. + All final HTTP status codes other than 2xx are errors. + +- Local files (``file:``) + + A regular, readable file that can be opened is valid. A readable + directory is also valid. All other files, for example device files, + unreadable or non-existing files are errors. + + File contents are checked for recursion. + +- Mail links (``mailto:``) + + A mailto: link eventually resolves to a list of email addresses. 
+ If one address fails, the whole list will fail. + For each mail address we check the following things: + + 1) Look up the MX DNS records. If we found no MX record, + print an error. + 2) Check if one of the mail hosts accept an SMTP connection. + Check hosts with higher priority first. + If no host accepts SMTP, we print a warning. + 3) Try to verify the address with the VRFY command. If we got + an answer, print the verified address as an info. + +- FTP links (``ftp:``) + + For FTP links we do: + + 1) connect to the specified host + 2) try to login with the given user and password. The default + user is ``anonymous``, the default password is ``anonymous@``. + 3) try to change to the given directory + 4) list the file with the NLST command + +- Gopher links (``gopher:``) + + We try to send the given selector (or query) to the gopher server. + +- Telnet links (``telnet:``) + + We try to connect and if user/password are given, login to the + given telnet server. + +- NNTP links (``news:``, ``snews:``, ``nntp``) + + We try to connect to the given NNTP server. If a news group or + article is specified, try to request it from the server. + +- Ignored links (``javascript:``, etc.) + + An ignored link will only print a warning. No further checking + will be made. + + Here is a complete list of recognized, but ignored links. The most + prominent of them should be JavaScript links. + + - ``acap:`` (application configuration access protocol) + - ``afs:`` (Andrew File System global file names) + - ``chrome:`` (Mozilla specific) + - ``cid:`` (content identifier) + - ``clsid:`` (Microsoft specific) + - ``data:`` (data) + - ``dav:`` (dav) + - ``fax:`` (fax) + - ``find:`` (Mozilla specific) + - ``imap:`` (internet message access protocol) + - ``isbn:`` (ISBN (int. 
book numbers)) + - ``javascript:`` (JavaScript) + - ``ldap:`` (Lightweight Directory Access Protocol) + - ``mailserver:`` (Access to data available from mail servers) + - ``mid:`` (message identifier) + - ``mms:`` (multimedia stream) + - ``modem:`` (modem) + - ``nfs:`` (network file system protocol) + - ``opaquelocktoken:`` (opaquelocktoken) + - ``pop:`` (Post Office Protocol v3) + - ``prospero:`` (Prospero Directory Service) + - ``rsync:`` (rsync protocol) + - ``rtsp:`` (real time streaming protocol) + - ``service:`` (service location) + - ``shttp:`` (secure HTTP) + - ``sip:`` (session initiation protocol) + - ``tel:`` (telephone) + - ``tip:`` (Transaction Internet Protocol) + - ``tn3270:`` (Interactive 3270 emulation sessions) + - ``vemmi:`` (versatile multimedia interface) + - ``wais:`` (Wide Area Information Servers) + - ``z39.50r:`` (Z39.50 Retrieval) + - ``z39.50s:`` (Z39.50 Session) + + +Recursion +--------- + +Recursion occurs on HTML files, Opera bookmark files and directories. +Note that the directory recursion reads all files in that +directory, not just a subset like ``index.htm*``. + +.. meta:: + :navigation.order: 4 + :navigation.name: FAQ + + +Frequently asked questions +-------------------------- + +**Q: LinkChecker produced an error, but my web page is ok with +Netscape/IE/Opera/... +Is this a bug in LinkChecker?** + +A: Please check your web pages first. Are they really ok? Use +a `syntax highlighting editor`_. Use `HTML Tidy`_. +Check if you are using a proxy which produces the error. + +.. _`syntax highlighting editor`: + http://fte.sourceforge.net/ +.. _`HTML Tidy`: + http://tidy.sourceforge.net/ + + +**Q: I still get an error, but the page is definitely ok.** + +A: Some servers deny access of automated tools (also called robots) +like LinkChecker. This is not a bug in LinkChecker but rather a +policy by the webmaster running the website you are checking. 
+It might even be possible for a website to send robots different +web pages than normal browsers. + + +**Q: How can I tell LinkChecker which proxy to use?** + +A: LinkChecker works transparently with proxies. In a Unix or Windows +environment, set the http_proxy, https_proxy, ftp_proxy or gopher_proxy +environment variables to a URL that identifies the proxy server before +starting LinkChecker. For example + +:: + + $ http_proxy="http://www.someproxy.com:3128" + $ export http_proxy + +In a Macintosh environment, LinkChecker will retrieve proxy information +from Internet Config. + + +**Q: The link "mailto:john@company.com?subject=Hello John" is reported +as an error.** + +A: You have to quote special characters (e.g. spaces) in the subject field. +The correct link should be "mailto:...?subject=Hello%20John" +Unfortunately browsers like IE and Netscape do not enforce this. + + +**Q: Has LinkChecker JavaScript support?** + +A: No, it never will. If your page is not working without JS then your +web design is broken. +Use PHP or Zope or ASP for dynamic content, and use JavaScript just as +an addon for your web pages. + + +**Q: I don't get this --extern/--intern stuff.** + +A: When it comes to checking there are three types of URLs. Note +that local files are also represented als URLs (ie file://). So +local files can be external URLs. + +1) strict external URLs: + We do only syntax checking. Internal URLs are never strict. +2) external URLs: + Like 1), but we additionally check if they are valid by connect()ing + to them +3) internal URLs: + Like 2), but we additionally check if they are HTML pages and if so, + we descend recursively into this link and check all the links in the + HTML content. + The --recursion-level option restricts the number of such recursive + descends. + +LinkChecker provides four options which affect URLs to fall in one +of those three categories: --intern, --extern, --extern-strict-all and +--denyallow. +By default all URLs are internal. 
With --extern you specify what URLs +are external. With --intern you specify what URLs are internal. +Now imagine you have both --extern and --intern. What happens +when an URL matches both patterns? Or when it matches none? In this +situation the --denyallow option specifies the order in which we match +the URL. By default it is internal/external, with --denyallow the order is +external/internal. Either way, the first match counts, and if none matches, +the last checked category is the category for the URL. +Finally, with --extern-strict-all all external URLs are strict. + +Oh, and just to boggle your mind: you can have more than one external +regular expression in a config file and for each of those expressions +you can specify if those matched external URLs should be strict or not. + +An example. We don't want to check mailto urls. Then its +-i'!^mailto:'. The '!' negates an expression. With --extern-strictall, +we don't even connect to any mail hosts. + +Another example. We check our site www.mycompany.com, don't recurse +into external links point outside from our site and want to ignore links +to hollowood.com and hullabulla.com completely. +This can only be done with a configuration entry like + +:: + + [filtering] + extern1=hollowood.com 1 + extern2=hullabulla.com 1 + # the 1 means strict external ie don't even connect + +and the command +``linkchecker --intern=www.mycompany.com www.mycompany.com`` + + +**Q: Is LinkCheckers cookie feature insecure?** + +A: Cookies can not store more information as is in the HTTP request itself, +so you are not giving away any more system information. +After storing however, the cookies are sent out to the server on request. +Not to every server, but only to the one who the cookie originated from! +This could be used to "track" subsequent requests to this server, +and this is what some people annoys (including me). +Cookies are only stored in memory. After LinkChecker finishes, they +are lost. 
So the tracking is restricted to the checking time. +The cookie feature is disabled as default. + + +**Q: I want to have my own logging class. How can I use it in LinkChecker?** + +A: Currently, only a Python API lets you define new logging classes. +Define your own logging class as a subclass of StandardLogger or any other +logging class in the log module. +Then call the addLogger function in Config.Configuration to register +your new Logger. +After this append a new Logging instance to the fileoutput. + +:: + + import linkcheck, MyLogger + log_format = 'mylog' + log_args = {'fileoutput': log_format, 'filename': 'foo.txt'} + cfg = linkcheck.configuration.Configuration() + cfg.logger_add(log_format, MyLogger.MyLogger) + cfg['fileoutput'].append(cfg.logger_new(log_format, log_args)) + + +**Q: LinkChecker does not ignore anchor references on caching.** + +**Q: Some links with anchors are getting checked twice.** + +A: This is not a bug. +It is common practice to believe that if an URL ``ABC#anchor1`` works then +``ABC#anchor2`` works too. That is not specified anywhere and I have seen +server-side scripts that fail on some anchors and not on others. +This is the reason for always checking URLs with different anchors. +If you really want to disable this, use the ``--no-anchor-caching`` +option. + + +**Q: I see LinkChecker gets a /robots.txt file for every site it +checks. What is that about?** + +A: LinkChecker follows the robots.txt exclusion standard. To avoid +misuse of LinkChecker, you cannot turn this feature off. +See the `Web Robot pages`_ and the `Spidering report`_ for more info. + +.. _`Web Robot pages`: + http://www.robotstxt.org/wc/robots.html +.. _`Spidering report`: + http://www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/Spidering.txt + + +**Q: Ctrl-C does not stop LinkChecker immediately. Why is that so?** + +A: The Python interpreter has to wait for all threads to finish, and +this means waiting for all open sockets to close. 
The default timeout +for sockets is 30 seconds, hence the delay. +You can change the default socket timeout with the --timeout option. + + +**Q: How do I print unreachable/dead documents of my website with +LinkChecker?** + +A: No can do. This would require file system access to your web +repository and access to your web server configuration. + +You can instead store the linkchecker results in a database +and look for missing files. + + +**Q: How do I check HTML/XML syntax with LinkChecker?** + +A: No can do. Use the `HTML Tidy`_ program. + +.. _`HTML Tidy`: + http://tidy.sourceforge.net/ + diff --git a/doc/en/index.html b/doc/en/index.html new file mode 100644 index 00000000..4e50c4b5 --- /dev/null +++ b/doc/en/index.html @@ -0,0 +1,150 @@ + + + + + + +LinkChecker - check HTML documents for broken links + + + + + + + + + + + + + + + + diff --git a/doc/en/index.nav b/doc/en/index.nav new file mode 100644 index 00000000..cd3266ad --- /dev/null +++ b/doc/en/index.nav @@ -0,0 +1,5 @@ +# generated by htmlnav.py, do not edit +name = u'LinkChecker' +level = 0 +visible = True +order = 0 diff --git a/doc/en/index.txt b/doc/en/index.txt new file mode 100644 index 00000000..09bbfcd2 --- /dev/null +++ b/doc/en/index.txt @@ -0,0 +1,128 @@ +.. meta:: + :navigation.order: 0 + :navigation.name: LinkChecker + +=================================================== +LinkChecker - check HTML documents for broken links +=================================================== + +.. contents:: + +Features +======== + +- recursive checking +- multithreading +- output in colored or normal text, HTML, SQL, CSV or a sitemap + graph in GML or XML. 
+- HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Gopher, Telnet and local + file links support +- restriction of link checking with regular expression filters for URLs +- proxy support +- username/password authorization for HTTP and FTP +- robots.txt exclusion protocol support +- i18n support +- a command line interface +- a (Fast)CGI web interface (requires HTTP server) + + +Download +======== + +Download the latest packages from `LinkChecker download section`_. +There are also `Md5sum checksums`_ from above files. + +.. _LinkChecker download section: + http://sourceforge.net/project/showfiles.php?group_id=1913 +.. _Md5sum checksums: + linkchecker-md5sums.txt + +Requirements and installation instructions are located at the +`install documentation`_. To see what has changed between releases +look at the ChangeLog_. + +.. _install documentation: + install.html +.. _ChangeLog: + http://cvs.sourceforge.net/viewcvs.py/linkchecker/linkchecker/ChangeLog?view=markup + + +Screenshots +=========== + + +----------------------------+----------------------------+ + | .. image:: shot1_thumb.jpg | .. image:: shot2_thumb.jpg | + | :align: middle | :align: middle | + | :target: shot1.png | :target: shot2.png | + +----------------------------+----------------------------+ + | Commandline interface | Web interface | + +----------------------------+----------------------------+ + + +Running +======= + +Running under Unix or Mac OS X platforms +---------------------------------------- + +The local configuration file is $HOME/.linkcheckerrc +Type "linkchecker" followed by your URLs you want to check. +Type "linkchecker -h" for help. + +Running under Windows platforms +------------------------------- + +Start "Check URL" in your LinkChecker program group. +URL input is interactive. +Another way is executing "python.exe linkchecker" in the Python +Scripts directory. 
+ +Running under Mac OS 9.x platforms +---------------------------------- + +Read the MacOS Python documentation to find out about passing +commandline options to Python scripts. + + +Internationalization +-------------------- +For german output execute "export LC_MESSAGES=de" in bash or +"setenv LC_MESSAGES de" in tcsh. +Under Windows, execute "set LC_MESSAGES=de". +Other supported languages are 'nl' (Nederlands) and 'fr' (français). + +You can help to translate LinkChecker by copying the included +``linkchecker.pot`` file to ``language.po``, translate it and +send it to me. + + +Bug reporting +============= + +The `SourceForge Bug interface`_ allows submitting of bugs, patches +and requests. + +.. _SourceForge Bug interface: + http://sourceforge.net/tracker/?func=add&group_id=1913&atid=101913 + + +CVS access +========== + +The `SourceForge CVS page`_ has all the information on how to +obtain the development version of LinkChecker. Development of +LinkChecker requires some more software to be available, which +is documented on the `installation page`. + +.. _SourceForge CVS page: + http://sourceforge.net/cvs/?group_id=1913 +.. _installation page: + install.html + + +.. image:: http://sourceforge.net/sflogo.php?group_id=1913&type=1 + :align: right + :target: http://sourceforge.net/ + :alt: SourceForge Logo + :width: 88 + :height: 31 diff --git a/doc/en/install.html b/doc/en/install.html new file mode 100644 index 00000000..cc934162 --- /dev/null +++ b/doc/en/install.html @@ -0,0 +1,207 @@ + + + + + + +Installation + + + + + + + + + + + +
+

Installation

+

If you are upgrading from older versions of LinkChecker you should +also read the upgrading documentation.

+
+

Requirements for Unix/Linux or Mac OS X

+
    +
  1. You need a standard GNU development environment with

    +
      +
    • C compiler (for example the GNU C Compiler gcc)

      +

      Depending on your distribution, several development packages +might be needed to provide a fully functional C development +environment.

      +
    +
    +

    Note for developers: if you want to regenerate the po/linkchecker.pot +template from the source files, you will need xgettext with Python +support. This is available in gettext >= 0.12.

    +
  +
  2. Python >= 2.4 from http://www.python.org/ with zlib support

    +

    Be sure to also have installed the included distutils module. +On most distributions, the distutils module is included in +an extra "python-dev" package.

    +
  +
  3. Optional, for bash-completion: +optcomplete Python module from http://furius.ca/optcomplete/

    +
  +
  4. Optional (speedup for i386 compatible PCs) +Psyco from http://psyco.sourceforge.net/ +[http://osdn.dl.sourceforge.net/sourceforge/psyco/psyco-1.4-src.tar.gz]

    +
  +
+
+
+

Requirements for Windows

+
    +
  1. Install Python >= 2.4 from http://www.python.org/ +[http://www.python.org/ftp/python/2.4/python-2.4.msi]
  +
  2. Only needed if you compile from source: +install the MinGW suite from http://mingw.sourceforge.net/ +Be sure to install in the given order:
      +
    a) MinGW +[http://osdn.dl.sourceforge.net/sourceforge/mingw/MinGW-3.1.0-1.exe]
    +
    b) MSYS +[http://osdn.dl.sourceforge.net/sourceforge/mingw/MSYS-1.0.10.exe]
    +
    +
  +
  3. Optional (speedup for i386 compatible PCs) +Psyco from http://psyco.sourceforge.net/ +[http://osdn.dl.sourceforge.net/sourceforge/psyco/psyco-1.4.win32-py2.4.exe]
  +
+
+
+

Setup for Unix/Linux or Mac OS X

+
    +
  1. Install check

    +

    Be sure to have installed all required Unix/Linux software listed above.

    +
  +
  2. Compile Python modules

    +

    Run python setup.py build to compile the Python files. +For help about the setup.py script options, run +python setup.py --help. +The CC environment variable is checked before compilation, so you can +change the default C compiler with export CC=myccompiler.

    +
  +
  3. +
      +
    a) Installation as root

      +

      Run su -c 'python setup.py install' to install LinkChecker.

      +
    +
    b) Installation as a normal user

      +

      Run python setup.py install --home $HOME. Note that you have +to adjust your PATH and PYTHONPATH environment variables, e.g. by +adding the commands export PYTHONPATH=$HOME/lib/python and +export PATH=$PATH:$HOME/bin to your shell configuration +file.

      +

      For more information look at the Modifying Python's search path +documentation.

      +
    +
    +

    If you downloaded Psyco please read the psyco installation docs.

    +
    +
  +
+
+
+

Setup for Windows - the binary .exe installer:

+
    +
  1. Install check

    +

    Be sure to have installed all required Windows software listed above.

    +
  +
  2. Execute the linkchecker-x.xx.win32-py2.4.exe file and follow +the instructions.

    +
  +
+
+
+

Setup for Windows - compiling from source:

+
    +
  1. Install check

    +

    Be sure to have installed all required Windows software listed above.

    +
  +
  2. Preparing Python for the MinGW compiler

    +

    Search for the file python24.dll in your Windows folder. +After you have found it, launch MSYS. Change into the Windows folder, +for example cd c:\winnt\system32. Then execute +pexports python24.dll > python24.def. +Then use the dlltool with +dlltool --dllname python24.dll --def python24.def --output-lib +libpython24.a. +The resulting library has to be placed in the same directory as +python24.lib. (Should be the libs directory under your Python installation +directory, for example c:\Python24\Libs\.)

    +
  +
  3. Generate and execute the LinkChecker installer

    +

    Close the MSYS application (by typing exit) and open a DOS command +prompt. +Change to the linkchecker-X.X.X directory and run +python setup.py build -c mingw32 bdist_wininst.

    +

    This generates a binary installer +dist\linkchecker-X.X.X.win32-py2.4.exe which you just have to +execute.

    +

    If you downloaded Psyco please read the psyco installation docs.

    +
  +
+
+
+

After installation

+

LinkChecker is now installed. Have fun! +See the main page on how to configure and start LinkChecker.

+
+
+

Installation for other platforms

+

If you happen to install LinkChecker on other platforms (for example +Mac OS 9.x) then drop me a note.

+
+
+

(Fast)CGI web interface

+

The included CGI scripts can run LinkChecker with a nice graphical web +interface. +You can use and adjust the example HTML files in the lconline directory +to run the script.

+
    +
  1. Choose a CGI script. The simplest is lc.cgi and you need a web server +with CGI support. +The script lc.fcgi (I tested this a while ago) needs a web server +with FastCGI support.
  +
  2. Copy the script of your choice into the CGI directory. +Note that only the local host (i.e. 127.0.0.1) can access this +script. If you want to enable access from other hosts you have +to adjust the ALLOWED_HOSTS and ALLOWED_SERVERS variables in +the lc.cgi (or lc.fcgi) script.
  +
  3. Adjust the "action=..." parameter in lconline/lc_cgi.html +to point to your CGI script.
  +
  4. Load the lconline/index.html file, enter a URL and click on the +check button.
  +
  5. If something goes wrong, check the following:
      +
    a) look in the error log of your web server
    +
    b) be sure that you have enabled CGI support in your web server; +do this by running other CGI scripts which you know are +working
    +
    c) try to run the lc.cgi script by hand
    +
    d) try the testit() function in the lc.cgi script
    +
    +
  +
+
+
+ + + + diff --git a/doc/en/install.nav b/doc/en/install.nav new file mode 100644 index 00000000..f5725612 --- /dev/null +++ b/doc/en/install.nav @@ -0,0 +1,5 @@ +# generated by htmlnav.py, do not edit +name = u'Installation' +level = 0 +visible = True +order = 1 diff --git a/doc/en/install.txt b/doc/en/install.txt new file mode 100644 index 00000000..2baa53cf --- /dev/null +++ b/doc/en/install.txt @@ -0,0 +1,194 @@ +.. meta:: + :navigation.order: 1 + :navigation.name: Installation + +Installation +============ + +If you are upgrading from older versions of LinkChecker you should +also read the `upgrading documentation`_. + +.. _upgrading documentation: + upgrading.html + + +Requirements for Unix/Linux or Mac OS X +--------------------------------------- + +1. You need a standard GNU development environment with + + - C compiler (for example the GNU C Compiler gcc) + + Depending on your distribution, several development packages + might be needed to provide a fully functional C development + environment. + + Note for developers: if you want to regenerate the po/linkchecker.pot + template from the source files, you will need xgettext with Python + support. This is available in gettext >= 0.12. + +2. Python >= 2.4 from http://www.python.org/ with zlib support + + Be sure to also have installed the included distutils module. + On most distributions, the distutils module is included in + an extra "python-dev" package. + +3. *Optional, for bash-completion:* + optcomplete Python module from http://furius.ca/optcomplete/ + +4. *Optional (speedup for i386 compatible PCs)* + Psyco from http://psyco.sourceforge.net/ + [http://osdn.dl.sourceforge.net/sourceforge/psyco/psyco-1.4-src.tar.gz] + +Requirements for Windows +------------------------ + +1. Install Python >= 2.4 from http://www.python.org/ + [http://www.python.org/ftp/python/2.4/python-2.4.msi] + +2. 
*Only needed if you compile from source:* + install the MinGW suite from http://mingw.sourceforge.net/ + Be sure to install in the given order: + + a) MinGW + [http://osdn.dl.sourceforge.net/sourceforge/mingw/MinGW-3.1.0-1.exe] + b) MSYS + [http://osdn.dl.sourceforge.net/sourceforge/mingw/MSYS-1.0.10.exe] + +3. *Optional (speedup for i386 compatible PCs)* + Psyco from http://psyco.sourceforge.net/ + [http://osdn.dl.sourceforge.net/sourceforge/psyco/psyco-1.4.win32-py2.4.exe] + +Setup for Unix/Linux or Mac OS X +-------------------------------- + +1. Install check + + Be sure to have installed all required Unix/Linux software listed above. + +2. Compile Python modules + + Run ``python setup.py build`` to compile the Python files. + For help about the setup.py script options, run + ``python setup.py --help``. + The CC environment variable is checked before compilation, so you can + change the default C compiler with ``export CC=myccompiler``. + +3. + a) Installation as root + + Run ``su -c 'python setup.py install'`` to install LinkChecker. + + b) Installation as a normal user + + Run ``python setup.py install --home $HOME``. Note that you have + to adjust your PATH and PYTHONPATH environment variables, e.g. by + adding the commands ``export PYTHONPATH=$HOME/lib/python`` and + ``export PATH=$PATH:$HOME/bin`` to your shell configuration + file. + + For more information look at the `Modifying Python's search path`_ + documentation. + + .. _Modifying Python's search path: + http://docs.python.org/inst/search-path.html#SECTION000410000000000000000 + + If you downloaded Psyco please read the `psyco installation docs`_. + + .. _psyco installation docs: + http://psyco.sourceforge.net/psycoguide/node2.html + +Setup for Windows - the binary .exe installer: +---------------------------------------------- + +1. Install check + + Be sure to have installed all required Windows software listed above. + +2. 
Execute the ``linkchecker-x.xx.win32-py2.4.exe`` file and follow + the instructions. + +Setup for Windows - compiling from source: +------------------------------------------ + +1. Install check + + Be sure to have installed all required Unix/Linux software listed above. + +2. Preparing Python for the MinGW compiler + + Search the file python24.dll in your windows folder. + After you found it, launch MSYS. Change into the windows folder, + for example ``cd c:\winnt\system32``. Then execute + ``pexports python24.dll > python24.def``. + Then use the dlltool with + ``dlltool --dllname python24.dll --def python24.def --output-lib + libpython24.a``. + The resulting library has to be placed in the same directory as + python24.lib. (Should be the libs directory under your Python installation + directory, for example ``c:\Python24\Libs\``.) + +3. Generate and execute the LinkChecker installer + + Close the MSYS application (by typing ``exit``) and open a DOS command + prompt. + Change to the ``linkchecker-X.X.X`` directory and run + ``python setup.py build -c mingw32 bdist_wininst``. + + This generates a binary installer + ``dist\linkchecker-X.X.X.win32-py2.4.exe`` which you just have to + execute. + + If you downloaded Psyco please read the `psyco installation docs`_. + + .. _psyco installation docs: + http://psyco.sourceforge.net/psycoguide/node2.html + +After installation +------------------ + +LinkChecker is now installed. Have fun! +See the `main page`_ on how to configure and start LinkChecker. + +.. _main page: index.html + +Installation for other platforms +-------------------------------- + +If you happen to install LinkChecker on other platforms (for example +Mac OS 9.x) then drop me a note. + +(Fast)CGI web interface +----------------------- + +The included CGI scripts can run LinkChecker with a nice graphical web +interface. +You can use and adjust the example HTML files in the lconline directory +to run the script. + +1. Choose a CGI script. 
The simplest is lc.cgi, which needs a web server + with CGI support. + The script lc.fcgi (I tested this a while ago) needs a web server + with FastCGI support. + +2. Copy the script of your choice into the CGI directory. + Note that only the local host (i.e. 127.0.0.1) can access this + script. If you want to enable access from other hosts you have + to adjust the ALLOWED_HOSTS and ALLOWED_SERVERS variables in + the lc.cgi (or lc.fcgi) script. + +3. Adjust the "action=..." parameter in lconline/lc_cgi.html + to point to your CGI script. + +4. Load the lconline/index.html file, enter a URL and click on the + check button. + +5. If something goes wrong, check the following: + + a) look in the error log of your web server + b) be sure that you have enabled CGI support in your web server; + verify this by running other CGI scripts which you know are + working + c) try to run the lc.cgi script by hand + d) try the testit() function in the lc.cgi script + diff --git a/doc/en/lc.css b/doc/en/lc.css new file mode 100644 index 00000000..b28833c4 --- /dev/null +++ b/doc/en/lc.css @@ -0,0 +1,263 @@ +/* +:Author: David Goodger +:Contact: goodger@users.sourceforge.net +:date: $Date$ +:version: $Revision$ +:copyright: This stylesheet has been placed in the public domain. + +Default cascading style sheet for the HTML output of Docutils. 
+*/ + + +body { + font-family: Verdana, Helvetica, Arial, sans-serif; + background: #fff7ee;/*#f7ebd3;*//*fdf9f4*/ + margin: 0; + padding: 0; +} + +img { + border: 0; +} + +.first { + margin-top: 0 } + +.last { + margin-bottom: 0 } + +a { + color: #222222; +} + +a:hover { + color: black; +} + +a.toc-backref { + text-decoration: none ; + color: black; +} + +div.document { + margin-left: 2em; + width:680px; + overflow: visible; +} + +blockquote.epigraph { + margin: 2em 5em ; } + +dd { + margin-bottom: 0.5em } + +div.abstract { + margin: 2em 5em; +} + +div.abstract p.topic-title { + font-weight: bold ; + text-align: center } + +div.attention, div.caution, div.danger, div.error, div.hint, +div.important, div.note, div.tip, div.warning, div.admonition { + margin: 2em ; + border: medium outset ; + padding: 1em } + +div.attention p.admonition-title, div.caution p.admonition-title, +div.danger p.admonition-title, div.error p.admonition-title, +div.warning p.admonition-title { + color: red ; + font-weight: bold ; + font-family: sans-serif } + +div.hint p.admonition-title, div.important p.admonition-title, +div.note p.admonition-title, div.tip p.admonition-title, +div.admonition p.admonition-title { + font-weight: bold ; + font-family: sans-serif } + +div.dedication { + margin: 2em 5em ; + text-align: center ; + font-style: italic } + +div.dedication p.topic-title { + font-weight: bold ; + font-style: normal } + +div.figure { + margin-left: 2em } + +div.footer, div.header { + font-size: smaller; + margin-left: 2em; + margin-bottom: 1em; +} + +hr.footer { + width: 680px; + margin-left: 2em; +} + +div.sidebar { + margin-left: 1em ; + border: medium outset ; + padding: 0em 1em ; + background-color: #ffffee ; + width: 40% ; + float: right ; + clear: right } + +div.sidebar p.rubric { + font-family: sans-serif ; + font-size: medium } + +div.system-messages { + margin: 5em } + +div.system-messages h1 { + color: red } + +div.system-message { + border: medium outset ; + padding: 1em 
} + +div.system-message p.system-message-title { + color: red ; + font-weight: bold } + +div.topic { + margin: 2em } + +h1.title { + text-align: center } + +h2.subtitle { + text-align: center } + +hr { + width: 75% } + +ol.simple, ul.simple { + margin-bottom: 1em } + +ol.arabic { + list-style: decimal } + +ol.loweralpha { + list-style: lower-alpha } + +ol.upperalpha { + list-style: upper-alpha } + +ol.lowerroman { + list-style: lower-roman } + +ol.upperroman { + list-style: upper-roman } + +p.attribution { + text-align: right ; + margin-left: 50% } + +p.caption { + font-style: italic } + +p.credits { + font-style: italic ; + font-size: smaller } + +p.label { + white-space: nowrap } + +p.rubric { + font-weight: bold ; + font-size: larger ; + color: maroon ; + text-align: center } + +p.sidebar-title { + font-family: sans-serif ; + font-weight: bold ; + font-size: larger } + +p.sidebar-subtitle { + font-family: sans-serif ; + font-weight: bold } + +p.topic-title { + font-weight: bold } + +pre.address { + margin-bottom: 0 ; + margin-top: 0 ; + font-family: serif ; + font-size: 100% } + +pre.line-block { + font-family: serif ; + font-size: 100% } + +pre.literal-block, pre.doctest-block { + margin-left: 2em ; + margin-right: 2em ; + background-color: #eeeeee } + +span.classifier { + font-family: sans-serif ; + font-style: oblique } + +span.classifier-delimiter { + font-family: sans-serif ; + font-weight: bold } + +span.interpreted { + font-family: sans-serif } + +span.option { + white-space: nowrap } + +span.option-argument { + font-style: italic } + +span.pre { + white-space: pre } + +span.problematic { + color: red } + +table { + margin-top: 0.5em ; + margin-bottom: 0.5em } + +table.citation { + border-left: solid thin gray ; + padding-left: 0.5ex } + +table.docinfo { + margin: 2em 4em } + +table.footnote { + border-left: solid thin black ; + padding-left: 0.5ex } + +td, th { + padding-left: 0.5em ; + padding-right: 0.5em ; + vertical-align: top } + +th.docinfo-name, 
th.field-name { + font-weight: bold ; + text-align: left ; + white-space: nowrap } + +h1 tt, h2 tt, h3 tt, h4 tt, h5 tt, h6 tt { + font-size: 100% } + +tt { + background-color: #eeeeee } + +ul.auto-toc { + list-style-type: none } diff --git a/doc/en/navigation.css b/doc/en/navigation.css new file mode 100644 index 00000000..4b97f3b8 --- /dev/null +++ b/doc/en/navigation.css @@ -0,0 +1,40 @@ +.navigation { + background: transparent; +} + +.navrow { + border-collapse: collapse; + border-bottom-width: 1px; + border-bottom-style: dotted; + border-bottom-color: #f86821; + white-space: nowrap; +} + +.navrow a { + color: #222222; + background: transparent; + border-left-width: 10px; + border-left-style: solid; + border-left-color: #f86821; + font-weight: normal; + margin-right: 1em; + padding: 0em 0.5em; + text-decoration: none; +} + +.navrow a:hover { + color: black; + background: #f8c218; + border-color: #f85b0d; +} + +.navrow span { + color: #222222; + background: #f8c218; + border-left-width: 10px; + border-left-style: solid; + border-left-color: #f85b0d; + font-weight: normal; + margin-right: 1em; + padding: 0em 0.5em; +} diff --git a/doc/en/other.html b/doc/en/other.html new file mode 100644 index 00000000..002b5584 --- /dev/null +++ b/doc/en/other.html @@ -0,0 +1,56 @@ + + + + + + +Other link checkers + + + + + + + + + + + + + + + + diff --git a/doc/en/other.nav b/doc/en/other.nav new file mode 100644 index 00000000..32a5bd75 --- /dev/null +++ b/doc/en/other.nav @@ -0,0 +1,5 @@ +# generated by htmlnav.py, do not edit +name = u'Other' +level = 0 +visible = True +order = 5 diff --git a/doc/en/other.txt b/doc/en/other.txt new file mode 100644 index 00000000..4aedd69a --- /dev/null +++ b/doc/en/other.txt @@ -0,0 +1,58 @@ +.. meta:: + :navigation.order: 5 + :navigation.name: Other + +Other link checkers +=================== + +If LinkChecker does not fit your requirements, you can check out the +competition. 
All of these programs also have an `Open Source license`_ +like LinkChecker. + +.. _`Open Source license`: + http://www.opensource.org/licenses/ + +- `checkbot`_ written in Perl + + .. _checkbot: + http://degraaff.org/checkbot/ + +- `Checklinks`_ written in Perl + + .. _Checklinks: + http://www.jmarshall.com/tools/cl/ + +- `Dead link check`_ written in Perl + + .. _Dead link check: + http://dlc.sourceforge.net/ + +- `gURLChecker`_ written in C + + .. _gURLChecker: + http://labs.libre-entreprise.org/projects/gurlchecker/ + +- `jchecklinks`_ written in Java + + .. _jchecklinks: + http://web.purplefrog.com/~thoth/jchecklinks/ + +- `link-checker`_ written in C + + .. _link-checker: + http://ymettier.free.fr/link-checker/link-checker.html + +- `linklint`_ written in Perl + + .. _linklint: + http://www.linklint.org/ + +- `webcheck`_ written in Python + + .. _webcheck: + http://www.mired.org/webcheck/ + +- `webgrep`_ written in Perl + + .. _webgrep: + http://cgi.linuxfocus.org/~guido/index.html#webgrep diff --git a/doc/en/shot1.png b/doc/en/shot1.png new file mode 100644 index 00000000..f18dd35f Binary files /dev/null and b/doc/en/shot1.png differ diff --git a/doc/en/shot1_thumb.jpg b/doc/en/shot1_thumb.jpg new file mode 100644 index 00000000..92798392 Binary files /dev/null and b/doc/en/shot1_thumb.jpg differ diff --git a/doc/en/shot2.png b/doc/en/shot2.png new file mode 100644 index 00000000..28559794 Binary files /dev/null and b/doc/en/shot2.png differ diff --git a/doc/en/shot2_thumb.jpg b/doc/en/shot2_thumb.jpg new file mode 100644 index 00000000..32e3715a Binary files /dev/null and b/doc/en/shot2_thumb.jpg differ diff --git a/doc/en/upgrading.html b/doc/en/upgrading.html new file mode 100644 index 00000000..332efe9c --- /dev/null +++ b/doc/en/upgrading.html @@ -0,0 +1,83 @@ + + + + + + +Upgrading + + + + + + + + + + + +
+

Upgrading

+
+

Migrating from 2.2 to 2.3

+

The per-user config file is now ~/.linkchecker/linkcheckerrc +(previous location was ~/.linkcheckerrc).

+

The default blacklist output file is now ~/.linkchecker/blacklist +(previous location was ~/.blacklist).
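
The two relocations above can be applied by hand with a small shell function. This is only a sketch and not part of LinkChecker: the function name `migrate_linkchecker_files` and its optional home-directory argument are invented for illustration, and it deliberately leaves any file that already exists in the new location untouched.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with LinkChecker): move the pre-2.3
# per-user files to their new location under ~/.linkchecker/.
# Usage: migrate_linkchecker_files [home_directory]
migrate_linkchecker_files() {
    home_dir="${1:-$HOME}"
    new_dir="$home_dir/.linkchecker"
    mkdir -p "$new_dir" || return 1

    # ~/.linkcheckerrc -> ~/.linkchecker/linkcheckerrc
    if [ -f "$home_dir/.linkcheckerrc" ] && [ ! -e "$new_dir/linkcheckerrc" ]; then
        mv "$home_dir/.linkcheckerrc" "$new_dir/linkcheckerrc" || return 1
    fi
    # ~/.blacklist -> ~/.linkchecker/blacklist
    if [ -f "$home_dir/.blacklist" ] && [ ! -e "$new_dir/blacklist" ]; then
        mv "$home_dir/.blacklist" "$new_dir/blacklist" || return 1
    fi
    return 0
}
```

Called without an argument the function works on `$HOME`; passing a directory makes it easy to try out on a scratch copy first.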

+

Python >= 2.4 is now required.

+
+
+

Migrating from 1.x to 2.0

+

The --output and --file-output parameters can now specify the encoding. +You should check whether your scripts support the new option +syntax.

+

Some added checks might trigger new warnings, so automated scripts +or alarms can have more output than with 1.x releases.

+

All output (file and console) is now encoded according to a given +character set encoding which defaults to ISO-8859-15. If you +relied on the output being in a specific encoding, you might want to +use the output encoding option.

+
+
+

Migrating from 1.12.x to 1.13.0

+

Since lots of filenames have changed you should check that any +manually installed versions prior to 1.13.0 are removed. Otherwise +you will have startup problems.

+

The default output logger text now has colored output if the +output terminal supports it. The old colored output logger has +been removed.

+

The -F option no longer suppresses normal output. The old behaviour +can be restored by giving the option -onone.

+

The --status option is now the default and has been deprecated. The +old behaviour can be restored by giving the option --no-status.

+

The default recursion depth is now infinite. The old behaviour +can be restored by giving the option --recursion-level=1.

+

The option --strict has been renamed to --extern-strict-all.

+

The commandline program linkchecker now returns a non-zero exit value +when errors were encountered. Previous versions always returned a zero +exit value. +To make scripts ignore the exit value and thereby restore the old behaviour, +you can append || true to the end of the command.
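
As a sketch of two ways a calling script might treat the new exit value: the first function swallows it with `|| true`, the second records it for later inspection. The function names and the `last_status` variable are invented for this example, and the command to run is passed in as arguments (for instance `linkchecker http://www.myhomepage.org/`); nothing here is part of LinkChecker itself.

```shell
#!/bin/sh
# Hypothetical wrappers; "$@" stands for the full linkchecker command line.

# Old (pre-1.13.0) behaviour: the caller always sees exit status 0.
run_ignoring_status() {
    "$@" || true
}

# Alternative: remember the exit status in last_status instead of
# discarding it, so the script can decide later what to do with it.
run_recording_status() {
    last_status=0
    "$@" || last_status=$?
    return 0
}
```

Both wrappers keep a script running under `set -e`; only the second one lets it still report that errors were found.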

+
+
+ + + + diff --git a/doc/en/upgrading.nav b/doc/en/upgrading.nav new file mode 100644 index 00000000..5e8ad8fd --- /dev/null +++ b/doc/en/upgrading.nav @@ -0,0 +1,5 @@ +# generated by htmlnav.py, do not edit +name = u'Upgrading' +level = 0 +visible = True +order = 2 diff --git a/doc/en/upgrading.txt b/doc/en/upgrading.txt new file mode 100644 index 00000000..68d79b7b --- /dev/null +++ b/doc/en/upgrading.txt @@ -0,0 +1,62 @@ +.. meta:: + :navigation.order: 2 + :navigation.name: Upgrading + +Upgrading +========= + +Migrating from 2.2 to 2.3 +------------------------- + +The per-user config file is now ``~/.linkchecker/linkcheckerrc`` +(previous location was ``~/.linkcheckerrc``). + +The default blacklist output file is now ``~/.linkchecker/blacklist`` +(previous location was ``~/.blacklist``). + +Python >= 2.4 is now required. + + +Migrating from 1.x to 2.0 +------------------------- + +The --output and --file-output parameters can now specify the encoding. +You should check whether your scripts support the new option +syntax. + +Some added checks might trigger new warnings, so automated scripts +or alarms can have more output than with 1.x releases. + +All output (file and console) is now encoded according to a given +character set encoding which defaults to ISO-8859-15. If you +relied on the output being in a specific encoding, you might want to +use the output encoding option. + + +Migrating from 1.12.x to 1.13.0 +------------------------------- + +Since lots of filenames have changed you should check that any +manually installed versions prior to 1.13.0 are removed. Otherwise +you will have startup problems. + +The default output logger ``text`` now has colored output if the +output terminal supports it. The old ``colored`` output logger has +been removed. + +The ``-F`` option no longer suppresses normal output. The old behaviour +can be restored by giving the option ``-onone``. + +The --status option is now the default and has been deprecated. 
The +old behaviour can be restored by giving the option ``--no-status``. + +The default recursion depth is now infinite. The old behaviour +can be restored by giving the option ``--recursion-level=1``. + +The option ``--strict`` has been renamed to ``--extern-strict-all``. + +The commandline program ``linkchecker`` now returns a non-zero exit value +when errors were encountered. Previous versions always returned a zero +exit value. +To make scripts ignore the exit value and thereby restore the old behaviour, +you can append ``|| true`` to the end of the command.