LinkChecker
=============

LinkChecker checks HTML documents for broken links.

It features
o recursive checking
o multithreading
o output in colored or normal text, HTML, SQL, CSV or a sitemap
  graph in GML or XML
o support for HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Gopher, Telnet
  and local file links
o restriction of link checking with regular expression filters for URLs
o proxy support
o username/password authorization for HTTP and FTP
o robots.txt exclusion protocol support
o i18n support
o a command line interface
o a (Fast)CGI web interface (requires an HTTP server)


Installing, Requirements, Running
---------------------------------
Read the file INSTALL.


Running the program
-------------------
o Unix platforms
  The local configuration file is $HOME/.linkcheckerrc
  Type "linkchecker" followed by the URLs you want to check.
  Type "linkchecker -h" for help.

o Windows platforms
  Double-click on "linkchecker.bat" on your desktop.
  URL input is interactive.
  Alternatively, execute "python.exe linkchecker" in the Python
  Scripts directory.

o MacOS 9.x platforms
  Read the MacOS Python documentation to find out how to pass
  command line options to Python scripts.


License and Credits
-------------------
LinkChecker is licensed under the GNU General Public License.
Credits go to Guido van Rossum and his team for making Python.
His hovercraft is full of eels!
As this program is directly derived from my Java link checker, additional
credits go to Robert Forsman (the author of JCheckLinks) and his
robots.txt parsing algorithm.
Nicolas Chauvat <Nicolas.Chauvat@logilab.fr> supplied a patch for
an XML output logger.
I want to thank everybody who gave me feedback, bug reports and
suggestions.


Versioning
----------
Version numbers have the same meaning as Linux kernel version numbers.
The first number is the major package version. The second number is
the minor package version. An odd second number stands for a development
version, an even number for a stable version. The third number is a
package release sequence number.
So for example 1.1.5 is the fifth release of the 1.1 development package.
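
The numbering rule above can be sketched in a few lines of Python. This
is only an illustration; the function name describe_version is made up
for this example and is not part of LinkChecker.

```python
def describe_version(version):
    """Split a 'major.minor.release' string and classify the series."""
    major, minor, release = (int(part) for part in version.split("."))
    # An odd minor number marks a development series, an even one a stable series.
    kind = "development" if minor % 2 else "stable"
    return major, minor, release, kind


print(describe_version("1.1.5"))
```

With this rule a version such as 1.2.0 would be classified as stable.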


Included packages
-----------------
fcgi.py and sz_fcgi.py from http://saarland.sz-sb.de/~ajung/sz_fcgi/
CSV from http://eh.org/~laurie/comp/python/csv/index.html

Note that I have modified the included packages.


Internationalization
--------------------
For German output execute "export LC_MESSAGES=de" in bash or
"setenv LC_MESSAGES de" in tcsh.
Under Windows, execute "set LC_MESSAGES=de".
Other supported languages are 'nl' (Nederlands) and 'fr' (français).
If you want to help me translate LinkChecker, copy the linkchecker.pot
file to <your language>.po and send me the translated file.


Code design
-----------
Read this only if you want to hack on the code.

(1) Look at the linkchecker script. It just reads all the
command line options and stores them in a Config object.

(2) Which leads us directly to the Config class. This class stores all
options and works a little magic: it tries to find out if your platform
supports threads. If so, threading is enabled. If not, it is disabled.
Several functions are replaced with their threaded equivalents if
threading is enabled.
The Config class also takes care of config files. A Config object reads
config file options on initialization so they get handled before any
command line options.
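
As a rough sketch of the two ideas in (2), the Python below detects
thread support with a guarded import, picks a threaded or a serial
function once at initialization, and applies config file options before
command line options. This is not LinkChecker's actual code: the names
file_options, apply_command_line and check_all are hypothetical, and
letting later command line options override file options is an
assumption of this example.

```python
try:
    import threading  # available on essentially every modern Python build
except ImportError:
    threading = None


class Config:
    """Hypothetical stand-in for the Config class described above."""

    def __init__(self, file_options=None):
        self.options = {}
        # Config file options are handled first, at initialization.
        self.options.update(file_options or {})
        self.threads_enabled = threading is not None
        # Choose the threaded or the serial variant once, up front.
        if self.threads_enabled:
            self.check_all = self._check_all_threaded
        else:
            self.check_all = self._check_all_serial

    def apply_command_line(self, cli_options):
        # Applied after the file options, so these values win in this sketch.
        self.options.update(cli_options)

    def _check_all_serial(self, urls, check):
        return [check(url) for url in urls]

    def _check_all_threaded(self, urls, check):
        results = [None] * len(urls)

        def worker(index, url):
            results[index] = check(url)

        workers = [threading.Thread(target=worker, args=(i, u))
                   for i, u in enumerate(urls)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return results
```

Callers use config.check_all(urls, check) without needing to know which
variant was selected.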

(3) The linkchecker script finally calls linkcheck.checkUrls(), which
calls linkcheck.Config.checkUrl(), which calls linkcheck.UrlData.check().
A UrlData object represents a single URL with all attached data like
validity, check time and so on. These values are filled in by the
UrlData.check() function.
Derived from the base class UrlData are the different URL types:
HttpUrlData for http:// links, MailtoUrlData for mailto: links, etc.

UrlData defines the functions which are common to *all* URLs, and
the subclasses define functions needed for their URL type.
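
The class layout in (3) could look roughly like the sketch below. Only
the class names UrlData, HttpUrlData and MailtoUrlData and the check()
function come from the text above; check_connection, make_url_data and
the placeholder validity rules are assumptions made for this example,
and no network access is performed.

```python
import time


class UrlData:
    """Base class: state and behavior common to all URL types."""

    def __init__(self, url):
        self.url = url
        self.valid = None       # unknown until check() has run
        self.check_time = None  # seconds spent in check()

    def check(self):
        start = time.time()
        try:
            self.valid = self.check_connection()
        except Exception:
            self.valid = False
        self.check_time = time.time() - start
        return self.valid

    def check_connection(self):
        raise NotImplementedError("subclasses define the type-specific check")


class MailtoUrlData(UrlData):
    def check_connection(self):
        # Placeholder rule: only look for an "@" in the address.
        address = self.url[len("mailto:"):]
        return "@" in address


class HttpUrlData(UrlData):
    def check_connection(self):
        # Placeholder: a real checker would send an HTTP request here.
        return self.url.startswith("http://")


def make_url_data(url):
    """Hypothetical factory choosing a subclass by URL scheme."""
    if url.startswith("mailto:"):
        return MailtoUrlData(url)
    if url.startswith("http://"):
        return HttpUrlData(url)
    return UrlData(url)
```

The base class owns the bookkeeping (validity, check time), so a new URL
type only has to supply its own check_connection().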

(4) Let's look at the output. Every output format is defined in a Logger
class. Each logger has the functions init(), newUrl() and endOfOutput().
We call init() once to initialize the Logger. UrlData.check() calls
newUrl() (through UrlData.logMe()) for each new URL, and after all
checking is done we call endOfOutput(). Easy.
New loggers are created with the Config.newLogger function.
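
A minimal logger following the init() / newUrl() / endOfOutput() life
cycle might look like this. The class name TextLogger, the run() helper,
the output wording and the simplified newUrl(url, valid) signature are
assumptions of this sketch; the real loggers may receive a UrlData
object instead.

```python
import io


class TextLogger:
    """Hypothetical logger with the three functions named above."""

    def __init__(self, stream):
        self.stream = stream
        self.count = 0

    def init(self):
        # Called once before any URL is reported.
        self.stream.write("Start checking\n")

    def newUrl(self, url, valid):
        # Called once for each checked URL.
        self.count += 1
        status = "valid" if valid else "error"
        self.stream.write("%s: %s\n" % (status, url))

    def endOfOutput(self):
        # Called once after all checking is done.
        self.stream.write("%d URLs checked\n" % self.count)


def run(logger, results):
    logger.init()
    for url, valid in results:
        logger.newUrl(url, valid)
    logger.endOfOutput()


out = io.StringIO()
run(TextLogger(out), [("http://example.org/", True),
                      ("http://example.org/missing", False)])
print(out.getvalue(), end="")
```

Because every logger exposes the same three functions, another output
format only needs a different class with the same interface.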