check links in web documents or full websites

                      LinkChecker
                     =============

LinkChecker checks HTML documents for broken links.

It features
o recursive checking
o multithreading
o output in colored or normal text, HTML, SQL, CSV or a sitemap
  graph in GML or XML.
o HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Gopher, Telnet and local
  file links support
o restriction of link checking with regular expression filters for URLs
o proxy support
o username/password authorization for HTTP and FTP
o robots.txt exclusion protocol support
o i18n support
o a command line interface
o a (Fast)CGI web interface (requires HTTP server)


Installing and Requirements
---------------------------
Read the file INSTALL.


Running the program
-------------------
o Unix or Mac OS X platforms
  The local configuration file is $HOME/.linkcheckerrc
  Type "linkchecker" followed by the URLs you want to check.
  Type "linkchecker -h" for help.

o Windows platforms
  Double-click on "linkchecker.bat" on your desktop.
  URL input is interactive.
  Another way is executing "python.exe linkchecker" in the Python
  Scripts directory.

o Mac OS 9.x platforms
  Read the MacOS Python documentation to find out about passing
  commandline options to Python scripts.


License and Credits
-------------------
LinkChecker is licensed under the GNU General Public License.
Credits go to Guido van Rossum and his team for making Python.
His hovercraft is full of eels!
As this program is directly derived from my Java link checker, additional
credits go to Robert Forsman (the author of JCheckLinks) and his
robots.txt parsing algorithm.
Nicolas Chauvat <Nicolas.Chauvat@logilab.fr> supplied a patch for
an XML output logger.
I want to thank everybody who gave me feedback, bug reports and
suggestions.


Versioning
----------
Version numbers have the same meaning as Linux Kernel version numbers.
The first number is the major package version. The second number is
the minor package version. An odd second number stands for development
versions, an even number for stable versions. The third number is a
package release sequence number.
So for example 1.1.5 is the fifth release of the 1.1 development package.
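
The numbering rule above can be sketched in a few lines of Python. This
is only an illustration, not code from the package; the function name
describe_version is made up for this example.

```python
def describe_version(version):
    """Split 'major.minor.release' and classify the series.

    describe_version is a hypothetical helper, not part of LinkChecker.
    """
    major, minor, release = (int(part) for part in version.split("."))
    # An odd minor number marks a development series, an even one a stable series.
    kind = "development" if minor % 2 else "stable"
    return {"major": major, "minor": minor, "release": release, "kind": kind}


if __name__ == "__main__":
    print(describe_version("1.1.5"))
```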


Included packages
-----------------
fcgi.py and sz_fcgi.py from Andreas Jung (http://www.andreas-jung.com/privat.html)
Note that included packages are modified by me.


Internationalization
--------------------
For German output execute "export LC_MESSAGES=de" in bash or
"setenv LC_MESSAGES de" in tcsh.
Under Windows, execute "set LC_MESSAGES=de".
Other supported languages are 'nl' (Nederlands) and 'fr' (français).
If you want to help me translate LinkChecker, copy the linkchecker.pot
file to <your language>.po and send me the translated file.


Code design
-----------
Read this section only if you want to hack on the code.

(1) Look at the linkchecker script. This thing just reads all the
commandline options and stores them in a Config object.

(2) Which leads us directly to the Config class. This class stores all
options and supports threading and reading config files.
A Config object reads config file options on initialization so they get
handled before any commandline options.
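
The ordering described in (2) means that commandline options override
config file options. A minimal sketch of that precedence, using the
standard configparser and argparse modules, might look like the
following. The section name "checking", the option names "threads" and
"verbose", and the function load_options are assumptions made for this
example; the real Config class may be organized quite differently.

```python
import argparse
import configparser

# Hypothetical built-in defaults; the option names are assumptions.
DEFAULTS = {"threads": "5", "verbose": "0"}


def load_options(config_text, argv):
    """Return options with precedence: defaults < config file < commandline."""
    options = dict(DEFAULTS)

    # Step 1: config file values are read first.
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if parser.has_section("checking"):
        options.update(parser["checking"])

    # Step 2: commandline values are applied last, so they win.
    cli = argparse.ArgumentParser()
    cli.add_argument("--threads")
    cli.add_argument("--verbose")
    args = cli.parse_args(argv)
    for key, value in vars(args).items():
        if value is not None:
            options[key] = value
    return options


if __name__ == "__main__":
    print(load_options("[checking]\nthreads = 10\n", ["--verbose", "1"]))
```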

(3) The linkchecker script calls linkcheck.checkUrls(), which
calls linkcheck.Config.checkUrl(), which calls linkcheck.UrlData.check().
A UrlData object represents a single URL with all attached data like
validity, check time and so on. These values are filled by the
UrlData.check() function.
Derived from the base class UrlData are the different URL types: 
HttpUrlData for http:// links, MailtoUrlData for mailto: links, etc.

UrlData defines the functions which are common for *all* URLs, and
the subclasses define functions needed for their URL type.
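
The split between base class and subclasses can be pictured roughly as
below. The class names UrlData, HttpUrlData and MailtoUrlData and the
check() function come from the description above; everything else
(check_connection, make_url_data, the attribute names, and the toy
validity rules that avoid any network access) is invented for this
sketch and is not the actual implementation.

```python
import time


class UrlData:
    """Base class: data and behaviour common to all URL types."""

    def __init__(self, url):
        self.url = url
        self.valid = None       # filled in by check()
        self.check_time = None  # seconds spent checking, filled in by check()

    def check(self):
        start = time.monotonic()
        self.valid = self.check_connection()
        self.check_time = time.monotonic() - start

    def check_connection(self):
        # Hypothetical hook: each subclass supplies its own test.
        raise NotImplementedError


class HttpUrlData(UrlData):
    def check_connection(self):
        # Toy rule for the sketch; a real checker would contact the server.
        return self.url.startswith("http://")


class MailtoUrlData(UrlData):
    def check_connection(self):
        # Toy rule for the sketch: the address must contain an "@".
        return "@" in self.url[len("mailto:"):]


def make_url_data(url):
    """Pick the subclass from the URL scheme (hypothetical factory)."""
    scheme = url.split(":", 1)[0].lower()
    classes = {"http": HttpUrlData, "mailto": MailtoUrlData}
    if scheme not in classes:
        raise ValueError("unsupported scheme: %s" % scheme)
    return classes[scheme](url)


if __name__ == "__main__":
    data = make_url_data("mailto:someone@example.org")
    data.check()
    print(type(data).__name__, data.valid)
```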

(4) Let's look at the output. Every output is defined in a Logger class.
Each logger has functions init(), newUrl() and endOfOutput().
We call init() once to initialize the Logger. UrlData.check() calls
newUrl() (through UrlData.logMe()) for each new URL and after all
checking is done we call endOfOutput(). Easy.
New loggers are created with the Config.newLogger function.
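
To make the init() / newUrl() / endOfOutput() life cycle concrete, here
is a small stand-alone logger sketch. Only the three function names are
taken from the text above. The class name TextLogger, the arguments of
newUrl() and the output format are assumptions for illustration; the
real loggers receive their data from UrlData and are created through
Config.newLogger.

```python
import io


class TextLogger:
    """Hypothetical plain-text logger following the three-step protocol."""

    def __init__(self, stream):
        self.stream = stream
        self.count = 0

    def init(self):
        # Called once before any URL is reported.
        self.stream.write("Start checking\n")

    def newUrl(self, url, valid):
        # Called once per checked URL (signature assumed for this sketch).
        self.count += 1
        status = "ok" if valid else "error"
        self.stream.write("%s %s\n" % (status, url))

    def endOfOutput(self):
        # Called once after all checking is done.
        self.stream.write("%d URLs checked\n" % self.count)


if __name__ == "__main__":
    out = io.StringIO()
    logger = TextLogger(out)
    logger.init()
    logger.newUrl("http://example.org/", True)
    logger.newUrl("http://example.org/missing", False)
    logger.endOfOutput()
    print(out.getvalue(), end="")
```

A logger for another format (HTML, CSV and so on) would keep the same
three functions and change only what is written to the stream.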