check links in web documents or full websites

                      LinkChecker
                     =============

With LinkChecker you can check your HTML documents for broken links.

Features
--------
o recursive checking
o multithreaded
o output can be colored or normal text, HTML, SQL, CSV or a GML sitemap
  graph
o HTTP/1.1, HTTPS, FTP, mailto:, news:, Gopher, Telnet and local file links
  are supported (JavaScript links are currently ignored)
o restrict link checking with regular expression filters for URLs
o HTTP proxy support
o give username/password for HTTP and FTP authorization
o robots.txt exclusion protocol support 
o internationalization support
o (Fast)CGI web interface


Installing, Requirements, Running
---------------------------------
Read the file INSTALL.


License
-------
LinkChecker is licensed under the GNU General Public License.
Credits go to Guido van Rossum for making Python. His hovercraft is
full of eels!
As this program is directly derived from my Java link checker, additional
credits go to Robert Forsman (the author of JCheckLinks) and his
robots.txt parsing algorithm.
I want to thank everybody who gave me feedback, bug reports and
suggestions.


Versioning
----------
Version numbers have the same meaning as Linux kernel version numbers.
The first number is the major package version. The second number is
the minor package version. An odd second number stands for a development
version, an even number for a stable version. The third number is a
package release sequence number.
So, for example, 1.1.5 is the fifth release of the 1.1 development package.
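
The rule above can be sketched in a few lines of present-day Python.
This is only an illustration; parse_version() and is_development() are
hypothetical helpers and not part of LinkChecker.

```python
def parse_version(version):
    """Split a 'major.minor.release' string into three integers."""
    major, minor, release = (int(part) for part in version.split("."))
    return major, minor, release


def is_development(version):
    """An odd minor number marks a development version."""
    _, minor, _ = parse_version(version)
    return minor % 2 == 1


print(is_development("1.1.5"))  # True: 1.1 is a development series
print(is_development("1.2.0"))  # False: 1.2 is a stable series
```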


Included packages
-----------------
httplib from http://www.lyra.org/greg/python/
httpslib from http://home.att.net/~nvsoft1/ssl_wrapper.html
DNS see DNS/README
fcgi.py and sz_fcgi.py from http://saarland.sz-sb.de/~ajung/sz_fcgi/
fintl.py from http://sourceforge.net/snippet/detail.php?type=snippet&id=100059

Note that I modified the following packages:
httplib.py (renamed to http11lib.py and fixed a bug)
fcgi.py (implemented streamed output)
sz_fcgi.py (simplified the code)
DNS/Lib.py:566 (fixed rdlength name error)
DNS/Base.py (fixed /etc/resolv.conf parser to cope with empty lines)


Internationalization
--------------------
For German output execute "export LC_MESSAGES=de" in bash or
"setenv LC_MESSAGES de" in tcsh.
Under Windows, execute "set LC_MESSAGES=de".
For French output use 'fr' instead of 'de'.


Code design
-----------
Read this only if you want to hack on the code.

(1) Look at the linkchecker script. This thing just reads all the
commandline options and stores them in a Config object.

(2) Which leads us directly to the Config class. This class stores all
options and works a little magic: it tries to find out if your platform
supports threads. If so, they are enabled. If not, they are disabled.
Several functions are replaced with their threaded equivalents if 
threading is enabled.
Config files are handled here too. A Config object reads config file
options on initialization so they get handled before any commandline
options.
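
To make the idea concrete, here is a rough sketch in present-day Python.
It is not the actual Config class: the method names apply_commandline(),
run() and the "threads" option key are assumptions made up for this
example. It only shows the two points above, thread detection with
function swapping, and config file options being applied before
commandline options.

```python
try:
    import threading
except ImportError:  # interpreter built without thread support
    threading = None


class Config:
    """Stores options; config file values are applied before the command line."""

    def __init__(self, file_options=None):
        # Threads are on by default only if the platform supports them.
        self.options = {"threads": threading is not None}
        # Config file options are handled first, at initialization.
        self.options.update(file_options or {})
        self._select_functions()

    def apply_commandline(self, cli_options):
        # Commandline options come later, so they override the config file.
        self.options.update(cli_options)
        self._select_functions()

    def _select_functions(self):
        # Swap in the threaded variant only when threading is usable.
        if self.options["threads"] and threading is not None:
            self.run = self._run_threaded
        else:
            self.run = self._run_plain

    def _run_plain(self, func):
        # Run the job right away in the calling thread.
        func()
        return None

    def _run_threaded(self, func):
        # Run the job in its own thread and hand the thread back.
        thread = threading.Thread(target=func)
        thread.start()
        return thread


config = Config(file_options={"threads": True})
config.apply_commandline({"threads": False})  # the command line wins
done = []
config.run(lambda: done.append("checked"))
print(done)
```

Rebinding self.run keeps the callers free of "if threads" tests: they
always call config.run() and get whichever variant is active.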

(3) The linkchecker script finally calls linkcheck.checkUrls(), which
calls linkcheck.Config.checkUrl(), which calls linkcheck.UrlData.check().
A UrlData object represents a single URL with all attached data like
validity, check time and so on. These values are filled in by the
UrlData.check() function.
Derived from the base class UrlData are the different URL types: 
HttpUrlData for http:// links, MailtoUrlData for mailto: links and so on.

UrlData defines the functions which are common for *all* URLs, and
the subclasses define functions needed for their URL type.
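
The class layout can be pictured roughly like this. The class names and
check() come from the description above; the attribute names, the
check_connection() hook and the checks themselves are placeholders
invented for this sketch (no network access is done), so do not take
them for the real code.

```python
import time
from urllib.parse import urlsplit


class UrlData:
    """Base class: state and behaviour common to all URL types."""

    def __init__(self, url):
        self.url = url
        self.valid = None       # unknown until check() has run
        self.check_time = None  # seconds spent in check()
        self.error = None

    def check(self):
        # Common driver: time the check and record the result.
        start = time.time()
        try:
            self.check_connection()
            self.valid = True
        except Exception as exc:
            self.valid = False
            self.error = str(exc)
        self.check_time = time.time() - start

    def check_connection(self):
        raise NotImplementedError("subclasses implement the real check")


class HttpUrlData(UrlData):
    def check_connection(self):
        # Placeholder: only verifies that the URL names a host.
        if not urlsplit(self.url).netloc:
            raise ValueError("no host in URL")


class MailtoUrlData(UrlData):
    def check_connection(self):
        # Placeholder: only looks for an "@" in the address.
        address = self.url[len("mailto:"):]
        if "@" not in address:
            raise ValueError("no domain in mail address")


for data in (HttpUrlData("http://www.python.org/"),
             MailtoUrlData("mailto:nobody")):
    data.check()
    print(data.url, data.valid, data.error)
```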

(4) Let's look at the output. Every output is defined in a Logger class.
Each logger has functions init(), newUrl() and endOfOutput().
We call init() once to initialize the Logger. UrlData.check() calls
newUrl() (through UrlData.logMe()) and after all checking we call
endOfOutput(). Easy.
New loggers are created with the Config.newLogger(name, fileoutput) function.
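
A bare-bones logger following this protocol might look like the sketch
below (present-day Python). Only the method names init(), newUrl() and
endOfOutput() are taken from the text above; the TextLogger name, the
constructor, the arguments of newUrl() and the output format are
assumptions for illustration.

```python
import io
import sys


class TextLogger:
    """Writes one line per checked URL to a file-like object."""

    def __init__(self, fd=None):
        self.fd = sys.stdout if fd is None else fd
        self.count = 0

    def init(self):
        # Called once before any URL is checked.
        self.fd.write("Start checking\n")

    def newUrl(self, url, valid):
        # Called for every checked URL.
        self.count += 1
        status = "ok" if valid else "error"
        self.fd.write("%s: %s\n" % (status, url))

    def endOfOutput(self):
        # Called once after all checking is done.
        self.fd.write("%d URLs checked\n" % self.count)


buffer = io.StringIO()
logger = TextLogger(buffer)
logger.init()
logger.newUrl("http://www.python.org/", True)
logger.newUrl("mailto:nobody", False)
logger.endOfOutput()
print(buffer.getvalue(), end="")
```

Any object with these three methods can stand in as a logger, which is
what makes adding a new output format cheap.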


Nifty features you did not expect
---------------------------------
o Included brain enhancer. Just read Python code to gain intelligence.
o Wash-O-matic. LinkChecker has a secret option which washes all your 
  dirty clothes in a matter of seconds.
o Y2K-Compatibility(tm) guarantee. The fact that you can read this text
  in the Millennium age is proof enough!
o Self destruction option (also called kamikaze option). Punch your fists
  several times on your keyboard. Banzaaaiiii!
o There is no spoon. Wake up already!