diff --git a/INSTALL b/INSTALL
index 0890b249..0b4f1496 100644
--- a/INSTALL
+++ b/INSTALL
@@ -3,10 +3,25 @@
 
 Requirements
 ------------
-Python >= 2.0 from http://www.python.org/
+Python >= 1.5.2 from http://www.python.org/
+Distutils >= 0.9.1 from http://www.python.org/sigs/distutils-sig/
+Python 1.6 includes the Distutils 0.9.1,
+Python 2.0 includes the Distutils 1.0.1
+
+
+Optionally packages
+-------------------
+OpenSSL from http://www.openssl.org/
+You will need Perl for Win32 (available from
+http://www.activestate.com/ActivePerl) if you want to install OpenSSL
+on Windows!
+
 
 Setup
 -----
+Run "python setup.py config" to configure.
+Linux users should run "python setup.py config -lcrypto" to use the SSL
+module.
 Run "python setup.py install" to install.
 Run "python setup.py --help" for help.
 Debian users can build the .deb package with "debian/rules binary" as
@@ -21,9 +36,26 @@ to check.
 Type "linkchecker -h" for help.
 
 
+Note
+----
+If you want to make your own distribution with "python setup.py sdist",
+you will need Distutils >= 0.9.4. Older versions are hanging when
+they try to parse the MANIFEST.in file.
+
+
 (Fast)CGI web interface
 -----------------------
-The *cgi files are three CGI script which you can use to run LinkChecker
+The *cgi files are three CGI scripts which you can use to run LinkChecker
 with a nice graphical web interface. You can use and adjust the example
 HTML files in the lconline directory to run the script.
+1) Choose a CGI script. The simplest is lc.cgi and you need a web server
+   with CGI support.
+   The scripts lc.fcgi (I tested this a while ago) and lc.sz_fcgi
+   (untested) need a web server with FastCGI support.
+2) Copy the script of your choice in the CGI directory.
+3) Adjust the "action=..." parameter in lconline/lc_cgi.html
+   to point to your CGI script.
+4) load the lconline/index.html file, enter an URL and klick on the
+   check button
+
 If something goes wrong, check the error log of your web server.
diff --git a/MANIFEST.in b/MANIFEST.in
index 9c6ba2aa..1f3eef0f 100644
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -5,10 +5,11 @@ include lc.cgi lc.fcgi lc.sz_fcgi
 include Makefile
 include create.sql
 include debian/rules debian/changelog debian/copyright debian/control
-include debian/dirs debian/docs debian/links
+include debian/dirs debian/docs debian/links debian/postinst
+include debian/prerm
 include DNS/README
 include test/viewprof.py test/profiletest.py test/*.html
-recursive-include locale *.po
+recursive-include locale *.mo
 recursive-include po *
 recursive-include lconline *
 recursive-include tests *.py
diff --git a/Makefile b/Makefile
index e3320a2e..c53e85f6 100644
--- a/Makefile
+++ b/Makefile
@@ -1,55 +1,81 @@
 # This Makefile is only used by developers.
 # You will need a Debian Linux system to use this Makefile!
 VERSION=$(shell python setup.py --version)
-PACKAGE=linkchecker
-NAME=$(shell python setup.py --name)
-HOST=treasure.calvinsplayground.de
-LCOPTS=-ocolored -Ftext -Fhtml -Fgml -Fsql -Fcsv -R -t0 -v -itreasure.calvinsplayground.de -s
-DEBPACKAGE=$(PACKAGE)_$(VERSION)_i386.deb
+PACKAGE = linkchecker
+NAME = $(shell python setup.py --name)
+HOST=fsinfo.cs.uni-sb.de
+LCOPTS=-ocolored -Ftext -Fhtml -Fgml -Fsql -Fcsv -R -t0 -v
+DEBPACKAGE = $(PACKAGE)_$(VERSION)_i386.deb
+SOURCES = \
+linkcheck/Config.py \
+linkcheck/FileUrlData.py \
+linkcheck/FtpUrlData.py \
+linkcheck/GopherUrlData.py \
+linkcheck/HostCheckingUrlData.py \
+linkcheck/HttpUrlData.py \
+linkcheck/HttpsUrlData.py \
+linkcheck/JavascriptUrlData.py \
+linkcheck/Logging.py \
+linkcheck/MailtoUrlData.py \
+linkcheck/NntpUrlData.py \
+linkcheck/TelnetUrlData.py \
+linkcheck/Threader.py \
+linkcheck/UrlData.py \
+linkcheck/__init__.py \
+linkcheck/lc_cgi.py \
+linkchecker
 DESTDIR=/.
 
-.PHONY: test clean files upload dist install all
+.PHONY: test clean distclean package files upload dist locale all
 
 all:
 	@echo "Read the file INSTALL to see how to build and install"
 
 clean:
-	fakeroot debian/rules clean
-	rm -f .time.po
+	-python setup.py clean --all
+	$(MAKE) -C po clean
 
-distclean: clean
+distclean: clean cleandeb
 	rm -rf dist
-	rm -f $(PACKAGE)-out.* VERSION
+	rm -f $(PACKAGE)-out.* VERSION LinkCheckerConf.py* MANIFEST
 
-.time.po:
-	$(MAKE) -C po
-	touch .time.po
+cleandeb:
+	rm -rf debian/$(PACKAGE) debian/tmp
+	rm -f debian/*.debhelper debian/{files,substvars}
+	rm -f configure-stamp build-stamp
 
-dist: .time.po
-	rm -rf debian/tmp
-	python setup.py sdist --formats=gztar,zip bdist_rpm bdist_wininst
+dist: locale
 	fakeroot debian/rules binary
+	# cleandeb because distutils choke on dangling symlinks
+	# (linkchecker.1 -> undocumented.1)
+	$(MAKE) cleandeb
+	python setup.py sdist --formats=gztar,zip bdist_rpm
+	# extra run without SSL compilation
+	python setup.py bdist_wininst
 	mv -f ../$(DEBPACKAGE) dist
 
 package:
 	cd dist && dpkg-scanpackages . ../override.txt | gzip --best > Packages.gz
 
-files: .time.po
+files: locale
 	./$(PACKAGE) $(LCOPTS) -i$(HOST) http://$(HOST)/~calvin/
 
 VERSION:
 	echo $(VERSION) > VERSION
 
-upload: dist package files VERSION
+upload: distclean dist package files VERSION
 	scp debian/changelog shell1.sourceforge.net:/home/groups/$(PACKAGE)/htdocs/changes.txt
 	scp linkchecker-out.* shell1.sourceforge.net:/home/groups/$(PACKAGE)/htdocs
 	scp VERSION shell1.sourceforge.net:/home/groups/$(PACKAGE)/htdocs/raw/
 	scp dist/* shell1.sourceforge.net:/home/groups/ftp/pub/$(PACKAGE)/
 	ssh -C -t shell1.sourceforge.net "cd /home/groups/$(PACKAGE) && make"
 
-test: .time.po
+test:
 	rm -f test/*.result
 	@for i in test/*.html; do \
 	    echo "Testing $$i. Results are in $$i.result"; \
 	    ./$(PACKAGE) -r1 -o text -N"news.rz.uni-sb.de" -v -a $$i > $$i.result 2>&1; \
 	done
+
+locale:
+	$(MAKE) -C po
diff --git a/README b/README
index e704fe11..6f1b1843 100644
--- a/README
+++ b/README
@@ -11,13 +11,14 @@ o output can be colored or normal text, HTML, SQL, CSV or a GML sitemap
   graph
 o HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Gopher, Telnet and local
   file links are supported.
+  Javascript links are currently ignored
 o restrict link checking with regular expression filters for URLs
 o proxy support
 o give username/password for HTTP and FTP authorization
 o robots.txt exclusion protocol support
 o i18n support
 o command line interface
-o (Fast)CGI web interface
+o (Fast)CGI web interface (requires HTTP server)
 
 
 Installing, Requirements, Running
@@ -31,7 +32,8 @@ LinkChecker is licensed under the GNU Public License.
 Credits go to Guido van Rossum for making Python. His hovercraft is full
 of eels!
 As this program is directly derived from my Java link checker, additional
-credits go to Robert Forsman (the author of JCheckLinks).
+credits go to Robert Forsman (the author of JCheckLinks) and his
+robots.txt parse algorithm.
 I want to thank everybody who gave me feedback, bug reports and
 suggestions.
 
@@ -48,10 +50,14 @@ So for example 1.1.5 is the fifth release of the 1.1 development package.
 
 Included packages
 -----------------
+httplib from http://www.lyra.org/greg/python/
+httpslib from http://home.att.net/~nvsoft1/ssl_wrapper.html
 DNS see DNS/README
 fcgi.py and sz_fcgi.py from http://saarland.sz-sb.de/~ajung/sz_fcgi/
+fintl.py from http://sourceforge.net/snippet/detail.php?type=snippet&id=100059
 
 Note that the following packages are modified by me:
+httplib.py (renamed to http11lib.py and a bug fixed)
 fcgi.py (implemented streamed output)
 sz_fcgi.py (simplified the code)
 DNS/Lib.py:566 fixed rdlength name error
diff --git a/TODO b/TODO
index 7d460a51..8eb1b90d 100644
--- a/TODO
+++ b/TODO
@@ -1,11 +1,6 @@
 High priority
 
-o Proxy geht nicht:
-  - getrennter http/https/ftp proxy
-  - environment Variablen werden bei RobotParser benutzt, also muß ich
-    das auch machen.
-
-o Robot parser testen
+o Use Python 2.0 features
 
 o I want to be able to supply a "break" command even when multiple
   threads are running.
diff --git a/debian/changelog b/debian/changelog
index ca3d0d6b..239071e2 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,13 +1,30 @@
-linkchecker (1.3.0) unstable; urgency=low
+linkchecker (1.2.8) unstable; urgency=low
 
-  * require Python 2.0 so we can get rid of the robots.txt parser
-    and use the one provided within the Python library
-  * added