linkchecker/ChangeLog

1.12.3 "The Princess Bride" (released 27.5.2004)
  * fall back to GET on bad status line of a HEAD request
    Type: bugfix
    Changed: linkcheck/HttpUrlData.py

  * really fall back to GET with Zope servers; fixes infinite loop
    Type: bugfix
    Changed: linkcheck/HttpUrlData.py

  * better error msg on BadStatusLine error
    Type: feature
    Changed: linkcheck/UrlData.py

  * updated optcomplete to newest upstream
    Type: feature
    Changed: linkcheck/optcomplete.py

  * also quote query parts of urls
    Type: bugfix
    Changed: linkcheck/{HttpUrlData, url}.py

  * - preserve the order in which HTML attributes have been parsed
    - cope with trailing space in HTML comments
    Type: feature
    Changed: linkcheck/parser/{__init__.py,htmllex.l}
    Added: linkcheck/containers.py

  * rework anchor fallback
    Type: bugfix
    Changed: linkcheck/HttpUrlData.py

  * move contentAllowsRobot check to end of recursion check to avoid
    unnecessary GET request
    Type: bugfix
    Changed: linkcheck/UrlData.py
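
A minimal sketch of the "also quote query parts of urls" entry above, written
in modern Python 3 rather than the Python 2 of this release. The function name
quote_url and the chosen sets of safe characters are assumptions for
illustration, not the code in linkcheck/url.py.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url(url):
    """Percent-quote the path and the query of a URL (hypothetical helper).

    "%" is kept safe so that already-quoted input is not quoted twice.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    # Keep the usual query delimiters intact; quote everything else.
    query = quote(parts.query, safe="=&%+;")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

Keeping "%" in the safe set makes the helper idempotent, so calling it on an
already quoted URL returns the URL unchanged.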

1.12.2 (released 4.4.2004)
  * use XmlUtils instead of xmlify for XML quoting
    Type: code cleanup
    Added: linkcheck/XmlUtils.py
    Changed: linkcheck/StringUtil.py, linkcheck/log/XMLLogger.py

  * don't require a value anymore with the --version option
    Type: bugfix
    Changed: linkchecker

  * before putting url data objects in the queue, check if they have
    correct syntax and are not already cached
    Type: optimization
    Changed: linkcheck/{UrlData,Config}.py

  * every once in a while, remove all already cached urls from the
    incoming queue. This action is reported when --status is given.
    Type: optimization
    Changed: linkcheck/Config.py

  * both changes above result in significant performance improvements
    when checking large websites, since a majority of the links tend
    to be navigation links to already-cached pages.
    Type: note

  * updated examples and put them before options in the man page for
    easier reading
    Type: documentation
    Changed: linkchecker, linkchecker.1

  * added contact url and email to the HTTP User-Agent string, which
    makes some bot-blocking software more likely to accept us; also see
    http://www.livejournal.com/bots/
    Type: feature
    Changed: linkcheck/Config.py

  * only check robots.txt for http connections
    Type: bugfix
    Changed: linkcheck/{Http,}UrlData.py
    Closes: SF bug 928895

  * updated regression tests
    Type: feature
    Changed: test/test_*.py, Makefile
    Added: test/run.sh

  * preserve the order in which HTML attributes have been parsed
    Type: feature
    Changed: linkcheck/parser/{__init__.py,htmllex.l}

  * handle and correct missing start quotes in HTML attributes
    Type: feature
    Changed: linkcheck/parser/htmllex.l

  * full parsing of .css files
    Type: feature
    Changed: linkcheck/{Http,}UrlData.py, linkcheck/linkparse.py

  * removed Gilman news draft
    Type: feature
    Removed: draft-gilman-news-url-00.txt
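
The two queue optimizations in 1.12.2 (check syntax and cache before queueing,
and periodically purge cached urls from the incoming queue) can be sketched as
follows. This is an illustration in modern Python 3 under assumed names
(has_valid_syntax, enqueue, purge_cached) and with a plain set as the cache;
the real logic lives in linkcheck/{UrlData,Config}.py and may differ.

```python
from collections import deque
from urllib.parse import urlsplit


def has_valid_syntax(url):
    """Very rough syntax check: the url must parse and carry a scheme."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return bool(parts.scheme and (parts.netloc or parts.path))


def enqueue(queue, cache, url):
    """Queue the url only if it is well-formed and not already cached."""
    if has_valid_syntax(url) and url not in cache:
        queue.append(url)
        return True
    return False


def purge_cached(queue, cache):
    """Drop queued urls that were cached in the meantime; return the count."""
    before = len(queue)
    kept = [url for url in queue if url not in cache]
    queue.clear()
    queue.extend(kept)
    return before - len(kept)
```

The count returned by purge_cached is the kind of number a --status report
could print.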


1.12.1 (released 21.2.2004)
  * raise IncompleteRead instead of ValueError on malformed chunked
    HTTP data
    Changed: linkcheck/httplib2.py
  * catch errors earlier in recursion check
    Changed: linkcheck/UrlData.py
  * quote url and parent url in log output
    Changed: linkcheck/log/*.py
    Added: linkcheck/url.py

1.12.0 (released 31.1.2004)
  * added LRU.setdefault function
    Changed: linkcheck/LRU.py
    Closes: SF bug 885916
  * Added Mac OS X as supported platform (version 10.3 is known to work)
    Changed: README, INSTALL
  * HTML parser objects are now subclassable and collectable by the cyclic
    garbage collector
    Changed: linkcheck/parser/htmlparse.y
  * made some minor parser fixes for attribute scanning and JavaScript
    Changed: linkcheck/parser/htmllex.l
  * include the optcomplete module for bash autocompletion
    Added: linkcheck/optcomplete.py, linkcheck-completion
    Changed: MANIFEST.in, setup.py
  * print out nicer error message for unknown host names
    Changed: linkcheck/UrlData.py
  * added new logger type "none", which prints nothing and is handy for
    cron scripts.
    Changed: linkchecker, linkcheck/Config.py, linkcheck/log/__init__.py
    Added: linkcheck/log/NoneLogger.py
  * the -F file output option disables console output now
    Changed: linkchecker
  * added an example cron script
    Added: linkcheck-cron.sh
    Changed: MANIFEST.in, setup.py
  * only warn about servers lacking anchor support when the url
    actually has an anchor
    Changed: linkcheck/HttpUrlData.py
  * always fall back to HTTP GET request when HEAD gave an error to
    cope with servers not supporting HEAD requests
    Changed: linkcheck/HttpUrlData.py, FAQ
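
The HEAD-to-GET fallback described in the last entry can be sketched like
this. The function check_with_fallback and its fetch(method, url) callable are
hypothetical names introduced for illustration; the actual code in
linkcheck/HttpUrlData.py talks to the network directly and is not shown here.

```python
def check_with_fallback(fetch, url):
    """Try HEAD first and repeat the request with GET if HEAD fails.

    `fetch(method, url)` is an assumed callable that returns an HTTP
    status code, or raises OSError on a connection or protocol error.
    Returns a (method, status) tuple for the request that counted.
    """
    try:
        status = fetch("HEAD", url)
    except OSError:
        status = None
    if status is None or status >= 400:
        # The server may simply not support HEAD, so ask again with GET.
        return "GET", fetch("GET", url)
    return "HEAD", status
```

Passing the transport in as a callable keeps the sketch testable without a
network connection.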

1.10.3 (released 10.1.2004)
  * use the optparse module for command line parsing
    Changed: linkchecker, po/*.po
  * use Set() instead of hashmap
    Changed: linkcheck/Config.py
  * fix mime-type checking to allow parsing of .css stylesheets
    Changed: linkcheck/HttpUrlData.py
  * honor HTML meta tags for robots, i.e.
    <meta name="ROBOTS" content="NOFOLLOW">
    Changed: linkcheck/UrlData.py, linkcheck/linkparse.py
  * much less aggressive thread acquiring; this fixes the 100% CPU
    usage from the previous version
    Changed: linkcheck/Threader.py
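
A small sketch of the robots meta tag check from the entry above, using the
Python 3 standard library parser instead of linkchecker's own HTML parser. The
class RobotsMetaParser and the function content_allows_follow are assumed
names for illustration only.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Record whether a <meta name="robots"> tag forbids following links."""

    def __init__(self):
        super().__init__()
        self.follow = True

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() != "robots":
            return
        content = attrs.get("content") or ""
        values = {value.strip().lower() for value in content.split(",")}
        # "none" is shorthand for "noindex, nofollow".
        if "nofollow" in values or "none" in values:
            self.follow = False


def content_allows_follow(html):
    """Return False if the page asks robots not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.follow
```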

1.10.2 (released 3.1.2004)
  * fixed CGI safe_url pattern, it was too strict
    Changed: linkcheck/lc_cgi.py
  * replace backticks with repr() or %r
    Changed: all .py files containing backticks, and po/*.po
  * make windows DNS nameserver parsing more robust
    Changed: linkcheck/DNS/Base.py
    Closes: SF bugs 863227,864383
  * only cache used data, not the whole url object
    Changed: linkcheck/{Http,}UrlData.py
  * limit cached data
    Changed: linkcheck/{UrlData,Config}.py
    Added: linkcheck/LRU.py
    Closes: SF bug 864516
  * use dummy_threading module and get rid of the _NoThreads
    functions
    Changed: linkchecker, linkcheck/{Config,Threader}.py,
      test/test_*.py
  * set default connection timeout to 60 seconds
    Changed: linkcheck/__init__.py
  * new option --status prints regular messages about number of
    checked urls and urls still to check
    Changed: linkchecker, linkcheck/{__init__,Config}.py
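
The "limit cached data" entry above adds linkcheck/LRU.py, and release 1.12.0
later adds LRU.setdefault. A minimal sketch of such a size-limited cache,
assuming a least-recently-used eviction policy and built on the Python 3
OrderedDict, could look like this. It is not the code shipped in LRU.py; the
constructor argument and method set are assumptions.

```python
from collections import OrderedDict


class LRU:
    """Mapping that keeps at most `capacity` most recently used items."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._data = OrderedDict()

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        value = self._data[key]
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def __setitem__(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

    def setdefault(self, key, default=None):
        if key in self._data:
            return self[key]
        self[key] = default
        return default
```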

1.10.1 (released 19.12.2003)
  * added Mandrake .spec file from Chris Green <cmg@dok.org>
    Added: linkchecker.spec
    Changed: MANIFEST.in
  * print last-modified date for http and https links in infos
    Changed: linkcheck/HttpUrlData.py
  * add detailed installation instructions for Windows
    Changed: INSTALL
    Closes: SF bug 857748
  * updated the DNS nameserver config parse routines
    Changed: linkcheck/DNS/Base.py
    Added: linkcheck/DNS/winreg.py
    Removed: linkcheck/DNS/win32dns.py
  * fix https support test
    Changed: linkcheck/HttpUrlData.py

1.10.0 (released 7.12.2003)
  * catch httplib errors in robotparser
    Changed: linkcheck/robotparser2.py
    Closes: SF bug 836864
  * - infinite recursion option with negative value works now
    - initialize self.urlparts to avoid crash when reading cached http
      urls
    - with --strict option do not add any automatic filters if the user
      gave his own on the command line
    Changed: linkcheck/UrlData.py

1.9.5 (released 31.10.2003)
  * Add Zope to servers with broken HEAD support, adjusted the FAQ
    Changed: linkcheck/HttpUrlData.py, FAQ
    Closes: SF bug 833419
  * Disable psyco usage, it is causing infinite loops (this is a known
    issue with psyco); and it is disabling ctrl-c interrupts (this
    is also a known issue in psyco)
    Changed: linkchecker
  * use internal debug logger
    Changed: linkcheck/robotparser2.py
  * do not hardcode Accept-Encoding header in HTTP request
    Added: linkcheck/httplib2.py
    Changed: linkcheck/robotparser2.py

1.9.4 (released 22.10.2003)
  * parse CSS stylesheet files and check included urls, for example
    background images
    Changed: linkcheck/{File,Http,Ftp,}UrlData.py, linkcheck/linkparser.py
  * try to use psyco for the commandline linkchecker script
    Changed: linkchecker
  * when decompression of compressed HTML pages fails, assume the page
    is not compressed
    Changed: linkcheck/{robotparser2,HttpUrlData}.py

1.9.3 (released 16.10.2003)
  * re-added an updated robot parser which uses urllib2 and can decode
    compressed transfer encodings.
    Added: linkcheck/robotparser2.py
  * more restrictive url validity checking when running in CGI mode
    Changed: linkcheck/lc_cgi.py
  * accept more Windows path specifications, like
    file://C:\Dokume~1\test.html
    Changed: linkcheck/FileUrlData.py

1.9.2
  * parser fixes:
    - do not #include <stdint.h>, fixes build on some FreeBSD, Windows
      and Solaris/SunOS platforms
    - ignore first leading invalid backslash in a=\"b\" attributes
    Changed: linkcheck/parser/htmllex.{l,c}
  * add full script path to linkchecker on windows systems
    Changed: linkchecker.bat
  * fix generation of Linkchecker_Readme.txt under windows systems
    Changed: setup.py

1.9.1
  * add documentation on how to change the default C compiler
    Changed: INSTALL
  * fixed blacklist logging
    Changed: linkcheck/log/BlacklistLogger.py
  * removed unused imports
    Changed: linkcheck/*.py
  * parser fixes:
    - fixed parsing of end tags with trailing garbage
    - fixed parsing of script single comment lines
    Changed: linkcheck/parser/htmllex.l

1.9.0
  * Require Python 2.3
    - removed timeoutsocket.py and robotparser.py, using upstream
    - use True/False for boolean values
    - use csv module
    - use new-style classes
    Closes: SF bug 784977
    Changed: a lot
  * update po makefiles and tools
    Changed: po/*
  * start CGI output immediately
    Changed: lc.cgi, lc.fcgi, lc.sz_fcgi, linkcheck/lc_cgi.py
    Closes: SF bug 784331

1.8.22
  * allow colons in HTML attribute names, used for namespaces
    Changed: linkcheck/parser/htmllex.l
  * fix match of intern patterns with --denyallow enabled
    Changed: linkcheck/UrlData.py
  * s/intern/internal/ and s/extern/external/ in the documentation
    Changed: linkchecker, linkchecker.1, FAQ
  * rename column "column" to "col" in SQL output, since "column" is
    a reserved keyword. Thanks Garvin Hicking for the hint.
    Changed: linkcheck/log/SQLLogger.py, create.sql
  * handle HTTP redirects to a non-http url
    Changed: linkcheck/{Http,}UrlData.py
    Closes: SF bug 784372

1.8.21
  * detect recursive redirections; the maximum of five redirections is
    still there though
  * after every HTTP 301 or 302 redirection, check the URL cache again
    Closes: SF bug 776851
  * also put all HTTP 301 redirection answers in the url cache as
    aliases of the original url. This could mess up some redirection
    warnings (i.e. warn about redirection when there is none), but it is
    more network efficient.

1.8.20
  * fix setting of domain in set_intern_url
    Changed: linkcheck/UrlData.py
  * - parse JS strings and comments
    - accept "<!- " as comment begin
    Changed: linkcheck/parser/htmllex.l
    Closes: SF bug 768661
  * quote url before submitting the request, the previous map() call
    was useless. Thanks Toby Dickenson for the patch.
    Changed: linkcheck/HttpUrlData.py
    Closes: SF bug 776416

1.8.19
  * add scheme colon in set_intern_url
    Changed: linkcheck/UrlData.py
  * fix threading option -t
    Changed: linkchecker, linkcheck/Config.py
  * do not try to get content of urls that have no content (e.g. mail)
    Closes: SF bug 765016
    Changed: linkcheck/{Mailto,Nntp,Telnet,}UrlData.py
  * added robots.txt FAQ, updated links
    Removed: norobots-rfc.html
    Changed: FAQ, WONTDO, TODO
  * add iso-8859-1 coding line to all .py files
    Changed: *.py
  * Correctly quote the HTML output
    Changed: linkcheck/log/HtmlLogger.py

1.8.18
  * fix option error messages for invalid integer arguments
    Changed files: linkchecker
  * enable infinite recursion with a negative -r value
    Changed files: linkcheck/{UrlData,Config}.py, linkchecker,
      linkchecker.1
  * if -s is given, add some link patterns to urls given on the
    command line automatically:
    for local files, add -i "^file:". For http and ftp urls, add
    the domain name -i "<domain>".
    Changed files: linkcheck/UrlData.py, linkchecker

1.8.17
  * fix parsing of missing end tag in "</a <a b=c>"
    Changed files: linkcheck/parser/htmllex.l
  * fix entity resolving in parsed html links
    Closes: SF bug #749543
    Changed files: linkcheck/StringUtil.py

1.8.16
  * also look at id attributes on anchor check
    Closes: SF bug #741131
    Changed files: linkcheck/{linkparser,UrlData}.py
  * minor parser cleanups
    Changed files: linkcheck/parser/*

1.8.15
  * Fix compile errors with C variable declarations in HTML parser.
    Thanks to Fazal Majid <fazal@majid.fm>
    Changed files: linkcheck/parser/htmlparse.[yc]

1.8.14
  * fix old bug in redirects not using the full url. This resulted in
    errors like (-2, "Name or service not known")
    Changed files: linkcheck/HttpUrlData.py
    Closes: SF Bug #729007
  * only remove anchors on IIS servers (other servers are doing quite
    well with anchors... can you spell A-p-a-c-h-e ?)
    Changed files: linkcheck/{HttpUrlData, UrlData}.py
  * Parser changes:
    - correctly propagate and display parsing errors
    - really cope with missing ">" end tags
    Changed files: linkcheck/parser/html{lex.l, parse.y},
      linkcheck/linkparse.py, linkcheck/UrlData.py
  * quote urls before a request
    Changed files: linkcheck/HttpUrlData.py

1.8.13
  * fix typo in manpage
    Changed files: linkchecker.1
  * remove anchor from HEAD and GET requests
    Changed files: linkcheck/{HttpUrlData, UrlData}.py

1.8.12
  * convert urlparts to list also on redirect
    Changed files: linkcheck/HttpUrlData.py

1.8.11
  * catch httplib.error exceptions
    Changed files: linkcheck/HttpUrlData.py
  * override interactive password question in robotparser.py
    Changed files: linkcheck/robotparser.py
  * switch to urllib2.py as default url connect.
    Changed files: linkcheck/UrlData.py
  * recompile html parser with flex 2.5.31
    Changed files: linkcheck/parser/{htmllex.c,Makefile}

1.8.10
  * new option --no-anchor-caching
    Changed files: linkchecker, linkcheck/{Config.py, UrlData.py}, FAQ
  * quote empty attribute arguments
    Changed files: linkcheck/parser/htmllex.[lc]

1.8.9
  * recompile with bison 1.875a
    Changed files: linkcheck/parser/htmlparse.[ch]
  * remove stpcpy declaration, fixes compile error on RedHat 7.x
    Changed files: linkcheck/parser/htmlsax.h
  * clarify keyboard interrupt warning to wait for active connections
    to finish
    Changed files: linkcheck/__init__.py
  * resolve &#XXX; number entity references
    Changed files: linkcheck/{StringUtil.py,linkname.py}
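
Resolving decimal character references of the form &#XXX; can be sketched as
below. This is a Python 3 illustration under the assumed name
resolve_numeric_entities, not the code in linkcheck/StringUtil.py; it handles
decimal references only and leaves out-of-range numbers untouched.

```python
import re

# Decimal character references such as "&#38;".
_NUMERIC_ENTITY = re.compile(r"&#(\d+);")


def resolve_numeric_entities(text):
    """Replace each &#NNN; reference with the character it names."""

    def replace(match):
        try:
            return chr(int(match.group(1)))
        except (ValueError, OverflowError):
            # Not a valid code point: keep the reference as written.
            return match.group(0)

    return _NUMERIC_ENTITY.sub(replace, text)
```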

1.8.8
  * All Amazon servers block HEAD requests with timeouts. Use GET as
    a workaround, but issue a warning.
    Changed files: linkcheck/HttpUrlData.py
  * restrict CGI access to localhost by default
    Changed files: lc.cgi, lc.fcgi, lc.sz_fcgi, linkcheck/lc_cgi.py

1.8.7
  * #define YY_NO_UNISTD_H on Windows systems, fixes build error with
    Visual Studio compiler
    Changed files: setup.py
  * use python2.2 headers for parser compile, not 2.1.
    Changed files: linkcheck/parser/Makefile

1.8.6
  * include a fixed robotparser.py (from Python 2.2 CVS maint branch)

1.8.5
  * fix config.warn to warn
    Changed files: linkcheck/__init__.py
  * parser changes:
    o recognise "<! -- -->" HTML comments (seen at Eonline)
    o recognise "<! !>" HTML comments (seen at www.nba.com)
    o rebuild with flex 2.5.27
    Changed files: linkcheck/parser/htmllex.[lc]
  * added another url exclusion example to the FAQ;
    numbered the questions and answers
    Changed files: FAQ
  * fix linkchecker exceptions
    Changed files: linkcheck/{Ftp,Mailto,Nntp,Telnet,}UrlData.py,
      linkcheck/__init__.py

1.8.4
  * Improve error message for failing htmlsax module import
    Changed files: linkcheck/parser/htmllib.py
  * Regenerate parser with new bison 1.875
    Changed files: linkcheck/parser/htmlparse.c
  * Some CVS files were not the same as their local counterpart.
    Something went wrong. Anyway, I re-committed them.
    Changed files: a lot of .py files

1.8.3
  * add missing imports for StringUtil in log classes, defer i18n of log
    field names (used for CGI scripts)
    Changed files: linkcheck/log/*.py
  * fixed wrong debug level comparison from > to >=
    Changed files: linkcheck/Config.py
  * JavaScript checks in the CGI scripts
    Changed files: lconline/lc_cgi.html.*
    Added files: lconline/check.js
  * Updated documentation with a link restriction example
    Changed files: linkchecker, linkchecker.1, FAQ
  * Updated po/pygettext.py to version 1.5, cleaned up some gettext
    usages.
  * updated i18n
    Added files: linkcheck/i18n.py
    Changed files: all .py files using i18n
  * Recognise "<! --" HTML comments
    Changed files: linkcheck/parser/htmllex.l
  * -a anchor option implies -w because anchor errors are always warnings
    Changed files: linkchecker
  * added AnsiColors.py and debug.py to split out some functions
    Changed files: a lot of .py files using these things
  * use yy_size_t for parser alloc definitions, fixes build errors on 64bit
    architectures
    Changed files: linkcheck/parser/htmllex.l

1.8.2
  * - ignore invalid html attribute characters
    - ignore trailing garbage on html end tags
    - fixed debugging code with flex
    - use flex memory management interface
    - use only double quotes for attribute quoting
    - check quoting of all attributes
    Changed files: linkcheck/parser/htmllex.l
  * build parser with flex 2.5.25
    Changed files: linkcheck/parser/{Makefile, htmllex.c}
  * put shared code of cgi scripts in lc_cgi.py
    Changed files: lc.cgi, lc.fcgi, lc.sz_fcgi, linkcheck/lc_cgi.py
  * put some linebreaks and target="top" into HTML output
    Changed files: linkcheck/logging/HtmlLogger.py
  * add translated cgi files
    Changed files: setup.py, MANIFEST.in, debian/rules
    Added files: lconline/*.{de,en}
    Removed files: lconline/{leer.html,lc_cgi.html}

1.8.1
  * Add missing () to function call in proxy handling code
    Changed files: FtpUrlData.py
  * Use urlparse.url(un)split instead of urlparse.url(un)parse
    Changed files: FtpUrlData.py, UrlData.py, HttpUrlData.py,
      FileUrlData.py
  * Print size information if it's available
    Changed files: FtpUrlData.py, UrlData.py, HttpUrlData.py
  * Add --warning-size-bytes option to print warning if content size
    exceeds the given byte limit
    Changed files: FtpUrlData.py, HttpUrlData.py, linkchecker, Config.py,
      linkchecker.1
  * Updated translations
    Changed files: po/linkchecker.pot, po/*.po
  * Parse supported file types for ftp links
    Changed files: FtpUrlData.py, FileUrlData.py, UrlData.py

1.8.0
  * Require Python >= 2.2.1, remove httplib.
    Changed files: setup.py, INSTALL, linkchecker
  * Add again python-dns, the Debian package maintainer is unresponsive
    Added files: linkcheck/DNS/*.py
    Changed files: INSTALL, setup.py
  * You must now use named constants for ANSI color codes
    Changed files: linkcheckerrc, linkcheck/log/ColoredLogger.py
  * Release RedHat 8.0 rpm packages.
    Changed files: setup.py, MANIFEST.in
  * remove --robots-txt from manpage, fix HTZP->HTTP typo
    Changed files: linkchecker.1

1.7.1
  * Fix memory leak in HTML parser flushing error path
    Changed files: htmlparse.y
  * add custom line and column tracking in parser
    Changed files: htmllex.l, htmlparse.y, htmlsax.h, htmllib.py
  * Use column tracking in urldata classes
    Changed files: UrlData.py, FileUrlData.py, FtpUrlData.py,
     HostCheckingUrlData.py
  * Use column tracking in logger classes
    Changed files: StandardLogger.py, CSVLogger.py, ColoredLogger.py,
      HtmlLogger.py, SqlLogger.py

1.7.0
  * Added new HTML parser written in C as a Python extension module.
    It is faster and it is more fault tolerant.
    Of course, this means I cannot provide .exe installers any more
    since the distutils don't provide cross-compilation.

1.6.7
  * Removed check for the <applet> tag's codebase attribute, but honor it
    when checking applet links
  * Handle the <applet> tag's archive attribute as a comma separated list
    Closes: SF bug #636802
  * Fix a nasty bug in tag searching, which ignored tags with more
    than one link attribute in them.
  * Fix concatenation with relative base urls by first joining the
    parent url.
  * New commandline option --profile to write profile data.
  * Add httplib.py from Python CVS 2.1 maintenance branch, which has the
    skip_host keyword argument I am using now.

1.6.6
  * Use the new HTTPConnection/HTTPResponse interface of httplib
    Closes: SF bug #634679
    Changed files: linkcheck/HTTPUrlData.py, linkcheck/HTTPSUrlData.py
  * Updated the ftp online test
    Changed files: test/output/test_ftp

1.6.5
  * Catch the maximum recursion limit error while parsing links and
    print an error message instead of bailing out.
    Changed files: linkcheck/UrlData.py
  * Fixed Ctrl-C only interrupting one single thread, not the whole
    program.
    Changed files: linkcheck/UrlData.py, linkcheck/__init__.py
  * HTML syntax cleanup and relative cgi form url for the cgi scripts
    Changed files: lconline/*.html

1.6.4
  * Support for ftp proxies
    Changed files: linkcheck/FtpUrlData.py, linkcheck/HttpUrlData.py
    Added files: linkcheck/ProxyUrlData.py
  * Updated german translation

1.6.3
  * Generate md5sum checksums for distributed files
    Changed files: Makefile
  * use "startswith" string method instead of a regex
    Changed files: linkchecker, linkcheck/UrlData.py
  * Add a note about supported languages, updated the documentation.
    Changed files: README, linkchecker, FAQ
  * Remove --robots-txt option from documentation, it is enabled by
    default and you cannot disable it from the command line.
    Changed files: linkchecker, po/*.po
  * fix --extern argument creation
    Changed files: linkchecker, linkcheck/UrlData.py
  * Print help if PyDNS module is not installed
    Changed files: linkcheck/UrlData.py
  * Print information if a proxy was used.
    Changed files: linkcheck/HttpUrlData.py
  * Updated german documentation
    Changed files: po/de.po
  * Oops, an FTP proxy is not used. Will make it in the next release.
    Changed files: linkcheck/FtpUrlData.py
  * Default socket timeout is now 30 seconds (10 was too short)

1.6.2
  * Warn about unknown Content-Encodings. Don't parse HTML in this case.
  * Support deflate content encoding (snatched from Debian's reportbug)
  * Add appropriate Accept-Encoding header to HTTP request.
  * Updated german translations

1.6.1
  * FileUrlData.py: remove searching for links in text files, this is
    error prone. Just handle *.html and Opera Bookmarks.
  * Make separate ChangeLog from debian/changelog. For previous
    changes, see debian/changelog.
  * Default socket timeout is now 10 seconds
  * updated linkcheck/timeoutsocket.py to newest version
  * updated README and INSTALL
  * s/User-agent/User-Agent/, use same case as other browsers