3.4 "" (released xx.xx.xxxx)

  * Ignore decoding errors when retrieving the robots.txt URL.
    Type: bugfix
    Changed: linkcheck/robotparser2.py

3.3 "Four Brothers" (released 14.10.2005)

  * Fix parsing of ignore and nofollow in configuration files.
    Type: bugfix
    Changed: linkcheck/configuration.py
    Closes: SF bug #1311964, #1270783

  * Ignore refresh meta content without a recognizable URL.
    Type: bugfix
    Changed: linkcheck/linkparse.py
    Closes: SF bug #1294456

  * Catch CGI syntax errors in mailto: URLs, and add an appropriate
    warning about the error.
    Type: bugfix
    Changed: linkcheck/checker/mailtourl.py
    Closes: SF bug #1290563

  * Initialize the i18n on module load time, so one does not have to
    call init_i18n() manually anymore. Fixes parts in the code (ie.
    the CGI script) that forgot to do this.
    Type: feature
    Changed: linkcheck/__init__.py
    Closes: SF bug #1277577

  * Compress libraries in the .exe installer with UPX compressor.
    Type: feature
    Changed: setup.py

  * Ensure that base_url is Unicode for local files.
    Type: bugfix
    Changed: linkcheck/checker/fileurl.py
    Closes: Debian bug #332870

  * The default encoding for program and logger output will be the
    preferred encoding now. It is determined from your current locale
    system settings.
    Type: feature
    Changed: linkchecker, linkcheck/checker/__init__.py,
      linkcheck/i18n.py, linkcheck/logger/__init__.py

  * Improved documentation about recursion and proxy support.
    Type: documentation
    Changed: linkchecker, doc/en/documentation.txt,
      doc/{en,de}/linkchecker.1

  * Make sure that given proxy values are reasonably well-formed.
    Else abort checking of the current URL.
    Type: feature
    Changed: linkcheck/checker/proxysupport.py

  * Correctly catch internal errors in the check URL loop, and disable
    raising certain exceptions while the abort routine finishes up.
    Fixes the "dequeue mutated during iteration" errors.
    Type: bugfix
    Changed: linkcheck/checker/{__init__,consumer}.py
    Closes: SF bug #1325570, #1312865, #1307775, #1292919, #1264865

3.2 "Kiss kiss bang bang" (released 3.8.2005)

  * Fixed typo in redirection handling code.
    Type: bugfix
    Changed: linkcheck/checker/httpurl.py

  * Handle all redirections to different URL types, not just
    HTTP -> non-HTTP.
    Type: bugfix
    Changed: linkcheck/checker/httpurl.py

  * Work around a urllib2.py bug raising ValueError on some failed
    HTTP authorisations.
    Type: bugfix
    Closes: SF bug #1250555
    Changed: linkcheck/robotparser2.py

  * Fix invalid import in DNS resolver.
    Type: bugfix
    Changed: linkcheck/dns/resolver.py

3.1 "Suspicious" (released 18.7.2005)

  * Updated documentation for the HTML parser.
    Type: feature
    Changed: linkcheck/HtmlParser/*

  * Added new DNS debug level and use it for DNS routines.
    Type: feature
    Changed: linkcheck/__init__.py, doc/en/linkchecker.1,
      linkcheck/dns/{ifconfig,resolver}.py

  * Use tags for different LinkChecker warnings and allow them to be
    filtered with a configuration file entry.
    Type: feature
    Changed: linkchecker, linkcheck/checker/*.py,
      linkcheck/configuration.py

  * Add compatibility fix for HTTP/0.9 servers, from Python CVS.
    Type: bugfix
    Changed: linkcheck/httplib2.py

  * Add buffer flush fix for gzip files, from Python CVS.
    Type: bugfix
    Changed: linkcheck/gzip2.py

  * Do not cache URLs where a timeout or unusual error occurred.
    This way they get re-checked.
    Type: feature
    Changed: linkcheck/checker/{__init__, urlbase}.py

  * For HTTP return codes, try to use the official W3C name when it
    is defined.
    Type: feature
    Changed: linkcheck/checker/httpurl.py

  * Fix detection code of supported GCC command line options. This
    fixes a build error on some Unix systems (eg. FreeBSD).
    Type: bugfix
    Closes: SF bug #1238906
    Changed: setup.py

  * Renamed the old "xml" output logger to "gxml" and added a new
    "xml" output logger which writes a custom XML format.
    Type: feature
    Changed: linkchecker, linkcheck/logger/*xml*.py

  * Use correct number of checked URLs in status output.
    Type: bugfix
    Closes: SF bug #1239943
    Changed: linkcheck/checker/consumer.py

3.0 "The Jacket" (released 8.7.2005)

  * Catch all check errors, not just the ones inside of URL checking.
    Type: bugfix
    Changed: linkcheck/checker/__init__.py

  * Ensure that the name of a newly created thread is ASCII. Else
    there can be encoding errors.
    Type: bugfix
    Changed: linkcheck/strformat.py, linkcheck/checker/consumer.py,
      linkcheck/threader.py

  * Use our own gzip module to cope with incomplete gzip streams.
    Type: bugfix
    Closes: SF bug #1158475
    Changed: linkcheck/checker/httpurl.py
    Added: linkcheck/gzip2.py

  * Fix hard coded python.exe path in the batch file linkchecker.bat.
    Type: bugfix
    Closes: SF bug #1206858
    Changed: setup.py, install-linkchecker.py

  * Allow empty relative URLs. Note that a completely missing URL is
    still an error (ie. an empty URL is valid, a missing URL is an
    error).
    Type: bugfix
    Closes: SF bug #1217397
    Changed: linkcheck/linkparse.py, linkcheck/logger/*.py,
      linkcheck/checker/urlbase.py

  * Added checks for more URL entries, notably a favicon check.
    Type: feature
    Changed: linkcheck/linkparse.py

  * Limit memory consumption of psyco optimizer.
    Type: feature
    Changed: linkchecker

  * Always norm the URL before sending a request.
    Type: bugfix
    Changed: linkcheck/checker/urlbase.py

  * Send complete email address on SMTP VRFY command. Avoids a
    spurious warning about incomplete email addresses.
    Type: bugfix
    Changed: linkcheck/checker/mailtourl.py

  * The old intern/extern URL configuration has been replaced with a
    new and hopefully simpler one. Please see the documentation on
    how to upgrade to the new option syntax.
    Type: feature
    Changed: linkchecker, linkcheck/*.py

  * Honor XHTML in tag browser.
    Type: bugfix
    Closes: SF bug #1217356
    Changed: linkcheck/linkparse.py

  * Catch curses.setupterm() errors.
    Type: bugfix
    Closes: SF bug #1216092
    Changed: linkcheck/ansicolor.py

  * Only call _optcomplete bash completion function when it exists.
    Type: bugfix
    Closes: Debian bug #309076
    Changed: config/linkchecker-completion

  * If a default config file (either /etc/linkchecker/linkcheckerrc
    or ~/.linkchecker/linkcheckerrc) does not exist it is not added
    to the config file list.
    Type: bugfix
    Changed: linkcheck/configuration.py

  * The default output encoding is now that of your locale, and not
    the hardcoded iso-8859-15 anymore.
    Type: feature
    Closes: Debian bug #307810
    Changed: linkcheck/logger/__init__.py

  * Do not generate an empty user config dir ~/.linkchecker by
    default, only when needed.
    Type: feature
    Closes: Debian bug #307876
    Changed: linkchecker

  * Redundant dot paths at the beginning of relative URLs are now
    removed.
    Type: feature
    Changed: linkcheck/url.py, linkcheck/tests/test_url.py

  * Displaying warnings is now the default. One can disable warnings
    with the --no-warnings option. The old --warnings option is
    deprecated.
    Type: feature
    Changed: linkchecker, linkcheck/configuration.py

  * CGI parameters in URLs are now properly split and normed.
    Type: bugfix
    Changed: linkcheck/url.py

  * The number of encountered warnings is printed on program end.
    Type: feature
    Changed: linkcheck/logger/{text,html}.py

  * The deprecated --status option has been removed.
    Type: feature
    Changed: linkchecker

  * New option --disable-psyco to disable psyco compilation
    regardless of whether it is installed.
    Type: feature
    Changed: linkchecker

  * Since URL aliases from redirections do not represent the real
    URL with regards to warnings, the aliases are no longer cached.
    Type: bugfix
    Changed: linkcheck/checker/cache.py, linkcheck/checker/httpurl.py

  * The ignored URL type now honors intern/extern filters.
    Type: bugfix
    Changed: linkcheck/checker/ignoreurl.py
    Closes: SF #1223956

2.9 "Sweat" (released 22.4.2005)

  * Use collections.deque object for incoming URL list. This is
    faster than a plain Python list object.
    Type: optimization
    Changed: linkcheck/checker/cache.py

  * Updated Spanish translation, thanks to Servilio Afre Puentes.
    Type: feature
    Changed: po/es.po

2.8 "Robots" (released 8.4.2005)

  * Correct AttributeError in blacklist logger.
    Type: bugfix
    Closes: SF bug #1173823
    Changed: linkcheck/logger/blacklist.py

  * Do not enforce an optional slash in empty URI paths. This
    resulted in spurious warnings.
    Closes: SF bug #1173841
    Changed: linkcheck/url.py, linkcheck/tests/test_url.py

  * On NT-derivative Windows systems, the command line script is now
    named "linkchecker.bat" to facilitate execution.
    Type: feature
    Changed: setup.py, install-linkchecker.py, doc/en/index.txt

  * Use pydoc.pager() in strformat.paginate() instead of rolling out
    our own paging algorithm.
    Type: feature
    Changed: linkcheck/strformat.py

2.7 "Million Dollar Baby" (released 30.3.2005)

  * When a host has no MX record, fall back to A records as the mail
    host.
    Type: bugfix
    Changed: linkcheck/checker/mailtourl.py

  * Do not split CGI params on semicolons. Not splitting is wrong of
    course, but splitting on semicolons is not supported by all
    servers. A later version of the CGI parser engine will split and
    re-join semicolons.
    Type: bugfix
    Changed: linkcheck/url.py

  * Make sure that URLs are always Unicode strings and not None.
    Type: bugfix
    Closes: SF bug #1168720
    Changed: linkcheck/linkparse.py, linkcheck/containers.py

  * Fix the detection of persistent HTTP connections.
    Type: bugfix
    Changed: linkcheck/checker/httpheaders.py

  * HTTP connections with pending data will not be cached.
    Type: bugfix
    Changed: linkcheck/checker/httpurl.py

  * Add all URL aliases to the URL cache to avoid recursion. This
    also changes some invariants about what URLs are expected to be
    in the cache.
    Type: bugfix
    Changed: linkcheck/checker/cache.py

2.6 "Lord of the Rings" (released 15.3.2005)

  * Run with low priority. New option --priority to run with normal
    priority.
    Type: feature
    Changed: linkchecker, linkcheck/threader.py

  * If GeoIP Python wrapper is installed, log the country name as
    info.
    Type: feature
    Changed: linkcheck/checker/consumer.py
    Added: linkcheck/checker/geoip.py

  * New option --no-proxy-for that lets linkchecker contact the given
    hosts directly instead of going through a proxy. Also
    configurable in linkcheckerrc.
    Type: feature
    Changed: linkchecker, linkcheck/checker/proxysupport.py,
      linkcheck/configuration.py

  * Give a useful error message for syntax errors in regular
    expressions.
    Type: bugfix
    Changed: linkchecker, linkcheck/configuration.py

  * Accept quoted URLs in CSS attributes.
    Type: bugfix
    Changed: linkcheck/linkparse.py

  * Eliminate duplicate link reporting in the link parser.
    Type: bugfix
    Changed: linkcheck/linkparse.py

  * Do not send multiple Accept-Encoding headers.
    Type: bugfix
    Changed: linkcheck/checker/httpurl.py

  * Avoid deadlocks between the cache and the queue lock.
    Type: bugfix
    Changed: linkcheck/checker/consumer.py,
      linkcheck/checker/cache.py
    Added: linkcheck/lock.py

  * Always reinitialize stored HTTP headers on redirects; prevents a
    false alarm about recursive redirects.
    Type: bugfix
    Changed: linkcheck/checker/httpurl.py

2.5 "Spanglish" (released 4.3.2005)

  * Added Spanish translation, thanks to Servilio Afre Puentes.
    Type: feature
    Changed: po/Makefile
    Added: po/es.po

  * Ignore a missing locale/ dir and fall back to the default locale
    instead of crashing.
    Type: bugfix
    Changed: linkcheck/i18n.py

  * Since profile.py and pstats.py have been removed from some Python
    standard installations (eg. Debian GNU/Linux), make their usage
    optional. Using --profile without an available profile.py prints
    a warning and runs linkchecker without profiling. Using
    --viewprof without an available pstats.py prints an error and
    exits.
    Type: bugfix
    Changed: linkchecker

  * Ensure stored result, info and warning strings are always
    Unicode. Else there might be encoding errors.
    Type: bugfix
    Closes: SF bug #1143553
    Changed: linkcheck/checker/{urlbase,httpurl,ftpurl}.py,
      linkcheck/strformat.py

  * Fix -h help option on Windows systems.
    Type: bugfix
    Closes: SF bug #1149987
    Changed: linkchecker

2.4 "Kitchen stories" (released 9.2.2005)

  * Work around a Python 2.4 bug when HTTP 302 redirections are
    encountered in urllib2.
    Type: bugfix
    Changed: linkcheck/robotparser2.py

  * Be sure to use Unicode HTML parser messages.
    Type: bugfix
    Changed: linkcheck/linkparse.py

  * Make sure that FTP connections are opened when they are reused.
    Else open a new connection.
    Type: bugfix
    Changed: linkcheck/checker/ftpurl.py

  * Added '!' to the list of unquoted URL path characters.
    Type: bugfix
    Changed: linkcheck/url.py, linkcheck/tests/test_url.py

  * Fix Windows path name for network paths.
    Type: bugfix
    Closes: SF bug #1117839
    Changed: linkcheck/checker/fileurl.py

  * Regularly remove expired connections from the connection pool.
    Type: feature
    Changed: linkcheck/checker/pool.py

  * Documentation and pylint cleanups.
    Type: feature
    Changed: linkcheck/*.py

2.3 "Napoleon Dynamite" (released 3.2.2005)

  * Use and require Python >= 2.4.
    Type: feature
    Changed: doc/install.txt, linkcheck/__init__.py, some scripts

  * Add square brackets ([]) to the list of allowed URL characters
    that do not need to be quoted.
    Type: bugfix
    Changed: linkcheck/url.py

  * Document the return value of the linkchecker command line script
    in the help text and man pages.
    Type: documentation
    Changed: linkchecker, doc/{en,de,fr}/linkchecker.1

  * Always write the GML graph beginning, not just when "intro" field
    is defined.
    Type: bugfix
    Changed: linkcheck/logger/gml.py

  * Added DOT graph format output logger.
    Type: feature
    Added: linkcheck/logger/dot.py
    Changed: linkcheck/logger/__init__.py,
      linkcheck/configuration.py, linkchecker

  * Added ftpparse module to parse FTP LIST output lines.
    Type: feature
    Added: linkcheck/ftpparse/*
    Changed: setup.py, linkcheck/checker/ftpurl.py

  * Ignore all errors when closing SMTP connections.
    Type: bugfix
    Changed: linkcheck/checker/mailtourl.py

  * Do not list FTP directory contents when they are not needed.
    Type: bugfix
    Changed: linkcheck/checker/ftpurl.py

  * Added connection pooling, used for HTTP and FTP connections.
    Type: feature
    Added: linkcheck/checker/pool.py
    Changed: linkcheck/checker/{cache, httpurl, ftpurl}.py

  * The new per-user configuration file is now stored in
    ~/.linkchecker/linkcheckerrc.
    Type: feature
    Changed: linkchecker, linkcheck/configuration.py,
      doc/{de,en,fr}/*.1

  * The new blacklist output file is now stored in
    ~/.linkchecker/blacklist.
    Type: feature
    Changed: linkchecker, linkcheck/configuration.py,
      doc/{de,en,fr}/*.1

  * Start the log output before appending new URLs to the consumer
    since this can trigger logger.new_url().
    Type: bugfix
    Changed: linkcheck/checker/{__init__, consumer}.py

  * Fix crash when using -t option.
    Type: bugfix
    Changed: linkchecker

  * Updated French translation of linkchecker, thanks to Yann Verley.
    Type: feature
    Changed: po/fr.po, doc/fr/linkchecker.1

2.2 "Cube" (released 25.01.2005)

  * CSV log format changes:
    - default separator is now a comma, not a semicolon
    - the quotechar can be configured and defaults to a double quote
    - write CSV column headers as the first data row (thanks to
      Hartmut Goebel)
    Type: feature
    Changed: linkcheck/logger/csvlog.py

  * Support bzip-compressed man pages in RPM install script. From
    Hartmut Goebel.
    Type: feature
    Changed: install-rpm.sh

  * HTML parser updates:
    - supply and use Py_CLEAR macro
    - only call set_encoding function if tag name is 'meta'
    Type: feature
    Changed: linkcheck/HtmlParser/*

  * Changed documentation format for epydoc.
    Type: documentation
    Changed: *.py

  * Fix FTP error message display crash.
    Type: bugfix
    Changed: linkcheck/checker/ftpurl.py

  * Ask before overwriting old profile data with --profile.
    Type: feature
    Changed: linkchecker

  * When searching for link names, limit the amount of data to look
    at to 256 characters. Do not look at the complete content
    anymore. This speeds up parsing of big HTML files significantly.
    Type: optimization
    Changed: linkcheck/linkparse.py

  * Support Psyco >= 1.4. If you installed older versions of Psyco,
    a warning is printed.
    Type: feature
    Changed: linkchecker, doc/install.txt

  * The build script setup.py uses -std=gnu99 when using GNU gcc
    compilers. This gets rid of several compile warnings.
    Type: feature
    Changed: setup.py

  * Correct the sent User-Agent header when getting robots.txt files.
    Added a simple robots.txt example file.
    Type: bugfix
    Changed: linkcheck/robotparser2.py
    Added: doc/robots.txt

  * Updated the included linkcheck/httplib2.py from the newest
    httplib.py found in Python CVS.
    Type: feature
    Changed: linkcheck/httplib2.py

  * Do not install unit tests. Only include them in the source
    distribution.
    Type: feature
    Changed: MANIFEST.in, setup.py

2.1 "Shogun Assassin" (released 11.1.2005)

  * Added XHTML support to the HTML parser.
    Type: feature
    Changed: linkcheck/HtmlParser/*

  * Support plural forms in gettext translations.
    Type: feature
    Changed: po/*.po*

  * Remove the internal optcomplete installation, and make it
    optional to install, since it is only needed on Unix
    installations using bash-completion.
    Type: feature
    Changed: linkchecker, config/linkchecker-completion
    Removed: linkcheck/optcomplete.py

  * Minor enhancements in URL parsing.
    Type: feature
    Changed: linkcheck/url.py

  * Sort according to preference when checking MX hosts so that
    preferred MX hosts get checked first.
    Type: bugfix
    Changed: linkcheck/checker/mailtourl.py

  * If mail VRFY command fails, print a warning message.
    Type: feature
    Changed: linkcheck/checker/mailtourl.py

2.0 "I Kina spiser de hunde" (released 7.12.2004)

  * Regenerate the HTML parser with new Bison version 1.875d. Also
    use the now supported Bison memory macros YYMALLOC and YYFREE.
    Type: feature
    Changed: linkcheck/HtmlParser/htmlparse.y

  * Updated installation and usage documentation.
    Type: documentation
    Changed: doc/install.txt, doc/index.txt

  * Added comment() method to loggers for printing comments.
    Type: feature
    Changed: linkcheck/logger/*.py

  * Updated and translated manpages. French translation from Yann
    Verley. German translation from me ;)
    Type: documentation
    Added: doc/de/linkchecker.de.1, doc/fr/linkchecker.fr.1
    Changed: doc/en/linkchecker.1

  * Fix mailto: URL norming by splitting the query type correctly.
    Type: bugfix
    Changed: linkcheck/url.py

  * Encode all output strings for display.
    Type: bugfix
    Changed: linkchecker

  * Accept -o option logger type as case independent string.
    Type: feature
    Changed: linkchecker

  * Internal Unicode handling fixed.
    Type: bugfix
    Changed: linkcheck/url.py, linkcheck/checker/*.py

  * Use correct FTP directory list parsing.
    Type: bugfix
    Changed: linkcheck/checker/ftpurl.py

2.0rc2 "El día de la bestia" (released 20.11.2004)

  * encode version string for --version output
    Type: bugfix
    Closes: SF bug #1067915
    Changed: linkchecker

  * Added shell config note with --home install option.
    Type: documentation
    Closes: SF bug #1067919
    Changed: doc/install.txt

  * Recheck robots.txt allowance and intern/extern filters for
    redirected URLs.
    Type: bugfix
    Closes: SF bug #1067914
    Changed: linkcheck/checker/httpurl.py

  * Updated the warning and info messages to be always complete
    sentences.
    Type: feature
    Changed: linkcheck/checker/*.py, po/*, linkcheck/ftests/*.py,
      linkcheck/ftests/data/*.result

  * Added missing script_dir to the windows installer script. Use
    python.exe instead of pythonw.exe and --interactive option to
    call linkcheck script. Add Documentation link to the programs
    group.
    Type: bugfix
    Changed: install-linkchecker.py

2.0rc1 "The Incredibles" (released 16.11.2004)

  * Only instantiate SSL connections if SSL is supported
    Type: bugfix
    Changed: linkcheck/checker/httpurl.py

  * Close all opened log files.
    Type: bugfix
    Changed: linkcheck/logger/*.py

  * All loggers now have an output encoding. Valid encodings are
    listed in http://docs.python.org/lib/node127.html. The default
    encoding is "iso-8859-15".
    Type: feature
    Changed: linkcheck/logger/*.py

  * The --output and --file-output parameters can specify the
    encoding now. The documentation has been updated with this
    change.
    Type: feature
    Changed: linkchecker, linkchecker.1

  * The encoding can also be specified in the linkcheckerrc config
    file.
    Type: feature
    Changed: config/linkcheckerrc

  * All leading directories of a given output log file are created
    automatically now. Errors creating these directories or opening
    the log file for writing abort the checking and print a usage
    message.
    Type: feature
    Changed: linkchecker, linkcheck/logger/__init__.py

  * Coerce URL names to Unicode
    Type: feature
    Changed: linkcheck/checker/__init__.py

  * Accept Unicode filenames for resolver config
    Type: feature
    Changed: linkcheck/dns/resolver.py

  * LinkChecker now accepts Unicode domain names and converts them
    according to RFC 3490 (http://www.faqs.org/rfcs/rfc3490.html).
    Type: feature
    Changed: linkcheck/dns/resolver.py, linkcheck/url.py

  * Exceptions in the log systems are no longer caught.
    Type: feature
    Changed: linkcheck/ansicolor.py

  * Remember a tag in the link parser. Saves one HTML parse.
    Type: feature
    Changed: linkcheck/checker/urlbase.py, linkcheck/linkparse.py

  * Optimize link name parsing of img alt tags.
    Type: feature
    Changed: linkcheck/linkname.py

  * Remove all references to the old 'colored' output logger.
    Type: documentation
    Closes: SF bug #1062011
    Changed: linkchecker.1

  * Synchronized the linkchecker documentation and the man page.
    Type: documentation
    Closes: SF bug #1062034
    Changed: linkchecker, linkchecker.1

  * Make --quiet an alias for -o none.
    Type: bugfix
    Closes: SF bug #1063144
    Changed: linkchecker, linkcheck/configuration.py,
      linkcheck/checker/consumer.py

  * Re-norm a changed file:// base url, avoiding a spurious warning.
    Type: bugfix
    Changed: linkcheck/checker/fileurl.py

  * Wrong case of file links on Windows platforms now issues a
    warning.
    Type: feature
    Closes: SF bug #1062007
    Changed: linkcheck/checker/fileurl.py

  * Updated the French translation. Thanks to Yann Verley.
    Type: feature
    Changed: po/fr.po

1.13.5 "Die Musterknaben" (released 22.9.2004)

  * Use xgettext with Python support for .pot file creation, adjusted
    developer documentation.
    Type: feature
    Changed: doc/install.txt, po/Makefile, MANIFEST.in
    Removed: po/pygettext.py, po/msgfmt.py

  * Use plural gettext form for log messages.
    Type: feature
    Changed: linkcheck/logger/{text,html}.py

  * Check if FTP file really exists instead of only the parent dir.
    Type: bugfix
    Changed: linkcheck/checker/ftpurl.py

  * Document the different logger output types.
    Type: documentation
    Changed: linkchecker, linkchecker.1

  * Recursion into FTP directories and parseable files has been
    implemented.
    Type: feature
    Changed: linkcheck/checker/ftpurl.py

1.13.4 "Shaun of the dead" (released 17.9.2004)

  * Catch HTTP cookie errors and add a warning.
    Type: bugfix
    Changed: linkcheck/checker/httpurl.py

  * fix up response page object in robots.txt parser for the upcoming
    Python 2.4 release
    Type: bugfix
    Changed: linkcheck/robotparser2.py

  * remove cached urls from progress queue, fixing endless wait for
    checking to finish
    Type: bugfix
    Changed: linkcheck/checker/consumer.py

  * updated and synchronized documentation of the man page
    (linkchecker.1) and the linkchecker --help output.
    Type: documentation
    Changed: linkchecker, linkchecker.1

1.13.3 "Fight Club" (released 10.9.2004)

  * Prevent collapsing of relative parent dir paths. This fixes false
    positives on URLs of the form "../../foo".
    Closes: SF bug #1025459
    Changed: linkcheck/url.py, linkcheck/tests/test_url.py

1.13.2 "Zatôichi" (released 8.9.2004)

  * Fix permissions of data files on install to be world readable.
    Type: bugfix
    Closes: SF bug #1022132
    Changed: setup.py

  * Fixed the SQL logger when encountering empty URLs.
    Type: bugfix
    Closes: SF bug #1022156
    Changed: linkcheck/logger/sql.py

  * Added notes about access rules for CGI scripts
    Type: documentation
    Changed: doc/install.txt

  * Updated French translation. Thanks, Yann Verley!
    Type: feature
    Changed: po/fr.po

  * initialize i18n at program start
    Type: bugfix
    Changed: linkchecker, linkcheck/lc_cgi.py

  * Make initialization function for i18n, and allow LOCPATH to
    override the locale directory.
    Type: feature
    Changed: linkcheck/__init__.py

  * Removed debug print statement when issuing linkchecker --help.
    Type: bugfix
    Changed: linkchecker

  * Reset to default ANSI color scheme, we don't know what background
    color the terminal has.
    Type: bugfix
    Closes: SF bug #1022158
    Changed: linkcheck/configuration.py

  * Reinit the logger object when config files change values.
    Type: bugfix
    Changed: linkcheck/configuration.py

  * Only import ifconfig routines on POSIX systems.
    Type: bugfix
    Closes: SF bug #1024607
    Changed: linkcheck/dns/resolver.py

1.13.1 "Old men in new cars" (released 3.9.2004)

  * Fixed RPM generation by adding the generated config file to the
    installed files list.
    Type: bugfix
    Changed: setup.py

  * Mention in the documentation that old versions should be removed
    when upgrading.
    Type: documentation
    Changed: doc/upgrading.txt, doc/install.txt

  * Fix typo in redirection cache handling.
    Type: bugfix
    Changed: linkcheck/checker/cache.py

  * The -F file output must honor verbose/quiet configuration.
    Type: bugfix
    Changed: linkcheck/checker/consumer.py

  * Generate all translation files under windows systems.
    Type: bugfix
    Changed: po/Makefile

  * Added windows binary installer script and configuration.
    Type: feature
    Changed: setup.py, setup.cfg, doc/install.txt
    Added: install-linkchecker.py

  * Do not raise an error when user and/or password of ftp URLs is
    not specified.
    Type: bugfix
    Changed: linkcheck/checker/ftpurl.py

  * honor anchor part of cache url key, handle the recursion check
    with an extra cache key
    Type: bugfix
    Changed: linkcheck/checker/{urlbase,cache,fileurl}.py

  * Support URL lists in text files with one URL per line. Empty
    lines or comment lines starting with '#' are ignored.
    Type: feature
    Changed: linkcheck/checker/fileurl.py

  * Added new option --extern-strict to specify strict extern url
    patterns.
    Type: feature
    Changed: linkchecker

  * Strip quotes from parsed CSS urls.
    Type: bugfix
    Changed: linkcheck/checker/urlbase.py

1.13.0 "The Butterfly Effect" (released 1.9.2004)

  * lots of internal code restructuring
    Type: code cleanup
    Changed: a lot

  * If checking revealed errors (or warnings with --warnings), the
    command line client exits with a non-zero exit status.
    Type: feature
    Closes: SF bug 1013191
    Changed: linkchecker, linkcheck/checker/consumer.py

  * Specify the HTML doctype and charset in HTML output.
    Type: feature
    Closes: SF bug 1014283
    Changed: linkcheck/logger/html.py

  * Fix endless loop on broken urls with non-empty anchor.
    Type: bugfix
    Changed: linkcheck/checker/httpurl.py

  * For news: or nntp: urls, entries in ~/.netrc are now ignored.
    Instead, give username/password info in the configuration file
    or on the command line.
    Type: bugfix
    Changed: linkcheck/checker/nntpurl.py

  * The HTML output now shows HTML and CSS validation links for the
    parent URL of invalid links.
    Type: feature
    Changed: linkcheck/logger/html.py

  * The status is now printed by default; it can be suppressed with
    the new --no-status option.
    Type: feature
    Changed: linkchecker

  * The default recursion level is now infinite.
    Type: feature
    Changed: linkchecker

  * The 'outside of domain filter' is no longer a warning but an
    informational message. A warning is inappropriate since the user
    is in full control over what links are extern or intern.
    Type: feature
    Closes: SF bug 1013206
    Changed: linkcheck/urlbase.py

  * Renamed the --strict option to --extern-strict-all.
    Type: feature
    Changed: linkchecker

  * a new cache and queueing algorithm makes sure that no URL is
    checked twice.
    Type: feature
    Changed: linkcheck/checker/cache.py

  * the given user/password authentication is now also used to get
    robots.txt files.
    Type: feature
    Changed: linkcheck/robotparser2.py, linkcheck/checker/cache.py

1.12.3 "The Princess Bride" (released 27.5.2004)

  * fall back to GET on bad status line of a HEAD request
    Type: bugfix
    Changed: linkcheck/HttpUrlData.py

  * really fall back to GET with Zope servers; fixes infinite loop
    Type: bugfix
    Changed: linkcheck/HttpUrlData.py

  * better error msg on BadStatusLine error
    Type: feature
    Changed: linkcheck/UrlData.py

  * updated optcomplete to newest upstream
    Type: feature
    Changed: linkcheck/optcomplete.py

  * also quote query parts of urls
    Type: bugfix
    Changed: linkcheck/{HttpUrlData, url}.py

  * - preserve the order in which HTML attributes have been parsed
    - cope with trailing space in HTML comments
    Type: feature
    Changed: linkcheck/parser/{__init__.py,htmllex.l}
    Added: linkcheck/containers.py

  * rework anchor fallback
    Type: bugfix
    Changed: linkcheck/HttpUrlData.py

  * move contentAllowsRobot check to end of recursion check to avoid
    unnecessary GET request
    Type: bugfix
    Changed: linkcheck/UrlData.py

1.12.2 (release 4.4.2004)

  * use XmlUtils instead of xmlify for XML quoting
    Type: code cleanup
    Added: linkcheck/XmlUtils.py
    Changed: linkcheck/StringUtil.py, linkcheck/log/XMLLogger.py

  * don't require a value anymore with the --version option
    Type: bugfix
    Changed: linkchecker

  * before putting url data objects in the queue, check if they have
    correct syntax and are not already cached
    Type: optimization
    Changed: linkcheck/{UrlData,Config}.py

  * every once in a while, remove all already cached urls from the
    incoming queue. This action is reported when --status is given.
    Type: optimization
    Changed: linkcheck/Config.py

  * both changes above result in significant performance improvements
    when checking large websites, since a majority of the links tend
    to be navigation links to already-cached pages.
    Type: note

  * updated examples and put them before options in the man page for
    easier reading
    Type: documentation
    Changed: linkchecker, linkchecker.1

  * added contact url and email to the HTTP User-Agent string, which
    gets us more accepted by some bot-blocking software; also see
    http://www.livejournal.com/bots/
    Type: feature
    Changed: linkcheck/Config.py

  * only check robots.txt for http connections
    Type: bugfix
    Changed: linkcheck/{Http,}UrlData.py
    Closes: SF bug 928895

  * updated regression tests
    Type: feature
    Changed: test/test_*.py, Makefile
    Added: test/run.sh

  * preserve the order in which HTML attributes have been parsed
    Type: feature
    Changed: linkcheck/parser/{__init__.py,htmllex.l}

  * handle and correct missing start quotes in HTML attributes
    Type: feature
    Changed: linkcheck/parser/htmllex.l

  * full parsing of .css files
    Type: feature
    Changed: linkcheck/{Http,}UrlData.py, linkcheck/linkparse.py

  * removed Gilman news draft
    Type: feature
    Removed: draft-gilman-news-url-00.txt

1.12.1 (release 21.2.2004)

  * raise IncompleteRead instead of ValueError on malformed chunked
    HTTP data
    Changed: linkcheck/httplib2.py

  * catch errors earlier in recursion check
    Changed: linkcheck/UrlData.py

  * quote url and parent url in log output
    Changed: linkcheck/log/*.py
    Added: linkcheck/url.py

1.12.0 (release 31.1.2004)

  * added LRU.setdefault function
    Changed: linkcheck/LRU.py
    Closes: SF bug 885916

  * Added Mac OS X as supported platform (version 10.3 is known to
    work)
    Changed: README, INSTALL

  * HTML parser objects are now subclassable and collectable by the
    cyclic garbage collector
    Changed: linkcheck/parser/htmlparse.y

  * made some minor parser fixes for attribute scanning and
    JavaScript
    Changed: linkcheck/parser/htmllex.l

  * include the optcomplete module for bash
    autocompletion
    Added: linkcheck/optcomplete.py, linkcheck-completion
    Changed: MANIFEST.in, setup.py

  * print out nicer error message for unknown host names
    Changed: linkcheck/UrlData.py

  * added new logger type "none" printing out nothing which is handy
    for cron scripts.
    Changed: linkchecker, linkcheck/Config.py,
      linkcheck/log/__init__.py
    Added: linkcheck/log/NoneLogger.py

  * the -F file output option disables console output now
    Changed: linkchecker

  * added an example cron script
    Added: linkcheck-cron.sh
    Changed: MANIFEST.in, setup.py

  * only warn about missing anchor support servers when the url has
    actually an anchor
    Changed: linkcheck/HttpUrlData.py

  * always fall back to HTTP GET request when HEAD gave an error to
    cope with servers not supporting HEAD requests
    Changed: linkcheck/HttpUrlData.py, FAQ

1.10.3 (release 10.1.2004)

  * use the optparser module for command line parsing
    Changed: linkchecker, po/*.po

  * use Set() instead of hashmap
    Changed: linkcheck/Config.py

  * fix mime-type checking to allow parsing of .css stylesheets
    Changed: linkcheck/HttpUrlData.py

  * honor HTML meta tags for robots, ie.
    Changed: linkcheck/UrlData.py, linkcheck/linkparse.py

  * much less aggressive thread acquiring, this fixes the 100% CPU
    usage from the previous version
    Changed: linkcheck/Threader.py

1.10.2 (release 3.1.2004)

  * fixed CGI safe_url pattern, it was too strict
    Changed: linkcheck/lc_cgi.py

  * replace backticks with repr() or %r
    Changed: all .py files containing backticks, and po/*.po

  * make windows DNS nameserver parsing more robust
    Changed: linkcheck/DNS/Base.py
    Closes: SF bugs 863227,864383

  * only cache used data, not the whole url object
    Changed: linkcheck/{Http,}UrlData.py

  * limit cached data
    Changed: linkcheck/{UrlData,Config}.py
    Added: linkcheck/LRU.py
    Closes: SF bug 864516

  * use dummy_threading module and get rid of the _NoThreads
    functions
    Changed: linkchecker, linkcheck/{Config,Threader}.py,
      test/test_*.py

  * set default connection timeout to 60 seconds
    Changed: linkcheck/__init__.py

  * new option --status print regular messages about number of
    checked urls and urls still to check
    Changed: linkchecker, linkcheck/{__init__,Config}.py

1.10.1 (release 19.12.2003)

  * added Mandrake .spec file from Chris Green
    Added: linkchecker.spec
    Changed: MANIFEST.in

  * print last-modified date for http and https links in infos
    Changed: linkcheck/HttpUrlData.py

  * add detailed installation instructions for Windows
    Changed: INSTALL
    Closes: SF bug 857748

  * updated the DNS nameserver config parse routines
    Changed: linkcheck/DNS/Base.py
    Added: linkcheck/DNS/winreg.py
    Removed: linkcheck/DNS/win32dns.py

  * fix https support test
    Changed: linkcheck/HttpUrlData.py

1.10.0 (released 7.12.2003)

  * catch httplib errors in robotparser
    Changed: linkcheck/robotparser2.py
    Closes: SF bug 836864

  * - infinite recursion option with negative value works now
    - initialize self.urlparts to avoid crash when reading cached
      http urls
    - with --strict option do not add any automatic filters if the
      user gave his own on the command line
    Changed: linkcheck/UrlData.py

1.9.5 (released 31.10.2003)

  * Add Zope to servers
with broken HEAD support, adjusted the FAQ Changed: linkcheck/HttpUrlData.py, FAQ Closes: SF bug 833419 * Disable psyco usage, it is causing infinite loops (this is a known issue with psyco); and it is disabling ctrl-c interrupts (this is also a known issue in psyco) Changed: linkchecker * use internal debug logger Changed: linkcheck/robotparser2.py * do not hardcode Accept-Encoding header in HTTP request Added: linkcheck/httplib2.py Changed: linkcheck/robotparser2.py 1.9.4 (released 22.10.2003) * parse CSS stylesheet files and check included urls, for example background images Changed: linkcheck/{File,Http,Ftp,}UrlData.py, linkcheck/linkparser.py * try to use psyco for the commandline linkchecker script Changed: linkchecker * when decompression of compressed HTML pages fails, assume the page is not compressed Changed: linkcheck/{robotparser2,HttpUrlData}.py 1.9.3 (released 16.10.2003) * re-added an updated robot parser which uses urllib2 and can decode compressed transfer encodings. Added: linkcheck/robotparser2.py * more restrictive url validity checking when running in CGI mode Changed: linkcheck/lc_cgi.py * accept more Windows path specifications, like file://C:\Dokume~1\test.html Changed: linkcheck/FileUrlData.py 1.9.2 * parser fixes: - do not #include , fixes build on some FreeBSD, Windows and Solaris/SunOS platforms - ignore first leading invalid backslash in a=\"b\" attributes Changed: linkcheck/parser/htmllex.{l,c} * add full script path to linkchecker on windows systems Changed: linkchecker.bat * fix generation of Linkchecker_Readme.txt under windows systems Changed: setup.py 1.9.1 * add documentation how to change the default C compiler Changed: INSTALL * fixed blacklist logging Changed: linkcheck/log/BlacklistLogger.py * removed unused imports Changed: linkcheck/*.py * parser fixes: - fixed parsing of end tags with trailing garbage - fixed parsing of script single comment lines Changed: linkcheck/parser/htmllex.l 1.9.0 * Require Python 2.3 - removed 
timeoutsocket.py and robotparser.py, using upstream - use True/False for boolean values - use csv module - use new-style classes Closes: SF bug 784977 Changed: a lot * update po makefiles and tools Changed: po/* * start CGI output immediately Changed: lc.cgi, lc.fcgi, lc.sz_fcgi, linkcheck/lc_cgi.py Closes: SF bug 784331 1.8.22 * allow colons in HTML attribute names, used for namespaces Changed: linkcheck/parser/htmllex.l * fix match of intern patterns with --denyallow enabled Changed: linkcheck/UrlData.py * s/intern/internal/ and s/extern/external/ in the documentation Changed: linkchecker, linkchecker.1, FAQ * rename column "column" to "col" in SQL output, since "column" is a reserved keyword. Thanks to Garvin Hicking for the hint. Changed: linkcheck/log/SQLLogger.py, create.sql * handle HTTP redirects to a non-http url Changed: linkcheck/{Http,}UrlData.py Closes: SF bug 784372 1.8.21 * detect recursive redirections; the maximum of five redirections is still there though * after every HTTP 301 or 302 redirection, check the URL cache again Closes: SF bug 776851 * put all HTTP 301 redirection answers also in the url cache as aliases of the original url. This could mess up some redirection warnings (i.e. warn about redirection when there is none), but it is more network efficient. 1.8.20 * fix setting of domain in set_intern_url Changed: linkcheck/UrlData.py * - parse JS strings and comments - accept "". Changed files: linkcheck/UrlData.py, linkchecker 1.8.17 * fix parsing of missing end tag in "" Changed files: linkcheck/parser/htmllex.l * fix entity resolving in parsed html links Closes: SF bug #749543 Changed files: linkcheck/StringUtil.py 1.8.16 * also look at id attributes on anchor check (Closes SF Bug #741131) Changed files: linkcheck/{linkparser,UrlData}.py * minor parser cleanups Changed files: linkcheck/parser/* 1.8.15 * Fix compile errors with C variable declarations in HTML parser.
Thanks to Fazal Majid Changed files: linkcheck/parser/htmlparse.[yc] 1.8.14 * fix old bug in redirects not using the full url. This resulted in errors like (-2, "Name or service not known") Changed files: linkcheck/HttpUrlData.py Closes: SF Bug #729007 * only remove anchors on IIS servers (other servers are doing quite well with anchors... can you spell A-p-a-c-h-e ?) Changed files: linkcheck/{HttpUrlData, UrlData}.py * Parser changes: - correctly propagate and display parsing errors - really cope with missing ">" end tags Changed files: linkcheck/parser/html{lex.l, parse.y}, linkcheck/linkparse.py, linkcheck/UrlData.py * quote urls before a request Changed files: linkcheck/HttpUrlData.py 1.8.13 * fix typo in manpage Changed files: linkchecker.1 * remove anchor from HEAD and GET requests Changed files: linkcheck/{HttpUrlData, UrlData}.py 1.8.12 * convert urlparts to list also on redirect Changed files: linkcheck/HttpUrlData.py 1.8.11 * catch httplib.error exceptions Changed files: linkcheck/HttpUrlData.py * override interactive password question in robotparser.py Changed files: linkcheck/robotparser.py * switch to urllib2.py as default url connect. Changed files: linkcheck/UrlData.py * recompile html parser with flex 2.5.31 Changed files: linkcheck/parser/{htmllex.c,Makefile} 1.8.10 * new option --no-anchor-caching Changed files: linkchecker, linkcheck/{Config.py, UrlData.py}, FAQ * quote empty attribute arguments Changed files: linkcheck/parser/htmllex.[lc] 1.8.9 * recompile with bison 1.875a Changed files: linkcheck/parser/htmlparse.[ch] * remove stpcpy declaration, fixes compile error on RedHat 7.x Changed files: linkcheck/parser/htmlsax.h * clarify keyboard interrupt warning to wait for active connections to finish Changed files: linkcheck/__init__.py * resolve &#XXX; number entity references Changed files: linkcheck/{StringUtil.py,linkname.py} 1.8.8 * All amazon servers block HEAD requests with timeouts. Use GET as a workaround, but issue a warning. 
Changed files: linkcheck/HttpUrlData.py * restrict CGI access to localhost by default Changed files: lc.cgi, lc.fcgi, lc.sz_fcgi, linkcheck/lc_cgi.py 1.8.7 * #define YY_NO_UNISTD_H on Windows systems, fixes build error with Visual Studio compiler Changed files: setup.py * use python2.2 headers for parser compile, not 2.1. Changed files: linkcheck/parser/Makefile 1.8.6 * include a fixed robotparser.py (from Python 2.2 CVS maint branch) 1.8.5 * fix config.warn to warn Changed files: linkcheck/__init__.py * parser changes: o recognise "" HTML comments (seen at Eonline) o recognise "" HTML comments (seen at www.nba.com) o rebuild with flex 2.5.27 Changed files: linkcheck/parser/htmllex.[lc] * added another url exclusion example to the FAQ, number questions and answers Changed files: FAQ * fix linkchecker exceptions Changed files: linkcheck/{Ftp,Mailto,Nntp,Telnet,}UrlData.py, linkcheck/__init__.py 1.8.4 * Improve error message for failing htmlsax module import Changed files: linkcheck/parser/htmllib.py * Regenerate parser with new bison 1.875 Changed files: linkcheck/parser/htmlparser.c * Some CVS files were not the same as their local counterparts. Something went wrong. Anyway, I re-committed them. Changed files: a lot of .py files 1.8.3 * add missing imports for StringUtil in log classes, defer i18n of log field names (used for CGI scripts) Changed files: linkcheck/log/*.py * fixed wrong debug level comparison from > to >= Changed files: linkcheck/Config.py * JavaScript checks in the CGI scripts Changed files: lconline/lc_cgi.html.* Added files: lconline/check.js * Updated documentation with a link restriction example Changed files: linkchecker, linkchecker.1, FAQ * Updated po/pygettext.py to version 1.5, cleaned up some gettext usages. * updated i18n Added files: linkcheck/i18n.py Changed files: all .py files using i18n * Recognise "= 2.2.1, remove httplib.
Changed files: setup.py, INSTALL, linkchecker * Add again python-dns, the Debian package maintainer is unresponsive Added files: linkcheck/DNS/*.py Changed files: INSTALL, setup.py * You must now use named constants for ANSI color codes Changed files: linkcheckerrc, linkcheck/log/ColoredLogger.py * Release RedHat 8.0 rpm packages. Changed files: setup.py, MANIFEST.in * remove --robots-txt from manpage, fix HTZP->HTTP typo Changed files: linkchecker.1 1.7.1 * Fix memory leak in HTML parser flushing error path Changed files: htmlparse.y * add custom line and column tracking in parser Changed files: htmllex.l, htmlparse.y, htmlsax.h, htmllib.py * Use column tracking in urldata classes Changed files: UrlData.py, FileUrlData.py, FtpUrlData.py, HostCheckingUrlData.py * Use column tracking in logger classes Changed files: StandardLogger.py, CVSLogger.py, ColoredLogger.py, HtmlLogger.py, SqlLogger.py 1.7.0 * Added new HTML parser written in C as a Python extension module. It is faster and it is more fault tolerant. Of course, this means I cannot provide .exe installers any more since the distutils don't provide cross-compilation. 1.6.7 * Removed check for tags codebase attribute, but honor it when checking applet links * Handle tags archive attribute as a comma separated list Closes: SF bug #636802 * Fix a nasty bug in tag searching, which ignored tags with more than one link attribute in them. * Fix concatenation with relative base urls by first joining the parent url. * New commandline option --profile to write profile data. * Add httplib.py from Python CVS 2.1 maintenance branch, which has the skip_host keyword argument I am using now. 1.6.6 * Use the new HTTPConnection/HTTPResponse interface of httplib Closes: SF bug #634679 Changed files: linkcheck/HTTPUrlData.py, linkcheck/HTTPSUrlData.py * Updated the ftp online test Changed files: test/output/test_ftp 1.6.5 * Catch the maximum recursion limit error while parsing links and print an error message instead of bailing out.
Changed files: linkcheck/UrlData.py * Fixed Ctrl-C only interrupting one single thread, not the whole program. Changed files: linkcheck/UrlData.py, linkcheck/__init__.py * HTML syntax cleanup and relative cgi form url for the cgi scripts Changed files: lconline/*.html 1.6.4 * Support for ftp proxies Changed files: linkcheck/FtpUrlData.py, linkcheck/HttpUrlData.py Added files: linkcheck/ProxyUrlData.py * Updated German translation 1.6.3: * Generate md5sum checksums for distributed files Changed files: Makefile * use "startswith" string method instead of a regex Changed files: linkchecker, linkcheck/UrlData.py * Add a note about supported languages, updated the documentation. Changed files: README, linkchecker, FAQ * Remove --robots-txt option from documentation, it is enabled by default and you cannot disable it from the command line. Changed files: linkchecker, po/*.po * fix --extern argument creation Changed files: linkchecker, linkcheck/UrlData.py * Print help if PyDNS module is not installed Changed files: linkcheck/UrlData.py * Print information if a proxy was used. Changed files: linkcheck/HttpUrlData.py * Updated German documentation Changed files: po/de.po * Oops, an FTP proxy is not used. Will make it in the next release. Changed files: linkcheck/FtpUrlData.py * Default socket timeout is now 30 seconds (10 was too short) 1.6.2: * Warn about unknown Content-Encodings. Don't parse HTML in this case. * Support deflate content encoding (snatched from Debian's reportbug) * Add appropriate Accept-Encoding header to HTTP request. * Updated German translations 1.6.1: * FileUrlData.py: remove searching for links in text files, this is error prone. Just handle *.html and Opera Bookmarks. * Make separate ChangeLog from debian/changelog. For previous changes, see debian/changelog. * Default socket timeout is now 10 seconds * updated linkcheck/timeoutsocket.py to newest version * updated README and INSTALL * s/User-agent/User-Agent/, use same case as other browsers