check links in web documents or full websites
Antoine Beaupré 9d899d1dfa add --no-robots commandline flag
While this flag can be abused, checking a fairly small document for
mistakes seems to me like a legitimate use case, even when that
document references a website whose robots.txt denies all robots. It
turns out that most websites do *not* explicitly grant LinkChecker
permission to access them, and some sites, such as the Debian BTS,
are hostile to bots in general.

There is little difference between me using linkchecker and me using
my web browser to check those links one by one. In fact, using
linkchecker may be *better* for the website because it will use HEAD
requests instead of GET, and will not fetch all page elements
(JavaScript, images, etc.), which can often be fairly big.

Besides, hostile users will patch the software themselves: it took me
only a few minutes to disable the check, and a few more to make that
into a proper patch.

By forcing robots.txt without any other option, we are hurting our
good users and not keeping hostile users from doing harm.

The patch works but is still incomplete: it lacks documentation and
unit tests.

Closes: #508
2016-05-19 14:43:59 -04:00
====================  ==========================  ========================================
Name                  Date                        Last commit
====================  ==========================  ========================================
cgi-bin               2013-04-09 20:11:04 +02:00  Updated homepage URL.
config                2016-01-23 13:28:15 +01:00  Move GUI files to separate project
doc                   2016-01-23 13:28:15 +01:00  Move GUI files to separate project
linkcheck             2016-05-19 14:43:59 -04:00  add --no-robots commandline flag
po                    2016-01-23 13:28:15 +01:00  Move GUI files to separate project
scripts               2014-07-13 21:51:41 +02:00  Code cleanup
tests                 2016-01-23 13:28:15 +01:00  Move GUI files to separate project
third_party           2014-09-12 21:59:07 +02:00  More python3 fixes
windows               2016-01-23 13:28:15 +01:00  Move GUI files to separate project
.gitattributes        2013-12-04 20:04:34 +01:00  Add .gitattributes
.gitignore            2014-09-08 18:25:03 +02:00  Replace msgfmt.py with local tools.
.project              2011-05-18 21:12:18 +02:00  Add Eclipse Pydev project files.
.pydevproject         2011-12-17 19:13:43 +01:00  Updated pydev settings.
.travis.yml           2016-01-23 13:28:15 +01:00  Move GUI files to separate project
COPYING               2010-03-06 21:52:25 +01:00  Moved some files into the doc/ subdirectory.
install-rpm.sh        2012-04-11 18:41:34 +02:00  Fix RPM installer generation.
linkchecker           2016-05-19 14:43:59 -04:00  add --no-robots commandline flag
linkchecker.freecode  2014-07-16 07:34:21 +02:00  Set release date.
Makefile              2016-01-23 13:28:15 +01:00  Move GUI files to separate project
MANIFEST.in           2016-01-23 13:28:15 +01:00  Move GUI files to separate project
README.rst            2016-01-23 13:28:15 +01:00  Move GUI files to separate project
requirements.txt      2014-07-15 18:37:05 +02:00  Replace twill with custom code.
robots.txt            2008-07-13 13:01:59 +00:00  Add non-ascii values to test robots.txt
setup.cfg             2016-01-17 09:05:21 +01:00  Remove platform-specific installer stuff and ensure a build .whl wheel file can be built.
setup.py              2016-01-23 13:28:15 +01:00  Move GUI files to separate project
====================  ==========================  ========================================

LinkChecker
============

|Build Status|_ |Latest Version|_ |License|_

.. |Build Status| image:: https://travis-ci.org/wummel/linkchecker.svg?branch=master
.. _Build Status: https://travis-ci.org/wummel/linkchecker
.. |Latest Version| image:: http://img.shields.io/pypi/v/LinkChecker.svg
.. _Latest Version: https://pypi.python.org/pypi/LinkChecker
.. |License| image:: http://img.shields.io/badge/license-GPL2-d49a6a.svg
.. _License: http://opensource.org/licenses/GPL-2.0

Check for broken links in web sites.

Features
---------

- recursive and multithreaded checking and site crawling
- output in colored or normal text, HTML, SQL, CSV, XML or a sitemap graph in different formats
- support for HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local file links
- restrict link checking with regular expression filters for URLs
- proxy support
- username/password authorization for HTTP, FTP and Telnet
- honors robots.txt exclusion protocol
- cookie support
- HTML5 support
- a command line and web interface
- various check plugins available, e.g. HTML syntax and antivirus checks
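
As an illustration of the robots.txt exclusion protocol mentioned in the list above, here is a minimal Python 3 sketch. It uses the standard library's ``urllib.robotparser`` and is not taken from LinkChecker's own code. The ``may_check`` helper, its ``respect_robots`` parameter and the sample rules are assumptions made for this example only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; a real checker would download them
# from the site being checked.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_check(url, robots_lines, user_agent="LinkChecker", respect_robots=True):
    """Return True if url may be fetched under the given robots.txt rules.

    respect_robots is a hypothetical switch: when False, the rules are
    ignored and every URL is allowed.
    """
    if not respect_robots:
        return True
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules that are already in memory
    return parser.can_fetch(user_agent, url)


lines = ROBOTS_TXT.splitlines()
print(may_check("http://www.example.com/index.html", lines))      # True
print(may_check("http://www.example.com/private/a.html", lines))  # False
print(may_check("http://www.example.com/private/a.html", lines,
                respect_robots=False))                            # True
```

Parsing the rules from a list of lines keeps the sketch free of network access; a real checker would fetch ``/robots.txt`` from each host before deciding.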

Installation
-------------
See doc/install.txt in the source code archive.
Python 2.7.2 or later is needed.

Usage
------
Execute ``linkchecker http://www.example.com``.
For other options see ``linkchecker --help``.