linkchecker/linkcheck
Antoine Beaupré 9d899d1dfa add --no-robots commandline flag
While this flag can be abused, it seems to me a legitimate use case
to check a fairly small document for mistakes when that document
references a website whose robots.txt denies all robots. It turns out
that most websites do *not* add a permission for LinkCheck to use
their site, and some sites, like the Debian BTS for example, are very
hostile to bots in general.

There is not a big difference between me using linkcheck and me using
my web browser to check those links one by one. In fact, using
linkcheck may be *better* for the website because it will use HEAD
requests instead of GET, and will not fetch all page elements
(JavaScript, images, etc.), which can often be fairly big.

Besides, hostile users will patch the software themselves: it took me
only a few minutes to disable the check, and a few more to make that
into a proper patch.

By forcing robots.txt without any other option, we are hurting our
good users and not keeping hostile users from doing harm.
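
To illustrate the behaviour described above, here is a minimal sketch
of a robots.txt check that can be switched off. It is not the code
from this patch: it uses the standard library's urllib.robotparser
rather than the project's own robotparser2.py, and the function name
is_allowed, the honor_robots parameter and the "LinkChecker" user
agent string are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(url, robots_txt, user_agent="LinkChecker", honor_robots=True):
    """Return True if `url` may be fetched.

    `robots_txt` is the text of the site's robots.txt file.
    `honor_robots` is a hypothetical switch standing in for the
    effect of a --no-robots flag: when False, the check is skipped.
    """
    if not honor_robots:
        # The user explicitly asked to ignore robots.txt.
        return True
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A robots.txt that denies all robots, as in the case described above.
DENY_ALL = "User-agent: *\nDisallow: /\n"

blocked = is_allowed("https://example.org/page", DENY_ALL)
forced = is_allowed("https://example.org/page", DENY_ALL, honor_robots=False)
```

With the deny-all file, the first call refuses the URL and the second
one lets it through, which is the only difference the switch makes.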

The patch is still incomplete, but works. It lacks documentation and
unit tests.

Closes: #508
2016-05-19 14:43:59 -04:00
bookmarks Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. 2014-03-01 00:12:34 +01:00
cache Detect if "url_data" contains proxy attributes before using them. 2014-11-12 09:58:30 -05:00
checker add --no-robots commandline flag 2016-05-19 14:43:59 -04:00
configuration add --no-robots commandline flag 2016-05-19 14:43:59 -04:00
director Move GUI files to separate project 2016-01-23 13:28:15 +01:00
HtmlParser Fix parser for changes in bison 3.0.x 2015-11-26 12:33:44 +01:00
htmlutil Added some Python3 fixes. 2014-09-12 19:36:30 +02:00
logger Print interrupt note in text output. 2014-04-30 20:17:33 +02:00
network More python3 fixes 2014-09-12 21:59:07 +02:00
parser Support itms-services: URLs. #532 2014-09-05 21:06:10 +02:00
plugins Added plugin for parsing and checking links in Markdown files 2014-11-11 15:35:18 +02:00
__init__.py Move GUI files to separate project 2016-01-23 13:28:15 +01:00
ansicolor.py Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. 2014-03-01 00:12:34 +01:00
better_exchook2.py Rename external module to exclude it from some style checks. 2013-01-06 18:17:29 +01:00
cmdline.py Reactivate paging of help pages. 2014-09-11 19:42:42 +02:00
colorama.py Fix GUI startup for Windows. 2012-12-19 21:12:02 +01:00
containers.py Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. 2014-03-01 00:12:34 +01:00
cookies.py Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. 2014-03-01 00:12:34 +01:00
decorators.py Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. 2014-03-01 00:12:34 +01:00
dummy.py Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. 2014-03-01 00:12:34 +01:00
fileutil.py Move mime stuff into own submodule. 2014-05-10 21:22:10 +02:00
ftpparse.py Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. 2014-03-01 00:12:34 +01:00
gzip2.py Updated gzip and httplib copies. 2013-03-11 20:21:58 +01:00
httputil.py Don't use encoding detection since it's very slow. 2014-03-27 12:27:11 +01:00
i18n.py Added some Python3 fixes. 2014-09-12 19:36:30 +02:00
lc_cgi.py Added some Python3 fixes. 2014-09-12 19:36:30 +02:00
loader.py Add missing docstring. 2014-03-01 19:14:43 +01:00
lock.py Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. 2014-03-01 00:12:34 +01:00
log.py More python3 fixes 2014-09-12 21:59:07 +02:00
logconf.py Move GUI files to separate project 2016-01-23 13:28:15 +01:00
mem.py Remove trailing spaces. 2010-03-06 11:03:25 +01:00
memoryutil.py Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. 2014-03-01 00:12:34 +01:00
mimeutil.py Added some Python3 fixes. 2014-09-12 19:36:30 +02:00
robotparser2.py Added some Python3 fixes. 2014-09-12 19:36:30 +02:00
socketutil.py Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. 2014-03-01 00:12:34 +01:00
strformat.py Added some Python3 fixes. 2014-09-12 19:36:30 +02:00
threader.py Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. 2014-03-01 00:12:34 +01:00
trace.py Added some Python3 fixes. 2014-09-12 19:36:30 +02:00
updater.py Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. 2014-03-01 00:12:34 +01:00
url.py Added some Python3 fixes. 2014-09-12 19:36:30 +02:00
winutil.py Add PDF link parsing. 2014-04-28 18:13:45 +02:00