
2820 commits

Author SHA1 Message Date
Antoine Beaupré
da8cecd83c Merge remote-tracking branch 'anarcat/norobots' 2017-01-31 11:34:09 -05:00
Antoine Beaupré
46d96d0aa0 fix HTTPS URL checks
In Debian Jessie, linkchecker fails because of an API problem.

It completely breaks HTTPS checks.

This patch fixes the problem.

Taken from https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=772947
2016-09-30 11:20:38 -04:00
Bastian Kleineidam
c2ce810c3f Fix Python requests version check 2016-06-28 21:55:10 +02:00
Antoine Beaupré
9d899d1dfa add --no-robots commandline flag
While this flag can be abused, it seems to me a legitimate use
case: you want to check a fairly small document for mistakes, and
it includes references to a website whose robots.txt denies all
robots. It turns out that most websites do *not* grant LinkChecker
permission to crawl them, and some sites, like the Debian BTS, are
very hostile to bots in general.

Between me using linkchecker and me using my web browser to check
those links one by one, there is not much difference. In fact,
using linkchecker may be *better* for the website, because it
sends HEAD requests instead of GETs and does not fetch all page
elements (JavaScript, images, etc.), which can often be fairly big.

Besides, hostile users will patch the software themselves: it took me
only a few minutes to disable the check, and a few more to make that
into a proper patch.

By always enforcing robots.txt with no way to opt out, we are
hurting our good users without keeping hostile users from doing harm.

The patch is incomplete but works: it still lacks documentation
and unit tests.

Closes: #508
2016-05-19 14:43:59 -04:00
Bastian Kleineidam
0ef00eea56 Move GUI files to separate project 2016-01-23 13:28:15 +01:00
Bastian Kleineidam
549533d701 Improved debugging 2016-01-19 21:55:50 +01:00
wummel
a40c39be59 Merge pull request #560 from xvadim/feature
Added plugin for parsing and checking links in Markdown files
2016-01-19 07:30:34 +01:00
wummel
e2556abbb6 Merge pull request #561 from nbigaouette/issue555
Detect if "url_data" contains proxy attributes before using them.
2016-01-17 21:59:35 +01:00
Bastian Kleineidam
3d711666e1 Fix parser for changes in bison 3.0.x 2015-11-26 12:33:44 +01:00
Nicolas Bigaouette
4e56eceb35 Detect if "url_data" contains proxy attributes before using them.
Fix proposed by @colwilson in issue #555.
2014-11-12 09:58:30 -05:00
Vadim Khohlov
d4352fc828 Added plugin for parsing and checking links in Markdown files 2014-11-11 15:35:18 +02:00
Bastian Kleineidam
27937e6f83 Fix requests module version check. 2014-09-22 22:45:04 +02:00
Bastian Kleineidam
228bce1ba2 Add to the HTTP client headers instead of replacing them. 2014-09-20 12:17:42 +02:00
Bastian Kleineidam
92c4ca9a5e Debug request headers 2014-09-20 12:16:24 +02:00
Bastian Kleineidam
029c20ed98 More Python 3 fixes 2014-09-12 21:59:07 +02:00
Bastian Kleineidam
35eb30432e Added some Python3 fixes. 2014-09-12 19:36:30 +02:00
Bastian Kleineidam
697e7b82e1 Search for system certs 2014-09-11 21:19:49 +02:00
Bastian Kleineidam
21c7200360 Reactivate paging of help pages. 2014-09-11 19:42:42 +02:00
Bastian Kleineidam
06c6b80ed3 Fix proxy support. 2014-09-05 22:48:10 +02:00
wummel
6580d37dc9 Merge pull request #545 from ArloL/patch-1
Use correct attribute
2014-09-05 21:13:40 +02:00
Bastian Kleineidam
ee4545399d Support itms-services: URLs. #532 2014-09-05 21:06:10 +02:00
Bastian Kleineidam
37d4ed6f83 Add hyphen and dot to the allowed scheme characters. 2014-09-05 20:59:54 +02:00
Bastian Kleineidam
c8df9355f0 Try to use the SSL certs from the certifi package. 2014-09-05 20:00:30 +02:00
Bastian Kleineidam
c684918ba6 Ignore urllib3 warnings about invalid SSL certs since we check them ourselves. 2014-09-05 20:00:00 +02:00
Bastian Kleineidam
2354f16dbb Catch urllib3 errors. 2014-09-05 19:59:28 +02:00
Arlo Louis O'Keeffe
52337f82cb Use correct attribute 2014-09-03 09:36:22 +02:00
Bastian Kleineidam
85dadc1f1a Add documentation 2014-07-16 07:37:19 +02:00
Bastian Kleineidam
37664ea8a4 Fix Word file check plugin. 2014-07-15 22:39:41 +02:00
Bastian Kleineidam
b646293fd6 Remove unused import. 2014-07-15 22:38:57 +02:00
Bastian Kleineidam
29193bbcc9 Fix login URL cookies and don't sanitize after config reading. 2014-07-15 22:23:38 +02:00
Bastian Kleineidam
032c4091c3 Some easy Python 3 compatibility changes. 2014-07-15 18:40:47 +02:00
Bastian Kleineidam
90257a1b5e Replace twill with custom code. 2014-07-15 18:37:05 +02:00
Bastian Kleineidam
a665d35feb Use proxies and checker session in robots.txt. 2014-07-14 20:28:28 +02:00
Bastian Kleineidam
266e9e189f Further code cleanup. 2014-07-14 20:14:00 +02:00
Bastian Kleineidam
6c38b4165a Use given HTTP auth data for robots.txt fetching. 2014-07-14 19:50:11 +02:00
Bastian Kleineidam
7838521b6e Code cleanup. 2014-07-14 19:49:01 +02:00
Bastian Kleineidam
100ce11d40 Sanitize CGI configuration. 2014-07-13 21:56:01 +02:00
Bastian Kleineidam
eafa1ed2da Updated unknown URL schemes. 2014-07-13 21:51:53 +02:00
Bastian Kleineidam
176b95a30e Do not strip quotes from resolved URLs. 2014-07-11 00:43:46 +02:00
Bastian Kleineidam
27702ddbac Catch log output start errors. 2014-07-09 21:54:47 +02:00
Bastian Kleineidam
6ff89e9e8c Fix GUI startup 2014-07-06 20:20:03 +02:00
Bastian Kleineidam
0fa7ed2699 Fix empty URL handling. 2014-07-03 23:34:40 +02:00
Bastian Kleineidam
1590ab6240 cleanup 2014-07-01 21:12:47 +02:00
Bastian Kleineidam
9a124513e3 Merge branch 'master' of github.com:wummel/linkchecker 2014-07-01 21:11:33 +02:00
wummel
9bb3852edf Merge pull request #515 from Mark-Hetherington/extern-redirect
When following redirections update url.extern
2014-07-01 21:11:13 +02:00
Bastian Kleineidam
12cc12db53 Add get_redirects() function. 2014-07-01 21:11:06 +02:00
Bastian Kleineidam
cde261c009 Parse Refresh: and Content-Location: header values for URLs. 2014-07-01 20:16:43 +02:00
Bastian Kleineidam
c3ec91ac6d Fix intern URL search pattern. 2014-06-13 23:52:21 +02:00
Bastian Kleineidam
ad8eb424f3 Merge Mark-Hetherington-xml-parse-warn with slight modifications. 2014-06-13 20:50:37 +02:00
Mark Hetherington
34d83db29c When following redirections update url.extern 2014-05-19 14:59:58 +10:00