Antoine Beaupré
da8cecd83c
Merge remote-tracking branch 'anarcat/norobots'
2017-01-31 11:34:09 -05:00
Antoine Beaupré
46d96d0aa0
fix HTTPS URL checks
...
In Debian Jessie, linkchecker fails because of an API problem
that completely breaks HTTPS checks.
This patch fixes the problem
from https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=772947
2016-09-30 11:20:38 -04:00
Bastian Kleineidam
c2ce810c3f
Fix python requests version check
2016-06-28 21:55:10 +02:00
Antoine Beaupré
9d899d1dfa
add --no-robots commandline flag
...
While this flag can be abused, it seems to me like a legitimate use
case that you want to check a fairly small document for mistakes,
which includes references to a website which has a robots.txt that
denies all robots. It turns out that most websites do *not* add a
permission for LinkChecker to use their site, and some sites, like the
Debian BTS, are very hostile to bots in general.
Between me using linkchecker and me using my web browser to check those
links one by one, there is not a big difference. In fact, using
linkchecker may be *better* for the website because it will use HEAD
requests instead of a GET, and will not fetch all page elements
(JavaScript, images, etc.), which can often be fairly big.
Besides, hostile users will patch the software themselves: it took me
only a few minutes to disable the check, and a few more to make that
into a proper patch.
By forcing robots.txt without any other option, we are hurting our
good users and not keeping hostile users from doing harm.
The patch is still incomplete but works. It lacks documentation and
unit tests.
Closes: #508
2016-05-19 14:43:59 -04:00
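The behaviour described in the --no-robots commit above can be sketched with Python's standard `urllib.robotparser`. This is only an illustration of the idea, not LinkChecker's actual code: the function name `allowed_by_robots` and the `respect_robots` parameter are hypothetical, and the robots.txt body is passed in as a string so the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str,
                      respect_robots: bool = True) -> bool:
    """Return True if `url` may be checked.

    `respect_robots=False` stands in for a --no-robots style switch
    (hypothetical name): the robots.txt rules are skipped entirely.
    """
    if not respect_robots:
        return True
    parser = RobotFileParser()
    # parse() takes an iterable of robots.txt lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A robots.txt that denies all robots, as mentioned in the commit message.
DENY_ALL = "User-agent: *\nDisallow: /\n"

if __name__ == "__main__":
    url = "https://example.org/page"
    print(allowed_by_robots(DENY_ALL, "LinkChecker", url))
    print(allowed_by_robots(DENY_ALL, "LinkChecker", url, respect_robots=False))
```

With the default setting the deny-all file blocks the check; turning the switch off lets the check proceed regardless of what robots.txt says, which is the trade-off the commit message argues for.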
Bastian Kleineidam
0ef00eea56
Move GUI files to separate project
2016-01-23 13:28:15 +01:00
Bastian Kleineidam
549533d701
Improved debugging
2016-01-19 21:55:50 +01:00
wummel
a40c39be59
Merge pull request #560 from xvadim/feature
...
Added plugin for parsing and checking links in Markdown files
2016-01-19 07:30:34 +01:00
wummel
e2556abbb6
Merge pull request #561 from nbigaouette/issue555
...
Detect if "url_data" contains proxy attributes before using them.
2016-01-17 21:59:35 +01:00
Bastian Kleineidam
3d711666e1
Fix parser for changes in bison 3.0.x
2015-11-26 12:33:44 +01:00
Nicolas Bigaouette
4e56eceb35
Detect if "url_data" contains proxy attributes before using them.
...
Fix proposed by @colwilson in issue #555.
2014-11-12 09:58:30 -05:00
Vadim Khohlov
d4352fc828
Added plugin for parsing and checking links in Markdown files
2014-11-11 15:35:18 +02:00
Bastian Kleineidam
27937e6f83
Fix requests module version check.
2014-09-22 22:45:04 +02:00
Bastian Kleineidam
228bce1ba2
Add to instead of replace the HTTP client headers.
2014-09-20 12:17:42 +02:00
Bastian Kleineidam
92c4ca9a5e
Debug request headers
2014-09-20 12:16:24 +02:00
Bastian Kleineidam
029c20ed98
More python3 fixes
2014-09-12 21:59:07 +02:00
Bastian Kleineidam
35eb30432e
Added some Python3 fixes.
2014-09-12 19:36:30 +02:00
Bastian Kleineidam
697e7b82e1
Search for system certs
2014-09-11 21:19:49 +02:00
Bastian Kleineidam
21c7200360
Reactivate paging of help pages.
2014-09-11 19:42:42 +02:00
Bastian Kleineidam
06c6b80ed3
Fix proxy support.
2014-09-05 22:48:10 +02:00
wummel
6580d37dc9
Merge pull request #545 from ArloL/patch-1
...
Use correct attribute
2014-09-05 21:13:40 +02:00
Bastian Kleineidam
ee4545399d
Support itms-services: URLs. #532
2014-09-05 21:06:10 +02:00
Bastian Kleineidam
37d4ed6f83
Add hyphen and dot to the allowed scheme characters.
2014-09-05 20:59:54 +02:00
Bastian Kleineidam
c8df9355f0
Try to use the SSL certs from the certifi package.
2014-09-05 20:00:30 +02:00
Bastian Kleineidam
c684918ba6
Ignore urllib3 warnings about invalid SSL certs since we check them ourselves.
2014-09-05 20:00:00 +02:00
Bastian Kleineidam
2354f16dbb
Catch urllib3 errors.
2014-09-05 19:59:28 +02:00
Arlo Louis O'Keeffe
52337f82cb
Use correct attribute
2014-09-03 09:36:22 +02:00
Bastian Kleineidam
85dadc1f1a
Add documentation
2014-07-16 07:37:19 +02:00
Bastian Kleineidam
37664ea8a4
Fix Word file check plugin.
2014-07-15 22:39:41 +02:00
Bastian Kleineidam
b646293fd6
Remove unused import.
2014-07-15 22:38:57 +02:00
Bastian Kleineidam
29193bbcc9
Fix login URL cookies and don't sanitize after config reading.
2014-07-15 22:23:38 +02:00
Bastian Kleineidam
032c4091c3
Some easy python3 compatibility changes.
2014-07-15 18:40:47 +02:00
Bastian Kleineidam
90257a1b5e
Replace twill with custom code.
2014-07-15 18:37:05 +02:00
Bastian Kleineidam
a665d35feb
Use proxies and checker session in robots.txt.
2014-07-14 20:28:28 +02:00
Bastian Kleineidam
266e9e189f
Further code cleanup.
2014-07-14 20:14:00 +02:00
Bastian Kleineidam
6c38b4165a
Use given HTTP auth data for robots.txt fetching.
2014-07-14 19:50:11 +02:00
Bastian Kleineidam
7838521b6e
Code cleanup.
2014-07-14 19:49:01 +02:00
Bastian Kleineidam
100ce11d40
Sanitize CGI configuration.
2014-07-13 21:56:01 +02:00
Bastian Kleineidam
eafa1ed2da
Updated unknown URL schemes.
2014-07-13 21:51:53 +02:00
Bastian Kleineidam
176b95a30e
Do not strip quotes from resolved URLs.
2014-07-11 00:43:46 +02:00
Bastian Kleineidam
27702ddbac
Catch log output start errors.
2014-07-09 21:54:47 +02:00
Bastian Kleineidam
6ff89e9e8c
Fix GUI startup
2014-07-06 20:20:03 +02:00
Bastian Kleineidam
0fa7ed2699
Fix empty URL handling.
2014-07-03 23:34:40 +02:00
Bastian Kleineidam
1590ab6240
cleanup
2014-07-01 21:12:47 +02:00
Bastian Kleineidam
9a124513e3
Merge branch 'master' of github.com:wummel/linkchecker
2014-07-01 21:11:33 +02:00
wummel
9bb3852edf
Merge pull request #515 from Mark-Hetherington/extern-redirect
...
When following redirections update url.extern
2014-07-01 21:11:13 +02:00
Bastian Kleineidam
12cc12db53
Add get_redirects() function.
2014-07-01 21:11:06 +02:00
Bastian Kleineidam
cde261c009
Parse Refresh: and Content-Location: header values for URLs.
2014-07-01 20:16:43 +02:00
Bastian Kleineidam
c3ec91ac6d
Fix intern URL search pattern.
2014-06-13 23:52:21 +02:00
Bastian Kleineidam
ad8eb424f3
Merge Mark-Hetherington-xml-parse-warn with slight modifications.
2014-06-13 20:50:37 +02:00
Mark Hetherington
34d83db29c
When following redirections update url.extern
2014-05-19 14:59:58 +10:00