Bastian Kleineidam
|
fa26876f67
|
Don't use encoding detection since it's very slow.
|
2014-03-27 12:27:11 +01:00 |
|
Bastian Kleineidam
|
49df359317
|
Some fixes when pyopenssl is used instead of python ssl module.
|
2014-03-26 19:59:17 +01:00 |
|
Bastian Kleineidam
|
dec0f6c8dc
|
Fix error with SNI checks
|
2014-03-26 12:38:16 +01:00 |
|
Bastian Kleineidam
|
a8623bc0bc
|
Display SSL info on redirects.
|
2014-03-26 07:16:03 +01:00 |
|
Bastian Kleineidam
|
be59802569
|
Set http connection charset.
|
2014-03-20 21:20:34 +01:00 |
|
Bastian Kleineidam
|
4c76345338
|
Add certificate valid date info and always set verify flag.
|
2014-03-19 17:16:42 +01:00 |
|
Bastian Kleineidam
|
9a7ad3a84f
|
Print SSL cipher info for https URLs.
|
2014-03-19 17:02:34 +01:00 |
|
Bastian Kleineidam
|
ce733ae76b
|
Don't check for robots.txt directives in local html files.
|
2014-03-19 16:33:22 +01:00 |
|
Bastian Kleineidam
|
6b334dc79b
|
Fix URL result caching.
|
2014-03-08 19:35:10 +01:00 |
|
Bastian Kleineidam
|
fab2c2da98
|
Improve content type setting.
|
2014-03-05 20:12:19 +01:00 |
|
Bastian Kleineidam
|
ef13a3fce1
|
Implement sitemap and sitemap index parsing.
|
2014-03-05 19:26:37 +01:00 |
|
Bastian Kleineidam
|
192cfab009
|
Cleanup of the UrlData.is_* functions
|
2014-03-05 19:23:16 +01:00 |
|
Bastian Kleineidam
|
6f205a2574
|
Support checking Sitemap: URLs in robots.txt files.
|
2014-03-01 20:25:19 +01:00 |
|
Bastian Kleineidam
|
0f0d79c7e0
|
Remove crawl-delay stuff
|
2014-03-01 20:01:42 +01:00 |
|
Bastian Kleineidam
|
7b34be590b
|
Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.
|
2014-03-01 00:12:34 +01:00 |
|
Bastian Kleineidam
|
c806be5c15
|
Updated copyright
|
2014-01-08 22:33:04 +01:00 |
|
Bastian Kleineidam
|
c076e312a2
|
Send an Accept header.
|
2014-01-08 19:56:00 +01:00 |
|
Bastian Kleineidam
|
e0a2558b2b
|
Updated copyright.
|
2013-12-24 07:13:16 +01:00 |
|
wummel
|
9646f0b652
|
Merge pull request #418 from chuckbjones/reset-url-on-fallback
Reset to original url when falling back to GET
|
2013-12-17 22:37:17 -08:00 |
|
Bastian Kleineidam
|
103e00b4d1
|
Allow disabling of ssl certificate checks.
|
2013-12-12 22:17:57 +01:00 |
|
Bastian Kleineidam
|
0ca63797bf
|
Remove content cache.
|
2013-12-10 23:41:52 +01:00 |
|
Bastian Kleineidam
|
2c5ede2eb7
|
Fall back to GET for Apache Coyote servers.
|
2013-12-08 08:22:56 +01:00 |
|
Bastian Kleineidam
|
023da7c993
|
Remove the duplicate URL content check.
|
2013-12-04 19:12:40 +01:00 |
|
Charles Jones
|
4294633c04
|
Close connection prior to falling back to GET, since we change the url back to the original at that time.
|
2013-08-09 13:08:51 -05:00 |
|
Charles Jones
|
8bc138f18b
|
Reset to original url when falling back to GET
|
2013-07-30 13:38:59 -05:00 |
|
Bastian Kleineidam
|
c966fe6b24
|
Remove the http-wrong-redirect warning
|
2013-04-11 18:33:19 +02:00 |
|
Bastian Kleineidam
|
42a17cbb98
|
Prepare py3 port and display sys.argv on internal errors.
|
2012-11-26 18:49:07 +01:00 |
|
Bastian Kleineidam
|
7ae1eadadb
|
Improve http status 305 code message.
|
2012-11-13 18:13:36 +01:00 |
|
Bastian Kleineidam
|
eabaa41bd2
|
Do not check duplicate URLs.
|
2012-11-06 21:34:22 +01:00 |
|
Bastian Kleineidam
|
9745be9d71
|
Fix cookie path matching with empty paths.
|
2012-10-30 17:44:00 +01:00 |
|
Bastian Kleineidam
|
e8da486d66
|
Detect redirection errors when getting content.
|
2012-10-26 18:05:00 +02:00 |
|
Bastian Kleineidam
|
2390827735
|
Debug cookies.
|
2012-10-25 17:53:16 +02:00 |
|
Bastian Kleineidam
|
c44aa2db1f
|
Fix anchor checking of cached HTTP URLs by using the cached content type.
|
2012-10-25 06:37:10 +02:00 |
|
Bastian Kleineidam
|
64de760b97
|
Added debug statements for unparseable content types.
|
2012-10-24 22:06:42 +02:00 |
|
Bastian Kleineidam
|
2ebedbaaa6
|
Fix content reading.
|
2012-10-13 16:48:29 +02:00 |
|
Bastian Kleineidam
|
0e4e694ad1
|
Fix connection handling on redirects.
|
2012-10-13 13:36:43 +02:00 |
|
Bastian Kleineidam
|
b758fc6f52
|
Reuse existing response.
|
2012-10-10 12:27:36 +02:00 |
|
Bastian Kleineidam
|
e1e80b7dd5
|
Remove addrinfo cache.
|
2012-10-10 10:54:58 +02:00 |
|
Bastian Kleineidam
|
f484a6776d
|
Use timeout value from configuration.
|
2012-10-10 10:53:52 +02:00 |
|
Bastian Kleineidam
|
06a25676c5
|
Only read the maximum data size plus one, not the whole file.
|
2012-10-10 06:35:33 +02:00 |
|
Bastian Kleineidam
|
6d47b76509
|
Limit HTTP and FTP connections. Gets rid of spurious BadStatusLine errors.
|
2012-10-09 21:04:20 +02:00 |
|
Bastian Kleineidam
|
d15fafb1f7
|
Code cleanup.
|
2012-10-05 08:10:44 +02:00 |
|
Bastian Kleineidam
|
7f8fd01b22
|
Add Accept-Encoding and Accept-Charset headers.
|
2012-09-23 15:06:44 +02:00 |
|
Bastian Kleineidam
|
03ecff22bb
|
Fix endless loop in http authentication.
|
2012-09-22 22:21:10 +02:00 |
|
Bastian Kleineidam
|
1c59cb4d4c
|
Use GET in case a HEAD method does not succeed, even if robots.txt content checks denied the page. This way proper check results are achieved (but the content is still not checked, so it's ok).
|
2012-09-22 07:53:11 +02:00 |
|
Bastian Kleineidam
|
bbf25106fa
|
Fix double result setting on http checks.
|
2012-09-21 20:33:15 +02:00 |
|
Bastian Kleineidam
|
049882e4fe
|
Remove accept-encoding since some sites have wrong compression.
|
2012-09-20 22:39:15 +02:00 |
|
Bastian Kleineidam
|
a03090c20f
|
Optimize intern/extern pattern parsing.
|
2012-09-20 20:19:13 +02:00 |
|
Bastian Kleineidam
|
18a200d85f
|
Fix tests.
|
2012-09-19 11:05:26 +02:00 |
|
Bastian Kleineidam
|
b8f8bdf5fc
|
Fix last modified formatting.
|
2012-09-19 10:09:19 +02:00 |
|