
242 commits

Author SHA1 Message Date
Bastian Kleineidam
42a17cbb98 Prepare py3 port and display sys.argv on internal errors. 2012-11-26 18:49:07 +01:00
Bastian Kleineidam
7ae1eadadb Improve http status 305 code message. 2012-11-13 18:13:36 +01:00
Bastian Kleineidam
eabaa41bd2 Do not check duplicate URLs. 2012-11-06 21:34:22 +01:00
Bastian Kleineidam
9745be9d71 Fix cookie path matching with empty paths. 2012-10-30 17:44:00 +01:00
Bastian Kleineidam
e8da486d66 Detect redirection errors when getting content. 2012-10-26 18:05:00 +02:00
Bastian Kleineidam
2390827735 Debug cookies. 2012-10-25 17:53:16 +02:00
Bastian Kleineidam
c44aa2db1f Fix anchor checking of cached HTTP URLs by using the cached content type. 2012-10-25 06:37:10 +02:00
Bastian Kleineidam
64de760b97 Added debug statements for unparseable content types. 2012-10-24 22:06:42 +02:00
Bastian Kleineidam
2ebedbaaa6 Fix content reading. 2012-10-13 16:48:29 +02:00
Bastian Kleineidam
0e4e694ad1 Fix connection handling on redirects. 2012-10-13 13:36:43 +02:00
Bastian Kleineidam
b758fc6f52 Reuse existing response. 2012-10-10 12:27:36 +02:00
Bastian Kleineidam
e1e80b7dd5 Remove addrinfo cache. 2012-10-10 10:54:58 +02:00
Bastian Kleineidam
f484a6776d Use timeout value from configuration. 2012-10-10 10:53:52 +02:00
Bastian Kleineidam
06a25676c5 Only read the maximum data size plus one, not the whole file. 2012-10-10 06:35:33 +02:00
Bastian Kleineidam
6d47b76509 Limit HTTP and FTP connections. Gets rid of spurious BadStatusLine errors. 2012-10-09 21:04:20 +02:00
Bastian Kleineidam
d15fafb1f7 Code cleanup. 2012-10-05 08:10:44 +02:00
Bastian Kleineidam
7f8fd01b22 Add Accept-Encoding and Accept-Charset headers. 2012-09-23 15:06:44 +02:00
Bastian Kleineidam
03ecff22bb Fix endless loop in http authentication. 2012-09-22 22:21:10 +02:00
Bastian Kleineidam
1c59cb4d4c Use GET in case a HEAD request does not succeed, even if robots.txt content checks denied the page. This way proper check results are achieved (but the content is still not checked, so it's ok). 2012-09-22 07:53:11 +02:00
Bastian Kleineidam
bbf25106fa Fix double result setting on http checks. 2012-09-21 20:33:15 +02:00
Bastian Kleineidam
049882e4fe Remove accept-encoding since some sites have wrong compression. 2012-09-20 22:39:15 +02:00
Bastian Kleineidam
a03090c20f Optimize intern/extern pattern parsing. 2012-09-20 20:19:13 +02:00
Bastian Kleineidam
18a200d85f Fix tests. 2012-09-19 11:05:26 +02:00
Bastian Kleineidam
b8f8bdf5fc Fix last modified formatting. 2012-09-19 10:09:19 +02:00
Bastian Kleineidam
3a352631ba Add modified field to loggers. 2012-09-18 12:12:00 +02:00
Bastian Kleineidam
4e59056ee7 Warn about duplicate URL contents. 2012-09-17 19:49:50 +02:00
Bastian Kleineidam
6e1841cf1f Print download and cache statistics. 2012-09-17 15:23:25 +02:00
Bastian Kleineidam
273230d98b Send HTTP Do-Not-Track header. 2012-09-14 22:41:38 +02:00
Bastian Kleineidam
7a6436f08f Increase checked cache in URL queue. 2012-09-02 22:21:49 +02:00
Bastian Kleineidam
4c16d3e702 Make 401 unauthorized GET response a warning. 2012-08-26 11:32:17 +02:00
Bastian Kleineidam
ae15d51b30 Translate more result strings. 2012-08-23 23:59:33 +02:00
Bastian Kleineidam
ce4253263c Do not special case http->ftp redirects. 2012-08-23 23:56:36 +02:00
Bastian Kleineidam
7374068941 Remove unused import. 2012-08-23 16:46:14 +02:00
Bastian Kleineidam
73d64e50ab Fix redirection to new scheme. 2012-08-23 16:45:24 +02:00
Bastian Kleineidam
bc287d7710 Make unauthorized access responses with missing www-authenticate headers an error. 2012-08-23 15:52:11 +02:00
Bastian Kleineidam
e252bbf623 Remove Amazon quirk because the default behaviour handles this now. 2012-08-23 05:36:51 +02:00
Bastian Kleineidam
ecef16b2c9 Support WML sites. 2012-08-22 22:43:14 +02:00
Bastian Kleineidam
6915e2f989 Detect sites not supporting HEAD requests. 2012-08-14 18:43:39 +02:00
Bastian Kleineidam
f3b66b102d Fall back to GET when method HEAD is not allowed. 2012-08-13 07:07:21 +02:00
Bastian Kleineidam
6be3e9ddff Cleanup code and improve redirect anchor handling. 2012-08-12 11:14:56 +02:00
Bastian Kleineidam
5c045fef44 Fix UNC path handling on Windows. 2012-06-24 10:30:54 +02:00
Bastian Kleineidam
cbb13a8983 Add SSL certificate verification. 2012-06-18 23:05:44 +02:00
Bastian Kleineidam
f107092a8a Fix handling of user/password info in URLs. 2012-06-10 22:07:42 +02:00
Bastian Kleineidam
2dee223555 Allow memory dumps to be written. 2012-06-10 13:18:35 +02:00
Bastian Kleineidam
54ffb102d8 Code cleanup: add function for GET fallback. 2012-06-10 09:52:12 +02:00
Bastian Kleineidam
5c94c47901 Remove old Squid proxy workaround. 2012-06-10 09:45:07 +02:00
Bastian Kleineidam
bcbacec79a Code cleanup. 2012-05-10 21:05:33 +02:00
Bastian Kleineidam
61138744e6 Always use GET for Zope servers. 2012-05-08 20:47:47 +02:00
Bastian Kleineidam
797024c69b Fix URL connection cache key. 2012-04-04 22:58:09 +02:00
Bastian Kleineidam
4feea986b4 Fix concatenation of multiple cookie values. 2012-03-31 08:51:58 +02:00
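
Commit 06a25676c5 ("Only read the maximum data size plus one, not the whole file") describes a bounded-read technique. Reading one byte past the limit is enough to tell content that is exactly at the limit apart from content that exceeds it, without pulling the rest of the file into memory. The following is a minimal Python sketch of that idea. It assumes any file-like object with a `read(n)` method, and `read_limited` and `max_size` are hypothetical names, not the project's actual code.

```python
import io
from typing import BinaryIO, Tuple


def read_limited(stream: BinaryIO, max_size: int) -> Tuple[bytes, bool]:
    """Read at most max_size + 1 bytes and report whether the limit was exceeded.

    Hypothetical helper illustrating the "maximum data size plus one" idea.
    """
    chunks = []
    remaining = max_size + 1
    # Loop because read(n) on some streams (e.g. sockets) may return
    # fewer than n bytes before end of file.
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # end of file reached
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    data = b"".join(chunks)
    # The extra byte only serves as a probe; return the permitted prefix.
    return data[:max_size], len(data) > max_size


content, too_large = read_limited(io.BytesIO(b"abcdef"), 4)
print(content, too_large)  # b'abcd' True
```

At most `max_size + 1` bytes are ever read, so an oversized document costs one extra byte instead of a full download.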