731 commits

Author SHA1 Message Date
Bastian Kleineidam
42a17cbb98 Prepare py3 port and display sys.argv on internal errors. 2012-11-26 18:49:07 +01:00
Bastian Kleineidam
7ae1eadadb Improve http status 305 code message. 2012-11-13 18:13:36 +01:00
Bastian Kleineidam
cd4abb1f12 Improve repr() of url data, and remove alexa test script. 2012-11-09 19:09:38 +01:00
Bastian Kleineidam
810a62e093 Fix file url checking. 2012-11-07 19:37:16 +01:00
Bastian Kleineidam
f9a7f5ef96 Restrict local file checking. 2012-11-07 18:07:00 +01:00
Bastian Kleineidam
eabaa41bd2 Do not check duplicate URLs. 2012-11-06 21:34:22 +01:00
Bastian Kleineidam
9745be9d71 Fix cookie path matching with empty paths. 2012-10-30 17:44:00 +01:00
Bastian Kleineidam
e2fd37b886 Encode user and password for telnet connection. 2012-10-30 17:44:00 +01:00
Bastian Kleineidam
c6d8b0050e Improve PHP command check. 2012-10-29 21:05:26 +01:00
Bastian Kleineidam
e8da486d66 Detect redirection errors when getting content. 2012-10-26 18:05:00 +02:00
Bastian Kleineidam
2390827735 Debug cookies. 2012-10-25 17:53:16 +02:00
Bastian Kleineidam
c44aa2db1f Fix anchor checking of cached HTTP URLs by using the cached content type. 2012-10-25 06:37:10 +02:00
Bastian Kleineidam
dca52145d3 Misc stuff. 2012-10-24 22:59:28 +02:00
Bastian Kleineidam
b39158e65c Improve available anchor message. 2012-10-24 22:21:46 +02:00
Bastian Kleineidam
dd2c963fac Fix non-ASCII exception handling. 2012-10-24 22:14:45 +02:00
Bastian Kleineidam
64de760b97 Added debug statements for unparseable content types. 2012-10-24 22:06:42 +02:00
Bastian Kleineidam
2ebedbaaa6 Fix content reading. 2012-10-13 16:48:29 +02:00
Bastian Kleineidam
0e4e694ad1 Fix connection handling on redirects. 2012-10-13 13:36:43 +02:00
Bastian Kleineidam
d3b44be2c4 Improved documentation. 2012-10-13 12:03:19 +02:00
Bastian Kleineidam
6a204120b6 Handle stale file system links for local file checks. 2012-10-12 17:20:19 +02:00
Bastian Kleineidam
b758fc6f52 Reuse existing response. 2012-10-10 12:27:36 +02:00
Bastian Kleineidam
e1e80b7dd5 Remove addrinfo cache. 2012-10-10 10:54:58 +02:00
Bastian Kleineidam
f484a6776d Use timeout value from configuration. 2012-10-10 10:53:52 +02:00
Bastian Kleineidam
06a25676c5 Only read the maximum data size plus one, not the whole file. 2012-10-10 06:35:33 +02:00
Bastian Kleineidam
6d47b76509 Limit HTTP and FTP connections. Gets rid of spurious BadStatusLine errors. 2012-10-09 21:04:20 +02:00
Bastian Kleineidam
ad8525c483 Improve BadStatusLine error message. 2012-10-05 08:32:24 +02:00
Bastian Kleineidam
d15fafb1f7 Code cleanup. 2012-10-05 08:10:44 +02:00
Bastian Kleineidam
ed7c60e491 Do not warn about duplicate URLs which can point to the same content. 2012-10-01 13:42:46 +02:00
Bastian Kleineidam
38dd63f055 Code cleanup. 2012-09-23 16:19:42 +02:00
Bastian Kleineidam
7f8fd01b22 Add Accept-Encoding and Accept-Charset headers. 2012-09-23 15:06:44 +02:00
Bastian Kleineidam
03ecff22bb Fix endless loop in http authentication. 2012-09-22 22:21:10 +02:00
Bastian Kleineidam
653b5f27dd Updated ignored schemes. 2012-09-22 16:18:37 +02:00
Bastian Kleineidam
1c59cb4d4c Use GET if a HEAD request does not succeed, even if robots.txt denied content checks for the page. This yields proper check results (the content itself is still not checked, so this is acceptable). 2012-09-22 07:53:11 +02:00
Bastian Kleineidam
bbf25106fa Fix double result setting on http checks. 2012-09-21 20:33:15 +02:00
Bastian Kleineidam
c274b50c50 Store lowercase URL scheme in checker class. 2012-09-21 14:35:25 +02:00
Bastian Kleineidam
0941f6ff02 Improve exception handling by using unicode. 2012-09-21 14:29:20 +02:00
Bastian Kleineidam
049882e4fe Remove accept-encoding since some sites have wrong compression. 2012-09-20 22:39:15 +02:00
Bastian Kleineidam
7c6dce6136 Only warn non-empty site duplicates. 2012-09-20 20:39:36 +02:00
Bastian Kleineidam
a03090c20f Optimize intern/extern pattern parsing. 2012-09-20 20:19:13 +02:00
Bastian Kleineidam
b9d234c78a Fix wrong method name in SSL certificate check. 2012-09-20 16:28:01 +02:00
Bastian Kleineidam
bff217c58b Never log ignored warnings. 2012-09-20 12:44:40 +02:00
Bastian Kleineidam
600b7c0e69 Fix duplicate content warning when self.size is not set yet. 2012-09-20 12:44:23 +02:00
Bastian Kleineidam
18a200d85f Fix tests. 2012-09-19 11:05:26 +02:00
Bastian Kleineidam
b8f8bdf5fc Fix last modified formatting. 2012-09-19 10:09:19 +02:00
Bastian Kleineidam
3a352631ba Add modified field to loggers. 2012-09-18 12:12:00 +02:00
Bastian Kleineidam
4e59056ee7 Warn about duplicate URL contents. 2012-09-17 19:49:50 +02:00
Bastian Kleineidam
cb71f483a5 Warn about too long URLs. 2012-09-17 16:00:23 +02:00
Bastian Kleineidam
6e1841cf1f Print download and cache statistics. 2012-09-17 15:23:25 +02:00
Bastian Kleineidam
273230d98b Send HTTP Do-Not-Track header. 2012-09-14 22:41:38 +02:00
Bastian Kleineidam
7a6436f08f Increase checked cache in URL queue. 2012-09-02 22:21:49 +02:00