Commit graph

2604 commits

Author SHA1 Message Date
Bastian Kleineidam
e8da486d66 Detect redirection errors when getting content. 2012-10-26 18:05:00 +02:00
Bastian Kleineidam
2390827735 Debug cookies. 2012-10-25 17:53:16 +02:00
Bastian Kleineidam
c44aa2db1f Fix anchor checking of cached HTTP URLs by using the cached content type. 2012-10-25 06:37:10 +02:00
Bastian Kleineidam
dca52145d3 Misc stuff. 2012-10-24 22:59:28 +02:00
Bastian Kleineidam
b39158e65c Improve available anchor message. 2012-10-24 22:21:46 +02:00
Bastian Kleineidam
dd2c963fac Fix non-ASCII exception handling. 2012-10-24 22:14:45 +02:00
Bastian Kleineidam
64de760b97 Added debug statements for unparseable content types. 2012-10-24 22:06:42 +02:00
Bastian Kleineidam
3a51ac7662 Warn about accessible passwords in config files. 2012-10-15 14:36:10 +02:00
Bastian Kleineidam
8750d55a73 Add configuration entry for maximum number of URLs. 2012-10-14 11:13:55 +02:00
Bastian Kleineidam
2ebedbaaa6 Fix content reading. 2012-10-13 16:48:29 +02:00
Bastian Kleineidam
0e4e694ad1 Fix connection handling on redirects. 2012-10-13 13:36:43 +02:00
Bastian Kleineidam
3b5877161c Improved debugging. 2012-10-13 13:36:28 +02:00
Bastian Kleineidam
d3b44be2c4 Improved documentation. 2012-10-13 12:03:19 +02:00
Bastian Kleineidam
7929a48d78 Fix url split with invalid port names. 2012-10-13 12:03:09 +02:00
Bastian Kleineidam
aa057bd36f Fix colorama init error. 2012-10-12 20:39:34 +02:00
Bastian Kleineidam
6a204120b6 Handle stale file system links for local file checks. 2012-10-12 17:20:19 +02:00
Bastian Kleineidam
c4e15c7b88 Improved duplication url check. 2012-10-10 21:04:48 +02:00
Bastian Kleineidam
b758fc6f52 Reuse existing response. 2012-10-10 12:27:36 +02:00
Bastian Kleineidam
a0610310b4 Print debug on stderr. 2012-10-10 12:27:25 +02:00
Bastian Kleineidam
0c20ef5de4 Strip console characters only from line text. 2012-10-10 12:27:08 +02:00
Bastian Kleineidam
e1e80b7dd5 Remove addrinfo cache. 2012-10-10 10:54:58 +02:00
Bastian Kleineidam
20be0f2519 Strip control chars from logger output. 2012-10-10 10:54:30 +02:00
Bastian Kleineidam
f484a6776d Use timeout value from configuration. 2012-10-10 10:53:52 +02:00
Bastian Kleineidam
871508ef5d Add docs and updated copyright. 2012-10-10 06:53:16 +02:00
Bastian Kleineidam
63cf8adf54 Catch ValueError on invalid cookie expiration dates. 2012-10-10 06:44:38 +02:00
Bastian Kleineidam
06a25676c5 Only read the maximum data size plus one, not the whole file. 2012-10-10 06:35:33 +02:00
Bastian Kleineidam
3e1d51b8bf Use RLock to simplify internal locking. 2012-10-09 21:11:35 +02:00
Bastian Kleineidam
c4cd66ea1b Simplify decorator duration check logic. 2012-10-09 21:05:24 +02:00
Bastian Kleineidam
03a5d476b3 Use URL name if title is empty. 2012-10-09 21:04:54 +02:00
Bastian Kleineidam
6d47b76509 Limit HTTP and FTP connections. Gets rid of spurious BadStatusLine errors. 2012-10-09 21:04:20 +02:00
Bastian Kleineidam
7d3ece502c Support semaphores. 2012-10-09 19:46:06 +02:00
Bastian Kleineidam
ad8525c483 Improve BadStatusline error message. 2012-10-05 08:32:24 +02:00
Bastian Kleineidam
d15fafb1f7 Code cleanup. 2012-10-05 08:10:44 +02:00
Bastian Kleineidam
5ebd754cdb Improved duplicate url check. 2012-10-01 16:11:45 +02:00
Bastian Kleineidam
ed7c60e491 Do not warn about duplicate URLs which can point to the same content. 2012-10-01 13:42:46 +02:00
Bastian Kleineidam
148846be67 Add flag to log lock contentions. 2012-10-01 13:32:30 +02:00
Bastian Kleineidam
b56c054932 Use finer-grained robots.txt locks to improve lock contention. 2012-10-01 13:29:29 +02:00
Bastian Kleineidam
27b61c3bfa Fix gzip handling in http content decoder. 2012-09-30 14:00:49 +02:00
Bastian Kleineidam
cbc3bcb0d3 Sitemap logger fixes. 2012-09-23 23:20:21 +02:00
Bastian Kleineidam
60305d8877 Code cleanup. 2012-09-23 21:20:12 +02:00
Bastian Kleineidam
e21187b275 Put in-progress URLs back near the front of URL queue, not at end. 2012-09-23 21:00:01 +02:00
Bastian Kleineidam
1f3034b5f5 Sitemap logger fixes. 2012-09-23 20:59:38 +02:00
Bastian Kleineidam
38dd63f055 Code cleanup. 2012-09-23 16:19:42 +02:00
Bastian Kleineidam
7f8fd01b22 Add Accept-Encoding and Accept-Charset headers. 2012-09-23 15:06:44 +02:00
Bastian Kleineidam
03ecff22bb Fix endless loop in http authentication. 2012-09-22 22:21:10 +02:00
Bastian Kleineidam
653b5f27dd Updated ignored schemes. 2012-09-22 16:18:37 +02:00
Bastian Kleineidam
1c59cb4d4c Use GET in case a HEAD method does not succeed, even if robots.txt content checkes denied the page. This way proper check results are achieved (but the content is still not checked, so it's ok). 2012-09-22 07:53:11 +02:00
Bastian Kleineidam
fba465e8e8 Fix robotstxt cache miss stats. 2012-09-21 21:12:28 +02:00
Bastian Kleineidam
f6b007f757 Fix useragent matching in robots.txt parser. 2012-09-21 21:12:13 +02:00
Bastian Kleineidam
bbf25106fa Fix double result setting on http checks. 2012-09-21 20:33:15 +02:00