Bastian Kleineidam | e8da486d66 | Detect redirection errors when getting content. | 2012-10-26 18:05:00 +02:00
Bastian Kleineidam | 2390827735 | Debug cookies. | 2012-10-25 17:53:16 +02:00
Bastian Kleineidam | c44aa2db1f | Fix anchor checking of cached HTTP URLs by using the cached content type. | 2012-10-25 06:37:10 +02:00
Bastian Kleineidam | dca52145d3 | Misc stuff. | 2012-10-24 22:59:28 +02:00
Bastian Kleineidam | b39158e65c | Improve available anchor message. | 2012-10-24 22:21:46 +02:00
Bastian Kleineidam | dd2c963fac | Fix non-ASCII exception handling. | 2012-10-24 22:14:45 +02:00
Bastian Kleineidam | 64de760b97 | Added debug statements for unparseable content types. | 2012-10-24 22:06:42 +02:00
Bastian Kleineidam | 3a51ac7662 | Warn about accessible passwords in config files. | 2012-10-15 14:36:10 +02:00
Bastian Kleineidam | 8750d55a73 | Add configuration entry for maximum number of URLs. | 2012-10-14 11:13:55 +02:00
Bastian Kleineidam | 2ebedbaaa6 | Fix content reading. | 2012-10-13 16:48:29 +02:00
Bastian Kleineidam | 0e4e694ad1 | Fix connection handling on redirects. | 2012-10-13 13:36:43 +02:00
Bastian Kleineidam | 3b5877161c | Improved debugging. | 2012-10-13 13:36:28 +02:00
Bastian Kleineidam | d3b44be2c4 | Improved documentation. | 2012-10-13 12:03:19 +02:00
Bastian Kleineidam | 7929a48d78 | Fix URL split with invalid port names. | 2012-10-13 12:03:09 +02:00
Bastian Kleineidam | aa057bd36f | Fix colorama init error. | 2012-10-12 20:39:34 +02:00
Bastian Kleineidam | 6a204120b6 | Handle stale file system links for local file checks. | 2012-10-12 17:20:19 +02:00
Bastian Kleineidam | c4e15c7b88 | Improved duplication URL check. | 2012-10-10 21:04:48 +02:00
Bastian Kleineidam | b758fc6f52 | Reuse existing response. | 2012-10-10 12:27:36 +02:00
Bastian Kleineidam | a0610310b4 | Print debug on stderr. | 2012-10-10 12:27:25 +02:00
Bastian Kleineidam | 0c20ef5de4 | Strip console characters only from line text. | 2012-10-10 12:27:08 +02:00
Bastian Kleineidam | e1e80b7dd5 | Remove addrinfo cache. | 2012-10-10 10:54:58 +02:00
Bastian Kleineidam | 20be0f2519 | Strip control chars from logger output. | 2012-10-10 10:54:30 +02:00
Bastian Kleineidam | f484a6776d | Use timeout value from configuration. | 2012-10-10 10:53:52 +02:00
Bastian Kleineidam | 871508ef5d | Add docs and updated copyright. | 2012-10-10 06:53:16 +02:00
Bastian Kleineidam | 63cf8adf54 | Catch ValueError on invalid cookie expiration dates. | 2012-10-10 06:44:38 +02:00
Bastian Kleineidam | 06a25676c5 | Only read the maximum data size plus one, not the whole file. | 2012-10-10 06:35:33 +02:00
Bastian Kleineidam | 3e1d51b8bf | Use RLock to simplify internal locking. | 2012-10-09 21:11:35 +02:00
Bastian Kleineidam | c4cd66ea1b | Simplify decorator duration check logic. | 2012-10-09 21:05:24 +02:00
Bastian Kleineidam | 03a5d476b3 | Use URL name if title is empty. | 2012-10-09 21:04:54 +02:00
Bastian Kleineidam | 6d47b76509 | Limit HTTP and FTP connections. Gets rid of spurious BadStatusLine errors. | 2012-10-09 21:04:20 +02:00
Bastian Kleineidam | 7d3ece502c | Support semaphores. | 2012-10-09 19:46:06 +02:00
Bastian Kleineidam | ad8525c483 | Improve BadStatusLine error message. | 2012-10-05 08:32:24 +02:00
Bastian Kleineidam | d15fafb1f7 | Code cleanup. | 2012-10-05 08:10:44 +02:00
Bastian Kleineidam | 5ebd754cdb | Improved duplicate URL check. | 2012-10-01 16:11:45 +02:00
Bastian Kleineidam | ed7c60e491 | Do not warn about duplicate URLs which can point to the same content. | 2012-10-01 13:42:46 +02:00
Bastian Kleineidam | 148846be67 | Add flag to log lock contentions. | 2012-10-01 13:32:30 +02:00
Bastian Kleineidam | b56c054932 | Use finer-grained robots.txt locks to improve lock contention. | 2012-10-01 13:29:29 +02:00
Bastian Kleineidam | 27b61c3bfa | Fix gzip handling in HTTP content decoder. | 2012-09-30 14:00:49 +02:00
Bastian Kleineidam | cbc3bcb0d3 | Sitemap logger fixes. | 2012-09-23 23:20:21 +02:00
Bastian Kleineidam | 60305d8877 | Code cleanup. | 2012-09-23 21:20:12 +02:00
Bastian Kleineidam | e21187b275 | Put in-progress URLs back near the front of URL queue, not at end. | 2012-09-23 21:00:01 +02:00
Bastian Kleineidam | 1f3034b5f5 | Sitemap logger fixes. | 2012-09-23 20:59:38 +02:00
Bastian Kleineidam | 38dd63f055 | Code cleanup. | 2012-09-23 16:19:42 +02:00
Bastian Kleineidam | 7f8fd01b22 | Add Accept-Encoding and Accept-Charset headers. | 2012-09-23 15:06:44 +02:00
Bastian Kleineidam | 03ecff22bb | Fix endless loop in HTTP authentication. | 2012-09-22 22:21:10 +02:00
Bastian Kleineidam | 653b5f27dd | Updated ignored schemes. | 2012-09-22 16:18:37 +02:00
Bastian Kleineidam | 1c59cb4d4c | Use GET in case a HEAD method does not succeed, even if robots.txt content checks denied the page. This way proper check results are achieved (but the content is still not checked, so it's ok). | 2012-09-22 07:53:11 +02:00
Bastian Kleineidam | fba465e8e8 | Fix robotstxt cache miss stats. | 2012-09-21 21:12:28 +02:00
Bastian Kleineidam | f6b007f757 | Fix useragent matching in robots.txt parser. | 2012-09-21 21:12:13 +02:00
Bastian Kleineidam | bbf25106fa | Fix double result setting on HTTP checks. | 2012-09-21 20:33:15 +02:00