Bastian Kleineidam | b18854649d | Count unique URLs for url queue limit. | 2014-03-14 20:21:46 +01:00
Bastian Kleineidam | 257644e660 | Add cache length function to get number of cached elements. | 2014-03-14 20:19:34 +01:00
Bastian Kleineidam | 6b334dc79b | Fix URL result caching. | 2014-03-08 19:35:10 +01:00
Bastian Kleineidam | b17211f162 | Set for release. | 2014-03-04 21:36:24 +01:00
Bastian Kleineidam | 82f81241fd | Check all links and add better caching. | 2014-03-03 23:29:45 +01:00
Bastian Kleineidam | 6f205a2574 | Support checking Sitemap: URLs in robots.txt files. | 2014-03-01 20:25:19 +01:00
Bastian Kleineidam | 0f0d79c7e0 | Remove crawl-delay stuff | 2014-03-01 20:01:42 +01:00
Bastian Kleineidam | 7b34be590b | Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. | 2014-03-01 00:12:34 +01:00
Bastian Kleineidam | c806be5c15 | Updated copyright | 2014-01-08 22:33:04 +01:00
Bastian Kleineidam | e0a2558b2b | Updated copyright. | 2013-12-24 07:13:16 +01:00
Bastian Kleineidam | 0ca63797bf | Remove content cache. | 2013-12-10 23:41:52 +01:00
Bastian Kleineidam | 36badddfac | Update cookie code from Python module. | 2013-12-04 19:05:08 +01:00
Bastian Kleineidam | 123578a4cd | Make per-host connection limits configurable. | 2013-02-27 19:37:28 +01:00
Bastian Kleineidam | 35bc79dd90 | Updated copyright. | 2013-01-25 21:14:27 +01:00
Bastian Kleineidam | faa743e876 | Increase per-host connection limits. | 2013-01-22 18:18:48 +01:00
Bastian Kleineidam | 0283362ce6 | Updated copyright. | 2012-12-23 21:32:16 +01:00
Bastian Kleineidam | 42a17cbb98 | Prepare py3 port and display sys.argv on internal errors. | 2012-11-26 18:49:07 +01:00
Bastian Kleineidam | e5735e2a5d | Fix URL queue handling. | 2012-11-08 12:48:21 +01:00
Bastian Kleineidam | bc683577de | Remove URLs from the in_progress cache. | 2012-11-08 11:03:16 +01:00
Bastian Kleineidam | eabaa41bd2 | Do not check duplicate URLs. | 2012-11-06 21:34:22 +01:00
Bastian Kleineidam | 8750d55a73 | Add configuration entry for maximum number of URLs. | 2012-10-14 11:13:55 +02:00
Bastian Kleineidam | 3b5877161c | Improved debugging. | 2012-10-13 13:36:28 +02:00
Bastian Kleineidam | e1e80b7dd5 | Remove addrinfo cache. | 2012-10-10 10:54:58 +02:00
Bastian Kleineidam | 871508ef5d | Add docs and updated copyright. | 2012-10-10 06:53:16 +02:00
Bastian Kleineidam | 6d47b76509 | Limit HTTP and FTP connections. Gets rid of spurious BadStatusLine errors. | 2012-10-09 21:04:20 +02:00
Bastian Kleineidam | b56c054932 | Use finer-grained robots.txt locks to improve lock contention. | 2012-10-01 13:29:29 +02:00
Bastian Kleineidam | 60305d8877 | Code cleanup. | 2012-09-23 21:20:12 +02:00
Bastian Kleineidam | e21187b275 | Put in-progress URLs back near the front of URL queue, not at end. | 2012-09-23 21:00:01 +02:00
Bastian Kleineidam | fba465e8e8 | Fix robotstxt cache miss stats. | 2012-09-21 21:12:28 +02:00
Bastian Kleineidam | 4e59056ee7 | Warn about duplicate URL contents. | 2012-09-17 19:49:50 +02:00
Bastian Kleineidam | 02a09dbb28 | Add documentation. | 2012-09-17 16:30:32 +02:00
Bastian Kleineidam | 99bf8aa940 | Updated copyright. | 2012-09-17 16:09:55 +02:00
Bastian Kleineidam | 6e1841cf1f | Print download and cache statistics. | 2012-09-17 15:23:25 +02:00
Bastian Kleineidam | 21db38546c | Updated copyright. | 2012-09-02 23:36:31 +02:00
Bastian Kleineidam | 3baaca47a0 | Add maximum number of allowed puts on URL queue. | 2012-09-02 22:44:29 +02:00
Bastian Kleineidam | d8fce1ceeb | Do not sort URL queue anymore. | 2012-09-02 22:32:14 +02:00
Bastian Kleineidam | 7a6436f08f | Increase checked cache in URL queue. | 2012-09-02 22:21:49 +02:00
Bastian Kleineidam | 9956f3712e | Properly detect too-long Unicode hostnames. | 2011-12-05 20:51:42 +01:00
Bastian Kleineidam | 6b52b28425 | Send all domain-matching cookies that apply. | 2011-08-03 21:21:44 +02:00
Bastian Kleineidam | 48413de418 | Display warning message for each cookie parsing error. | 2011-08-03 19:27:36 +02:00
Bastian Kleineidam | 8779158b2f | Sent cookies with more specific paths first. | 2011-08-02 21:56:26 +02:00
Bastian Kleineidam | 977d9e9ae6 | Update cookie values instead of adding duplicate entries. | 2011-08-01 20:26:31 +02:00
Bastian Kleineidam | 2dfe62afa2 | Updated copyright. | 2011-02-14 21:07:07 +01:00
Bastian Kleineidam | c5884b8d87 | Add function documentation. | 2011-02-14 21:06:34 +01:00
Bastian Kleineidam | 0589933b97 | Reuse connections more than once. | 2011-02-14 20:28:38 +01:00
Bastian Kleineidam | 017a1087ba | Remove unneeded __future__ import | 2010-11-21 10:45:30 +01:00
Bastian Kleineidam | 5bb222b1df | Updated copyright | 2010-10-24 01:02:39 +02:00
Bastian Kleineidam | fb4689dbe1 | Fix previous commit. | 2010-10-13 22:40:55 +02:00
Bastian Kleineidam | 415efe262e | Added equality check for Cookies, and use that to augment the retrieved cookies. | 2010-10-13 22:35:36 +02:00
Bastian Kleineidam | 1ce1521a9f | Improved debug message and cleaned up some syntax. | 2010-10-13 22:29:44 +02:00