Bastian Kleineidam
|
b56c054932
|
Use finer-grained robots.txt locks to reduce lock contention.
|
2012-10-01 13:29:29 +02:00 |
|
Bastian Kleineidam
|
60305d8877
|
Code cleanup.
|
2012-09-23 21:20:12 +02:00 |
|
Bastian Kleineidam
|
e21187b275
|
Put in-progress URLs back near the front of URL queue, not at end.
|
2012-09-23 21:00:01 +02:00 |
|
Bastian Kleineidam
|
fba465e8e8
|
Fix robotstxt cache miss stats.
|
2012-09-21 21:12:28 +02:00 |
|
Bastian Kleineidam
|
4e59056ee7
|
Warn about duplicate URL contents.
|
2012-09-17 19:49:50 +02:00 |
|
Bastian Kleineidam
|
02a09dbb28
|
Add documentation.
|
2012-09-17 16:30:32 +02:00 |
|
Bastian Kleineidam
|
99bf8aa940
|
Updated copyright.
|
2012-09-17 16:09:55 +02:00 |
|
Bastian Kleineidam
|
6e1841cf1f
|
Print download and cache statistics.
|
2012-09-17 15:23:25 +02:00 |
|
Bastian Kleineidam
|
21db38546c
|
Updated copyright.
|
2012-09-02 23:36:31 +02:00 |
|
Bastian Kleineidam
|
3baaca47a0
|
Add maximum number of allowed puts on URL queue.
|
2012-09-02 22:44:29 +02:00 |
|
Bastian Kleineidam
|
d8fce1ceeb
|
Do not sort URL queue anymore.
|
2012-09-02 22:32:14 +02:00 |
|
Bastian Kleineidam
|
7a6436f08f
|
Increase checked cache in URL queue.
|
2012-09-02 22:21:49 +02:00 |
|
Bastian Kleineidam
|
9956f3712e
|
Properly detect too-long Unicode hostnames.
|
2011-12-05 20:51:42 +01:00 |
|
Bastian Kleineidam
|
6b52b28425
|
Send all domain-matching cookies that apply.
|
2011-08-03 21:21:44 +02:00 |
|
Bastian Kleineidam
|
48413de418
|
Display warning message for each cookie parsing error.
|
2011-08-03 19:27:36 +02:00 |
|
Bastian Kleineidam
|
8779158b2f
|
Send cookies with more specific paths first.
|
2011-08-02 21:56:26 +02:00 |
|
Bastian Kleineidam
|
977d9e9ae6
|
Update cookie values instead of adding duplicate entries.
|
2011-08-01 20:26:31 +02:00 |
|
Bastian Kleineidam
|
2dfe62afa2
|
Updated copyright.
|
2011-02-14 21:07:07 +01:00 |
|
Bastian Kleineidam
|
c5884b8d87
|
Add function documentation.
|
2011-02-14 21:06:34 +01:00 |
|
Bastian Kleineidam
|
0589933b97
|
Reuse connections more than once.
|
2011-02-14 20:28:38 +01:00 |
|
Bastian Kleineidam
|
017a1087ba
|
Remove unneeded __future__ import
|
2010-11-21 10:45:30 +01:00 |
|
Bastian Kleineidam
|
5bb222b1df
|
Updated copyright
|
2010-10-24 01:02:39 +02:00 |
|
Bastian Kleineidam
|
fb4689dbe1
|
Fix previous commit.
|
2010-10-13 22:40:55 +02:00 |
|
Bastian Kleineidam
|
415efe262e
|
Added equality check for Cookies, and use that to augment the retrieved cookies.
|
2010-10-13 22:35:36 +02:00 |
|
Bastian Kleineidam
|
1ce1521a9f
|
Improved debug message and cleaned up some syntax.
|
2010-10-13 22:29:44 +02:00 |
|
Bastian Kleineidam
|
c59bbae587
|
Remove unused import and move geoip module from the cache module into base linkcheck module.
|
2010-09-29 15:15:21 +02:00 |
|
Bastian Kleineidam
|
6292ec54fa
|
Catch GeoIP lookup errors; ensure GeoIP information is Unicode.
|
2010-09-29 15:04:37 +02:00 |
|
Bastian Kleineidam
|
473c834f0c
|
Do not crash when geoip information is None.
|
2010-09-29 14:19:28 +02:00 |
|
Bastian Kleineidam
|
8995be1739
|
Support city-level geoip lookup; remove the geoip cache since lookup is fast enough; remove the duplicated geoip country name map.
|
2010-09-29 14:10:36 +02:00 |
|
Bastian Kleineidam
|
1446797020
|
Support pygeoip
|
2010-09-29 08:24:30 +02:00 |
|
Bastian Kleineidam
|
c4c098bd83
|
pep8-ify the source a little more
|
2010-03-13 08:47:12 +01:00 |
|
Bastian Kleineidam
|
5b5a62f6d5
|
Updated copyright
|
2010-03-10 00:05:05 +01:00 |
|
Bastian Kleineidam
|
57e3b05c88
|
limit cache sizes
|
2010-03-10 00:00:12 +01:00 |
|
Bastian Kleineidam
|
7c15d28f56
|
Prevent UnicodeDecodeError in robots.txt parsing.
|
2010-03-07 22:49:25 +01:00 |
|
Bastian Kleineidam
|
5e06b6b8d4
|
Updated FSF address in GPL blurb
|
2009-07-24 23:58:20 +02:00 |
|
Bastian Kleineidam
|
0afd5f7cc6
|
Properly detect a callable object in robots.txt callback
|
2009-03-06 20:10:26 +01:00 |
|
calvin
|
7b489b5897
|
Allow missing cache keys in the in_progress queue. This occurs when syntax checks already set the result.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3957 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2009-02-18 15:34:10 +00:00 |
|
calvin
|
e9805dbd8a
|
Updated copyright year to 2009
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3887 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2009-01-08 14:18:03 +00:00 |
|
calvin
|
c3b6fc5aa4
|
Readd
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3867 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-11-20 21:30:10 +00:00 |
|
calvin
|
7297519b04
|
Remove or replace unused variables.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3772 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-05-22 12:10:08 +00:00 |
|
calvin
|
bacb59597e
|
Use relative imports from Python 2.5
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3750 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-05-09 06:16:03 +00:00 |
|
calvin
|
3eac1be9ab
|
Require and use Python 2.5
Use Python 2.5 features and get rid of old compat code. Also some
code cleanups have been made.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3737 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-04-27 11:39:21 +00:00 |
|
calvin
|
6499cb1a63
|
updated copyright year
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3658 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-01-02 14:31:19 +00:00 |
|
calvin
|
8d2dc781e1
|
Ensure unused or expired connections are closed.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3617 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2007-11-30 16:42:41 +00:00 |
|
calvin
|
9cf3314eab
|
Use constants for warning tags, avoiding typos in string constants. And move the constants into a separate module const.py
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3611 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2007-11-29 07:50:22 +00:00 |
|
calvin
|
df48d4a905
|
bump up copyright year
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3534 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2007-01-01 14:57:38 +00:00 |
|
calvin
|
bef2494211
|
remove unused imports
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3482 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2006-09-24 10:13:59 +00:00 |
|
calvin
|
da15b15923
|
Split off the host wait time function, and use it with a separate lock
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3434 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2006-09-15 12:18:24 +00:00 |
|
calvin
|
6f0dbb5058
|
Copy Queue.Queue code for Python 2.5 compatibility
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3371 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2006-06-13 22:22:29 +00:00 |
|
calvin
|
b442809838
|
Ensure that cached URLs get checked quickly in large queues
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3315 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2006-05-25 11:35:24 +00:00 |
|