Commit graph

73 commits

Author SHA1 Message Date
Bastian Kleineidam
dfc4e97371 Fix content reading function. 2010-10-03 12:11:59 +02:00
Bastian Kleineidam
9e54bbfa57 Move URL retrieving functions into url.py module. 2010-10-03 08:46:49 +02:00
Bastian Kleineidam
ffcd274087 Updated copyright 2010-09-05 21:02:51 +02:00
Bastian Kleineidam
fb67df662c Use repr() for robotparser debug. 2010-09-04 18:01:12 +02:00
Bastian Kleineidam
5e06b6b8d4 Updated FSF address in GPL blurb 2009-07-24 23:58:20 +02:00
Bastian Kleineidam
f23c3ec10b Updated copied modules from upstream. 2009-03-04 23:49:00 +01:00
calvin
e9805dbd8a Updated copyright year to 2009
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3887 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2009-01-08 14:18:03 +00:00
calvin
c3b6fc5aa4 Readd
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3867 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-11-20 21:30:10 +00:00
calvin
bc48ce8a96 Close robotparser URL connections; simplify line parsing.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3853 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-07-28 16:56:24 +00:00
calvin
22e6a9e67d Fix encoding errors in robots.txt, making some sites like wikipedia.org accessible again.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3848 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-07-13 12:51:45 +00:00
calvin
7297519b04 Remove or replace unused variables.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3772 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-05-22 12:10:08 +00:00
calvin
bc9b9ee07e Move http util function in a separate module.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3747 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-05-08 10:36:08 +00:00
calvin
3eac1be9ab Require and use Python 2.5
Use Python 2.5 features and get rid of old compat code. Also some code cleanups have been made.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3737 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-04-27 11:39:21 +00:00
calvin
4055721fd4 Use internal gzip2 module
Use the internal gzip replacement module gzip2 for all GzipFile handling.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3685 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-04-14 22:33:55 +00:00
calvin
6499cb1a63 updated copyright year
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3658 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-01-02 14:31:19 +00:00
calvin
fe438941a9 cleanup
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3576 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2007-10-02 01:06:24 +00:00
calvin
df48d4a905 bump up copyright year
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3534 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2007-01-01 14:57:38 +00:00
calvin
3f099a6438 use boolean objects for rule line allowance
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3508 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2006-10-19 20:36:31 +00:00
calvin
0c5d34e9f9 don't discard robots.txt entries with only Allow: lines
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3471 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2006-09-21 09:14:28 +00:00
calvin
e8e6a8af9a set modified time after parsing of robots.txt entries
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3348 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2006-06-05 19:44:59 +00:00
calvin
19a7495b9e only accept ASCII robots.txt
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3339 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2006-06-04 21:07:08 +00:00
calvin
a57618a4ad use relative imports
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3335 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2006-06-01 14:06:19 +00:00
calvin
a4e9b8eab1 fix debugging
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3236 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2006-05-17 16:24:53 +00:00
calvin
a741d7922c add get_crawldelay method
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3226 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2006-05-17 15:35:48 +00:00
calvin
d73aa0e5bd parse crawl-delay parameter line
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3211 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2006-05-16 21:29:18 +00:00
calvin
2cfcb5c0bb avoid double timeouts by raising timeout errors in robots.txt retrieval
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3171 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2006-05-14 12:58:31 +00:00
calvin
dc9f04e6dc adjust debug asserts
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3159 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2006-05-13 21:03:21 +00:00
calvin
e92aee054c updated copyright
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3010 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2006-01-03 19:12:47 +00:00
calvin
856ff8ef2a assert debugs
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2987 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2005-12-18 08:55:42 +00:00
calvin
df34e1a8e9 remove unused imports
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2919 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2005-10-25 13:48:30 +00:00
calvin
c9f5d1a0b1 catch gzip errors, and use linkchecker debugging
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2910 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2005-10-15 00:06:48 +00:00
calvin
f24bb87e54 add missing return
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2784 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2005-07-20 12:37:32 +00:00
calvin
afa8750dc3 catch ValueError raised by urllib2
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2783 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2005-07-20 09:45:30 +00:00
calvin
baf51d1f5d cleanup
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2752 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2005-07-14 17:56:05 +00:00
calvin
63b76ec642 Use HTTPMessage() in all urllib handlers, really fixing the bug noted in http://www.python.org/sf/1117588. The workaround has been removed.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2603 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2005-05-18 17:53:39 +00:00
calvin
44075c47bf clean up raise calls
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2294 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2005-02-08 14:52:50 +00:00
calvin
973b6d5098 work around for 302 redirect handling error
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2283 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2005-02-07 12:10:17 +00:00
calvin
05c9b8b5e6 use linkchecker agent on getting /robots.txt
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2194 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2005-01-24 09:45:22 +00:00
calvin
adc1d02217 documentation
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2165 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2005-01-19 21:11:43 +00:00
calvin
d030a5b054 documentation updated
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2164 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2005-01-19 15:56:48 +00:00
calvin
647d7167ee documentation syntax
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2163 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2005-01-19 15:08:02 +00:00
calvin
700d564be7 documentation updates
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2148 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2005-01-18 01:00:45 +00:00
calvin
b06f144ced updated copyright year
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2122 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2005-01-11 02:22:43 +00:00
calvin
c97f68f70a accept unicode in robots.txt can_fetch
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1924 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-11-09 00:00:59 +00:00
calvin
62b2784ebc python 2.4 compat
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1805 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-09-16 20:11:38 +00:00
calvin
ce9dc6fbe9 increment robotparser version
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1655 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-08-31 21:45:34 +00:00
calvin
bc6bd34ffc fix password manager interface
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1654 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-08-31 21:42:34 +00:00
calvin
bffdfa68fd robots.txt password support
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1649 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-08-31 21:20:51 +00:00
calvin
4756641e1b source code restructuring
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1423 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-08-16 19:20:53 +00:00
calvin
1f6670e8cd import fixes
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1399 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-07-26 13:47:19 +00:00