112 commits

Author SHA1 Message Date
Chris Mayo
a15a2833ca Remove spaces after names in class method definitions
And also nested functions.

This is a PEP 8 convention, E211.
2020-05-16 20:19:42 +01:00
Chris Mayo
44e81d27dd Remove inheriting object
All Python 3 classes are new-style.
2020-05-08 10:45:31 +01:00
Chris Mayo
b0ea72e8c1 Remove # -*- coding: lines
Except for tests that include non-unicode characters:

tests/test_po.py
tests/test_strformat.py
tests/test_url.py
tests/checker/test_error.py
tests/checker/test_news.py
2020-05-08 10:45:31 +01:00
Yaroslav Halchenko
b78c2d200e DOC: minor typo fix 2018-11-01 11:08:09 -04:00
Marius Gedminas
4a092c218c Whitespace bigotry 2017-03-14 17:18:27 +02:00
Petr Dlouhý
eaa538c814 don't check one url multiple times 2017-02-14 10:23:25 +01:00
Nicolas Bigaouette
4e56eceb35 Detect if "url_data" contains proxy attributes before using them.
Fix proposed by @colwilson in issue #555.
2014-11-12 09:58:30 -05:00
Bastian Kleineidam
35eb30432e Added some Python3 fixes. 2014-09-12 19:36:30 +02:00
Bastian Kleineidam
06c6b80ed3 Fix proxy support. 2014-09-05 22:48:10 +02:00
Arlo Louis O'Keeffe
52337f82cb Use correct attribute 2014-09-03 09:36:22 +02:00
Bastian Kleineidam
b646293fd6 Remove unused import. 2014-07-15 22:38:57 +02:00
Bastian Kleineidam
90257a1b5e Replace twill with custom code. 2014-07-15 18:37:05 +02:00
Bastian Kleineidam
a665d35feb Use proxies and checker session in robots.txt. 2014-07-14 20:28:28 +02:00
Bastian Kleineidam
6c38b4165a Use given HTTP auth data for robots.txt fetching. 2014-07-14 19:50:11 +02:00
Bastian Kleineidam
22caa9367a Refactor recursion checks. 2014-04-10 17:50:55 +02:00
Bastian Kleineidam
08fbd891ef Do not check external robots.txt sitemaps. 2014-04-09 19:44:29 +02:00
Bastian Kleineidam
c57f607fc3 Use urldata.add_url() 2014-04-07 18:54:33 +02:00
Bastian Kleineidam
fc73c6ca6e Log number of checked unique URLs. 2014-03-14 23:46:17 +01:00
Bastian Kleineidam
19b8baf08c Move cached queue items to top once in a while. 2014-03-14 22:08:51 +01:00
Bastian Kleineidam
b18854649d Count unique URLs for url queue limit. 2014-03-14 20:21:46 +01:00
Bastian Kleineidam
257644e660 Add cache length function to get number of cached elements. 2014-03-14 20:19:34 +01:00
Bastian Kleineidam
6b334dc79b Fix URL result caching. 2014-03-08 19:35:10 +01:00
Bastian Kleineidam
b17211f162 Set for release. 2014-03-04 21:36:24 +01:00
Bastian Kleineidam
82f81241fd Check all links and add better caching. 2014-03-03 23:29:45 +01:00
Bastian Kleineidam
6f205a2574 Support checking Sitemap: URLs in robots.txt files. 2014-03-01 20:25:19 +01:00
Bastian Kleineidam
0f0d79c7e0 Remove crawl-delay stuff 2014-03-01 20:01:42 +01:00
Bastian Kleineidam
7b34be590b Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. 2014-03-01 00:12:34 +01:00
Bastian Kleineidam
c806be5c15 Updated copyright 2014-01-08 22:33:04 +01:00
Bastian Kleineidam
e0a2558b2b Updated copyright. 2013-12-24 07:13:16 +01:00
Bastian Kleineidam
0ca63797bf Remove content cache. 2013-12-10 23:41:52 +01:00
Bastian Kleineidam
36badddfac Update cookie code from Python module. 2013-12-04 19:05:08 +01:00
Bastian Kleineidam
123578a4cd Make per-host connection limits configurable. 2013-02-27 19:37:28 +01:00
Bastian Kleineidam
35bc79dd90 Updated copyright. 2013-01-25 21:14:27 +01:00
Bastian Kleineidam
faa743e876 Increase per-host connection limits. 2013-01-22 18:18:48 +01:00
Bastian Kleineidam
0283362ce6 Updated copyright. 2012-12-23 21:32:16 +01:00
Bastian Kleineidam
42a17cbb98 Prepare py3 port and display sys.argv on internal errors. 2012-11-26 18:49:07 +01:00
Bastian Kleineidam
e5735e2a5d Fix URL queue handling. 2012-11-08 12:48:21 +01:00
Bastian Kleineidam
bc683577de Remove URLs from the in_progress cache. 2012-11-08 11:03:16 +01:00
Bastian Kleineidam
eabaa41bd2 Do not check duplicate URLs. 2012-11-06 21:34:22 +01:00
Bastian Kleineidam
8750d55a73 Add configuration entry for maximum number of URLs. 2012-10-14 11:13:55 +02:00
Bastian Kleineidam
3b5877161c Improved debugging. 2012-10-13 13:36:28 +02:00
Bastian Kleineidam
e1e80b7dd5 Remove addrinfo cache. 2012-10-10 10:54:58 +02:00
Bastian Kleineidam
871508ef5d Add docs and updated copyright. 2012-10-10 06:53:16 +02:00
Bastian Kleineidam
6d47b76509 Limit HTTP and FTP connections. Gets rid of spurious BadStatusLine errors. 2012-10-09 21:04:20 +02:00
Bastian Kleineidam
b56c054932 Use finer-grained robots.txt locks to improve lock contention. 2012-10-01 13:29:29 +02:00
Bastian Kleineidam
60305d8877 Code cleanup. 2012-09-23 21:20:12 +02:00
Bastian Kleineidam
e21187b275 Put in-progress URLs back near the front of URL queue, not at end. 2012-09-23 21:00:01 +02:00
Bastian Kleineidam
fba465e8e8 Fix robotstxt cache miss stats. 2012-09-21 21:12:28 +02:00
Bastian Kleineidam
4e59056ee7 Warn about duplicate URL contents. 2012-09-17 19:49:50 +02:00
Bastian Kleineidam
02a09dbb28 Add documentation. 2012-09-17 16:30:32 +02:00