246 commits

Author SHA1 Message Date
Bastian Kleineidam
b39158e65c Improve available anchor message. 2012-10-24 22:21:46 +02:00
Bastian Kleineidam
dd2c963fac Fix non-ASCII exception handling. 2012-10-24 22:14:45 +02:00
Bastian Kleineidam
06a25676c5 Only read the maximum data size plus one, not the whole file. 2012-10-10 06:35:33 +02:00
Bastian Kleineidam
6d47b76509 Limit HTTP and FTP connections. Gets rid of spurious BadStatusLine errors. 2012-10-09 21:04:20 +02:00
Bastian Kleineidam
ad8525c483 Improve BadStatusLine error message. 2012-10-05 08:32:24 +02:00
Bastian Kleineidam
ed7c60e491 Do not warn about duplicate URLs which can point to the same content. 2012-10-01 13:42:46 +02:00
Bastian Kleineidam
c274b50c50 Store lowercase URL scheme in checker class. 2012-09-21 14:35:25 +02:00
Bastian Kleineidam
0941f6ff02 Improve exception handling by using unicode. 2012-09-21 14:29:20 +02:00
Bastian Kleineidam
7c6dce6136 Only warn non-empty site duplicates. 2012-09-20 20:39:36 +02:00
Bastian Kleineidam
a03090c20f Optimize intern/extern pattern parsing. 2012-09-20 20:19:13 +02:00
Bastian Kleineidam
bff217c58b Never log ignored warnings. 2012-09-20 12:44:40 +02:00
Bastian Kleineidam
600b7c0e69 Fix duplicate content warning when self.size is not set yet. 2012-09-20 12:44:23 +02:00
Bastian Kleineidam
18a200d85f Fix tests. 2012-09-19 11:05:26 +02:00
Bastian Kleineidam
3a352631ba Add modified field to loggers. 2012-09-18 12:12:00 +02:00
Bastian Kleineidam
4e59056ee7 Warn about duplicate URL contents. 2012-09-17 19:49:50 +02:00
Bastian Kleineidam
cb71f483a5 Warn about too long URLs. 2012-09-17 16:00:23 +02:00
Bastian Kleineidam
6e1841cf1f Print download and cache statistics. 2012-09-17 15:23:25 +02:00
Bastian Kleineidam
7a6436f08f Increase checked cache in URL queue. 2012-09-02 22:21:49 +02:00
Bastian Kleineidam
ecef16b2c9 Support WML sites. 2012-08-22 22:43:14 +02:00
Bastian Kleineidam
e65b5c72ce Correct list of schemes requiring host name. 2012-08-12 14:21:56 +02:00
Bastian Kleineidam
afc0ecd7a6 --ignore-url now really ignores URLs. 2012-08-12 11:16:29 +02:00
Bastian Kleineidam
0fd1a78378 Always compare encoded anchor names. 2012-06-27 20:59:53 +02:00
Bastian Kleineidam
5c045fef44 Fix UNC path handling on Windows. 2012-06-24 10:30:54 +02:00
Bastian Kleineidam
73b176d7c9 Fix URL joining: properly detect absolute URL. 2012-06-23 13:33:27 +02:00
Bastian Kleineidam
f107092a8a Fix handling of user/password info in URLs. 2012-06-10 22:07:42 +02:00
Bastian Kleineidam
98537eea2f Code cleanup: use add_url() function in UrlBase. 2012-06-10 14:24:17 +02:00
Bastian Kleineidam
db95fce77e Ignore PHP processing instructions in local files. 2012-06-10 14:02:01 +02:00
Bastian Kleineidam
837ab22d01 Syntax cleanup. 2012-06-10 11:46:05 +02:00
Bastian Kleineidam
77b8ec0fcd Fix writing temporary Word files. 2012-06-10 11:07:35 +02:00
Bastian Kleineidam
52dcf101e0 Remove rest of deprecated options. 2012-04-22 17:55:12 +02:00
Bastian Kleineidam
b9b8e3f5b2 Honor the charset encoding of the Content-Type HTTP header when parsing HTML. 2012-03-22 22:45:11 +01:00
Bastian Kleineidam
4c9fd8d488 Cache real url. 2012-03-14 21:12:13 +01:00
Bastian Kleineidam
042b0569ec Fall back to W3C checkers. 2012-01-22 08:13:27 +01:00
Bastian Kleineidam
51cf55b7a6 Remove warning: prefix from warning messages. 2012-01-21 00:25:02 +01:00
Bastian Kleineidam
f1eb51d885 Updated copyright 2012-01-06 09:21:30 +01:00
Bastian Kleineidam
033280cfb9 Remove workarounds for old Python versions. 2012-01-04 20:17:53 +01:00
Bastian Kleineidam
3d9958dfbb Parse Safari bookmark files. 2011-12-17 16:38:25 +01:00
Bastian Kleineidam
27b7b1cb49 Fix W3C HTML validation. 2011-10-09 21:16:45 +02:00
Bastian Kleineidam
89ec0ee6a1 Check multiple matches of warning regex. 2011-10-09 19:00:35 +02:00
Bastian Kleineidam
72b65d94df Only check anchors in HTML pages. 2011-05-22 17:33:16 +02:00
Bastian Kleineidam
e5c2271533 Only check warning patterns in parseable contents. 2011-05-22 17:32:26 +02:00
Bastian Kleineidam
68ea03ee16 Support both Chromium and Google Chrome profile dirs to find bookmark files. 2011-05-21 11:47:54 +02:00
Bastian Kleineidam
78790d7c8d Improved anchor warning message display. 2011-05-20 06:48:06 +02:00
Bastian Kleineidam
343cf9703d Code cleanup: indentation, unused variables etc. 2011-05-15 18:36:30 +02:00
Bastian Kleineidam
10bbb696e8 Limit download file size to 5MB. 2011-05-05 21:10:55 +02:00
Bastian Kleineidam
719441cca5 Make module detection more robust and use it when possible. 2011-04-20 09:08:11 +02:00
Bastian Kleineidam
84f6d56a49 Print level in loggers xml, csv and sql. 2011-04-09 10:51:03 +02:00
Bastian Kleineidam
c0732e3d37 Do not print empty country information. 2011-04-06 17:22:48 +02:00
Bastian Kleineidam
82e5ba8ce6 Add warning tag attribute in XML loggers. 2011-03-15 13:42:21 +01:00
Bastian Kleineidam
7b33cfac7b Use stripped URL base constructing absolute URL. 2011-03-11 15:17:36 +01:00