212 commits

Author SHA1 Message Date
Bastian Kleineidam
f1eb51d885 Updated copyright 2012-01-06 09:21:30 +01:00
Bastian Kleineidam
033280cfb9 Remove workarounds for old Python versions. 2012-01-04 20:17:53 +01:00
Bastian Kleineidam
3d9958dfbb Parse Safari bookmark files. 2011-12-17 16:38:25 +01:00
Bastian Kleineidam
27b7b1cb49 Fix W3C HTML validation. 2011-10-09 21:16:45 +02:00
Bastian Kleineidam
89ec0ee6a1 Check multiple matches of warning regex. 2011-10-09 19:00:35 +02:00
Bastian Kleineidam
72b65d94df Only check anchors in HTML pages. 2011-05-22 17:33:16 +02:00
Bastian Kleineidam
e5c2271533 Only check warning patterns in parseable contents. 2011-05-22 17:32:26 +02:00
Bastian Kleineidam
68ea03ee16 Support both Chromium and Google Chrome profile dirs to find bookmark files. 2011-05-21 11:47:54 +02:00
Bastian Kleineidam
78790d7c8d Improved anchor warning message display. 2011-05-20 06:48:06 +02:00
Bastian Kleineidam
343cf9703d Code cleanup: indentation, unused variables etc. 2011-05-15 18:36:30 +02:00
Bastian Kleineidam
10bbb696e8 Limit download file size to 5MB. 2011-05-05 21:10:55 +02:00
Bastian Kleineidam
719441cca5 Make module detection more robust and use it when possible. 2011-04-20 09:08:11 +02:00
Bastian Kleineidam
84f6d56a49 Print level in loggers xml, csv and sql. 2011-04-09 10:51:03 +02:00
Bastian Kleineidam
c0732e3d37 Do not print empty country information. 2011-04-06 17:22:48 +02:00
Bastian Kleineidam
82e5ba8ce6 Add warning tag attribute in XML loggers. 2011-03-15 13:42:21 +01:00
Bastian Kleineidam
7b33cfac7b Use stripped URL base constructing absolute URL. 2011-03-11 15:17:36 +01:00
Bastian Kleineidam
420c21c2de Strip leading and trailing whitespace from URLs. 2011-03-07 12:33:09 +01:00
Bastian Kleineidam
21e4824f65 Fix typo calling get_temp_file() function. 2011-03-07 09:57:40 +01:00
Bastian Kleineidam
0d4377d1ba Support Google Chrome Bookmark files. 2011-02-15 18:26:00 +01:00
Bastian Kleineidam
25b6dc2e57 Refactor bookmark parsing code into own package. 2011-02-15 17:31:42 +01:00
Bastian Kleineidam
c5884b8d87 Add function documentation. 2011-02-14 21:06:34 +01:00
Bastian Kleineidam
4a0c63aa56 Fix joining of URLs when parent URL has CGI parameter. 2011-02-08 21:25:55 +01:00
Bastian Kleineidam
71b15b70f4 Updated copyright 2011-01-06 09:59:57 +01:00
Bastian Kleineidam
5f70b7210f Add tempfile utility function. 2011-01-06 09:52:11 +01:00
Bastian Kleineidam
d011d1524c Parse PHP files recursively. 2010-12-28 17:11:29 +01:00
Bastian Kleineidam
fd3fe8dcaa Fix missing content types for cached URLs. 2010-12-23 07:37:36 +01:00
Bastian Kleineidam
6090e1a66c Print anchor in __str__() 2010-12-21 20:55:49 +01:00
Bastian Kleineidam
7c08290c44 Fix broken anchor checking. 2010-12-20 19:55:26 +01:00
Bastian Kleineidam
224061e284 Fix to_wire by checking whether URL parts have been initialized. 2010-12-15 13:24:12 +01:00
Bastian Kleineidam
2b2121b9ed Added content type and domain to URL logging info. 2010-12-14 20:30:53 +01:00
Bastian Kleineidam
01184784ef Remove warning about Unicode domains which are more widely supported now. 2010-12-11 07:58:15 +01:00
Bastian Kleineidam
f14340a0a8 Do not check content of already cached URLs. 2010-10-27 19:52:48 +02:00
Bastian Kleineidam
d9e981e497 Don't log a warning if commandline URL has been redirected. 2010-10-26 16:24:27 +02:00
Bastian Kleineidam
4375d35328 Add warning about unsupported HTTP authentication, and revert the realm changes. 2010-10-25 22:41:31 +02:00
Bastian Kleineidam
332fa4f8f9 Prepare multi-realm auth configuration. 2010-10-25 22:07:16 +02:00
Bastian Kleineidam
a8aa3bdb00 Another fix to ensure get_content() is only called when allowed. 2010-10-13 22:14:43 +02:00
Bastian Kleineidam
61e611e4bf Prevent disallowed content read when checking for robots.txt allowance in HTML files. 2010-10-12 00:40:34 +02:00
Bastian Kleineidam
1d0db02192 Refactor getting user and password for a URL. 2010-10-11 20:11:15 +02:00
Bastian Kleineidam
e494d6bbb6 Move MIME type detection into fileutil.py module, and use mimetools for detection. 2010-10-03 08:47:48 +02:00
Bastian Kleineidam
840538d12a Remove unneeded check for HTML content. 2010-09-29 19:25:14 +02:00
Bastian Kleineidam
279a1eae70 Only add geoip info for non-empty hostnames. 2010-09-29 15:59:57 +02:00
Bastian Kleineidam
cc848cdb33 Fix import for moved geoip module. 2010-09-29 15:17:27 +02:00
Bastian Kleineidam
8a1ac26c85 Warn about obfuscated IP numbers. 2010-09-05 20:11:02 +02:00
Bastian Kleineidam
8a074aeea9 Work around Python 2.6+ urljoin bug. 2010-08-31 09:16:24 +02:00
Bastian Kleineidam
c3b8ff00b3 Check content and recursion in one try/except to avoid multiple errors when getting page content. 2010-08-31 06:52:08 +02:00
Bastian Kleineidam
1faedafb33 Fix data size for HTTP requests. 2010-08-04 00:06:25 +02:00
Bastian Kleineidam
0f92b76290 Remove the unnormed URL warning. 2010-07-29 20:20:59 +02:00
Bastian Kleineidam
7ad4f7c220 Compare size from meta info and content data. 2010-07-29 19:53:41 +02:00
Bastian Kleineidam
d9bfd25a68 Add warning if content size is zero 2010-07-28 08:19:55 +02:00
Bluebird75
28f4514b67 Use object with __slots__ for wire-format of UrlBase objects.
Saves memory since UrlBase wire-format objects are used for
logging and thus often created.

Signed-off-by: Bastian Kleineidam <calvin@debian.org>
2010-03-27 00:07:19 +01:00