771 commits

Author SHA1 Message Date
Bastian Kleineidam
6915e2f989 Detect sites not supporting HEAD requests. 2012-08-14 18:43:39 +02:00
Bastian Kleineidam
f3b66b102d Fallback to GET when method HEAD is not allowed. 2012-08-13 07:07:21 +02:00
Bastian Kleineidam
e65b5c72ce Correct list of schemes requiring host name. 2012-08-12 14:21:56 +02:00
Bastian Kleineidam
7b567cc378 Make scheme and domain for internal url pattern case insensitive. 2012-08-12 14:19:42 +02:00
Bastian Kleineidam
afc0ecd7a6 --ignore-url now really ignores URLs. 2012-08-12 11:16:29 +02:00
Bastian Kleineidam
6be3e9ddff Cleanup code and improve redirect anchor handling. 2012-08-12 11:14:56 +02:00
Bastian Kleineidam
c74690a79a Do not check SSL certificates on HTTPS -> HTTP redirects. 2012-08-10 19:43:57 +02:00
Bastian Kleineidam
b0e5c7fc59 Ignore feed: URLs. 2012-06-27 21:32:03 +02:00
Bastian Kleineidam
0fd1a78378 Always compare encoded anchor names. 2012-06-27 20:59:53 +02:00
Bastian Kleineidam
5c045fef44 Fix UNC path handling on Windows. 2012-06-24 10:30:54 +02:00
Bastian Kleineidam
31519f6a01 Fix handling of UNC pathnames. 2012-06-23 14:30:58 +02:00
Bastian Kleineidam
73b176d7c9 Fix URL joining: properly detect absolute URL. 2012-06-23 13:33:27 +02:00
Bastian Kleineidam
8d23e2a3c6 Add debugging for checker class name. 2012-06-23 13:30:13 +02:00
Bastian Kleineidam
dbe57c0f9b Treat Windows UNC paths as absolute paths. 2012-06-22 23:42:37 +02:00
Bastian Kleineidam
713b9ebada Only assume local file links for URLs given on the command line. 2012-06-22 23:42:05 +02:00
Bastian Kleineidam
9d0cced73c Fix SSL check errors. 2012-06-22 07:37:37 +02:00
Bastian Kleineidam
addbcfc54f Updated translation. 2012-06-20 20:18:39 +02:00
Bastian Kleineidam
4cce99a77d Test SSL certificate expiration. 2012-06-20 20:10:40 +02:00
Bastian Kleineidam
cbb13a8983 Add SSL certificate verification. 2012-06-18 23:05:44 +02:00
Bastian Kleineidam
f107092a8a Fix handling of user/password info in URLs. 2012-06-10 22:07:42 +02:00
Bastian Kleineidam
838095cbd5 Updated copyright. 2012-06-10 14:58:38 +02:00
Bastian Kleineidam
00aa631267 Add localwebroot configuration option. 2012-06-10 14:47:27 +02:00
Bastian Kleineidam
98537eea2f Code cleanup: use add_url() function in UrlBase. 2012-06-10 14:24:17 +02:00
Bastian Kleineidam
db95fce77e Ignore PHP processing instructions in local files. 2012-06-10 14:02:01 +02:00
Bastian Kleineidam
2dee223555 Allow memory dumps to be written. 2012-06-10 13:18:35 +02:00
Bastian Kleineidam
837ab22d01 Syntax cleanup. 2012-06-10 11:46:05 +02:00
Bastian Kleineidam
77b8ec0fcd Fix writing temporary Word files. 2012-06-10 11:07:35 +02:00
Bastian Kleineidam
54ffb102d8 Code cleanup: add function for GET fallback. 2012-06-10 09:52:12 +02:00
Bastian Kleineidam
5c94c47901 Remove old Squid proxy workaround. 2012-06-10 09:45:07 +02:00
Bastian Kleineidam
bcbacec79a Code cleanup. 2012-05-10 21:05:33 +02:00
Bastian Kleineidam
61138744e6 Always use GET for Zope servers. 2012-05-08 20:47:47 +02:00
Bastian Kleineidam
52dcf101e0 Remove rest of deprecated options. 2012-04-22 17:55:12 +02:00
Bastian Kleineidam
797024c69b Fix URL connection cache key. 2012-04-04 22:58:09 +02:00
Bastian Kleineidam
4feea986b4 Fix concatenation of multiple cookie values. 2012-03-31 08:51:58 +02:00
Bastian Kleineidam
da6d7b0eca Store cookies on redirect. 2012-03-31 08:37:18 +02:00
Bastian Kleineidam
6d5e5f9efb Updated copyright. 2012-03-30 22:24:10 +02:00
Bastian Kleineidam
b9b8e3f5b2 Honor the charset encoding of the Content-Type HTTP header when parsing HTML. 2012-03-22 22:45:11 +01:00
Bastian Kleineidam
98b4768419 Use timeout when checking email addresses with SMTP. 2012-03-16 21:44:18 +01:00
Bastian Kleineidam
4c9fd8d488 Cache real url. 2012-03-14 21:12:13 +01:00
Bastian Kleineidam
5e13a78f66 Fix non-ascii HTTP header debugging. 2012-03-09 11:54:18 +01:00
Bastian Kleineidam
3fcff8a4e5 Fix non-ascii HTTP header handling. 2012-03-09 11:14:18 +01:00
Bastian Kleineidam
24811ac7b0 Recheck extern status on HTTP redirects even if domain did not change. 2012-03-08 10:07:31 +01:00
Bastian Kleineidam
71f5ee42c8 Updated copyright. 2012-01-29 17:18:28 +01:00
Bastian Kleineidam
042b0569ec Fall back to W3C checkers. 2012-01-22 08:13:27 +01:00
Bastian Kleineidam
51cf55b7a6 Remove warning: prefix from warning messages. 2012-01-21 00:25:02 +01:00
Bastian Kleineidam
6e1e9148d8 Work around a Squid bug that prevented detection of broken links. 2012-01-17 08:36:11 +01:00
Bastian Kleineidam
e99c55f6c4 Proper proxy type check. 2012-01-16 21:15:53 +01:00
Bastian Kleineidam
4c15fc6a8b Properly handle non-ASCII HTTP header values. 2012-01-14 11:01:09 +01:00
Bastian Kleineidam
a0581cc2a1 Ignore steam:// URIs. 2012-01-10 19:37:19 +01:00
Bastian Kleineidam
f1eb51d885 Updated copyright 2012-01-06 09:21:30 +01:00
Bastian Kleineidam
033280cfb9 Remove workarounds for old Python versions. 2012-01-04 20:17:53 +01:00
Bastian Kleineidam
3d9958dfbb Parse Safari bookmark files. 2011-12-17 16:38:25 +01:00
Bastian Kleineidam
a2978209e6 Ignore errors trying to get FTP feature set. 2011-10-18 13:10:49 +02:00
Bastian Kleineidam
27b7b1cb49 Fix W3C HTML validation. 2011-10-09 21:16:45 +02:00
Bastian Kleineidam
89ec0ee6a1 Check multiple matches of warning regex. 2011-10-09 19:00:35 +02:00
Bastian Kleineidam
09d9264470 Updated copyright. 2011-08-04 20:40:49 +02:00
Bastian Kleineidam
cdf91a0321 Improve cookie info message and fix cookie test cases. 2011-08-04 18:34:56 +02:00
Bastian Kleineidam
48413de418 Display warning message for each cookie parsing error. 2011-08-03 19:27:36 +02:00
Bastian Kleineidam
c99b75899d Send multiple cookie values in one header. 2011-08-02 21:57:16 +02:00
Bastian Kleineidam
c70bd68ef1 Refactor sending of cookie data in client into separate function. 2011-08-02 20:45:26 +02:00
Bastian Kleineidam
51bcccfdfe Added new option --user-agent to set the User-Agent header. 2011-07-25 21:09:49 +02:00
Bastian Kleineidam
552c71a3ca Do not append a stray newline character when encoding authentication information to base64. 2011-07-25 20:02:01 +02:00
Bastian Kleineidam
2550e16040 Remove query part from file links. 2011-05-29 17:49:01 +02:00
Bastian Kleineidam
5515645af6 Reset content type setting after loading HTTP headers. 2011-05-28 17:59:44 +02:00
Bastian Kleineidam
0f70438a87 Updated copyright. 2011-05-28 08:44:21 +02:00
Bastian Kleineidam
684a9b5bf6 Add includes to dns.rdtypes.IN/ANY in setup.py, not in mailtourl.py module. 2011-05-25 21:03:10 +02:00
Bastian Kleineidam
e1f724908d Move dnspython module into third_party directory. 2011-05-24 20:18:58 +02:00
Bastian Kleineidam
72b65d94df Only check anchors in HTML pages. 2011-05-22 17:33:16 +02:00
Bastian Kleineidam
e5c2271533 Only check warning patterns in parseable contents. 2011-05-22 17:32:26 +02:00
Bastian Kleineidam
68ea03ee16 Support both Chromium and Google Chrome profile dirs to find bookmark files. 2011-05-21 11:47:54 +02:00
Bastian Kleineidam
78790d7c8d Improved anchor warning message display. 2011-05-20 06:48:06 +02:00
Bastian Kleineidam
03feaeca91 Correct warning about unparsable cookies. 2011-05-18 20:56:31 +02:00
Bastian Kleineidam
343cf9703d Code cleanup: indentation, unused variables etc. 2011-05-15 18:36:30 +02:00
Bastian Kleineidam
a1f0867c74 Updated copyright 2011-05-06 20:27:36 +02:00
Bastian Kleineidam
10bbb696e8 Limit download file size to 5MB. 2011-05-05 21:10:55 +02:00
Bastian Kleineidam
1f9cd2f67f Redirection refactoring part 2 of 2. 2011-04-27 13:33:01 +02:00
Bastian Kleineidam
dd53c78096 Redirection refactoring part 1. 2011-04-27 12:02:30 +02:00
Bastian Kleineidam
f566f98fe5 Allow redirections for URLs given by the user. 2011-04-27 11:21:58 +02:00
Bastian Kleineidam
db7ea6872a Refactor internal URL pattern matcher into function. 2011-04-27 08:34:15 +02:00
Bastian Kleineidam
719441cca5 Make module detection more robust and use it when possible. 2011-04-20 09:08:11 +02:00
Bastian Kleineidam
6a544f2d69 Only allow redirections to FTP, HTTP and HTTPS URLs. 2011-04-19 07:01:55 +02:00
Bastian Kleineidam
84f6d56a49 Print level in the XML, CSV and SQL loggers. 2011-04-09 10:51:03 +02:00
Bastian Kleineidam
c0732e3d37 Do not print empty country information. 2011-04-06 17:22:48 +02:00
Bastian Kleineidam
82e5ba8ce6 Add warning tag attribute in XML loggers. 2011-03-15 13:42:21 +01:00
Bastian Kleineidam
f4f921384e Updated copyright 2011-03-13 07:52:18 +01:00
Bastian Kleineidam
502430489a Add url2pathname workaround for Windows. 2011-03-12 16:33:48 +01:00
Bastian Kleineidam
7b33cfac7b Use stripped URL base when constructing absolute URL. 2011-03-11 15:17:36 +01:00
Bastian Kleineidam
78ea8d5594 Remove unnecessary call to url2pathname(). 2011-03-11 12:28:33 +01:00
Bastian Kleineidam
ae109ed994 Correct conversion between URL and filename paths. 2011-03-11 10:38:17 +01:00
Bastian Kleineidam
420c21c2de Strip leading and trailing whitespace from URLs. 2011-03-07 12:33:09 +01:00
Bastian Kleineidam
21e4824f65 Fix typo calling get_temp_file() function. 2011-03-07 09:57:40 +01:00
Bastian Kleineidam
de5d1757f0 Add workaround for buggy IIS HEAD support. 2011-02-24 11:12:59 +01:00
Bastian Kleineidam
c89bd05651 Remove unused variables and imports. 2011-02-19 11:46:20 +01:00
Bastian Kleineidam
2ec312301e Fix linkcheck.dns py2exe packaging. 2011-02-18 17:26:00 +01:00
Bastian Kleineidam
0d4377d1ba Support Google Chrome Bookmark files. 2011-02-15 18:26:00 +01:00
Bastian Kleineidam
25b6dc2e57 Refactor bookmark parsing code into own package. 2011-02-15 17:31:42 +01:00
Bastian Kleineidam
2dfe62afa2 Updated copyright. 2011-02-14 21:07:07 +01:00
Bastian Kleineidam
c5884b8d87 Add function documentation. 2011-02-14 21:06:34 +01:00
Bastian Kleineidam
85f3690068 Updated copyright. 2011-02-11 14:00:31 +01:00
Bastian Kleineidam
db6a3669b3 Correctly detect empty FTP paths as directories. 2011-02-11 12:35:53 +01:00
Bastian Kleineidam
362c7a1d9d Preselect filename on save dialog when editing file:// URLs. 2011-02-09 08:46:09 +01:00
Bastian Kleineidam
4a0c63aa56 Fix joining of URLs when parent URL has CGI parameter. 2011-02-08 21:25:55 +01:00
Bastian Kleineidam
71b15b70f4 Updated copyright 2011-01-06 09:59:57 +01:00
Bastian Kleineidam
5f70b7210f Add tempfile utility function. 2011-01-06 09:52:11 +01:00
Bastian Kleineidam
d011d1524c Parse PHP files recursively. 2010-12-28 17:11:29 +01:00
Bastian Kleineidam
fd3fe8dcaa Fix missing content types for cached URLs. 2010-12-23 07:37:36 +01:00
Bastian Kleineidam
2a4b60de4d Remove unused imports. 2010-12-22 13:06:24 +01:00
Bastian Kleineidam
84e4e3b28a Fix regression from last commit in this file. 2010-12-22 13:06:10 +01:00
Bastian Kleineidam
0d8a583e39 Fix internal pattern for file URLs (regression from commit 90e0f4e) 2010-12-21 21:10:31 +01:00
Bastian Kleineidam
6090e1a66c Print anchor in __str__() 2010-12-21 20:55:49 +01:00
Bastian Kleineidam
1ebd4d1fc4 Simplify code. 2010-12-21 20:55:35 +01:00
Bastian Kleineidam
90e0f4e5cc Detect filenames with spaces as internal links. 2010-12-21 07:05:12 +01:00
Bastian Kleineidam
9ea35241c0 Set correct scheme on file links. 2010-12-21 01:23:50 +01:00
Bastian Kleineidam
128f8eb6e4 Move firefox routines to firefox module. 2010-12-21 00:02:12 +01:00
Bastian Kleineidam
7c08290c44 Fix broken anchor checking. 2010-12-20 19:55:26 +01:00
Bastian Kleineidam
0b8f8d52b2 Check for empty URL before determining content type. 2010-12-18 08:26:59 +01:00
Bastian Kleineidam
224061e284 Fix to_wire by checking if URL parts have been initialized. 2010-12-15 13:24:12 +01:00
Bastian Kleineidam
7c55351511 Add get_content_type methods to subclasses. 2010-12-15 07:54:44 +01:00
Bastian Kleineidam
2b2121b9ed Added content type and domain to URL logging info. 2010-12-14 20:30:53 +01:00
Bastian Kleineidam
01184784ef Remove warning about Unicode domains which are more widely supported now. 2010-12-11 07:58:15 +01:00
Bastian Kleineidam
9e88377584 Remove stray raise statement from previous commit. 2010-11-26 21:35:49 +01:00
Bastian Kleineidam
c5676f0297 Catch socket errors when closing SMTP connections. 2010-11-26 19:51:26 +01:00
Bastian Kleineidam
5c9c15071a Limit FTP download file size. 2010-11-25 20:44:41 +01:00
Bastian Kleineidam
0cf22e5242 Limit FTP download file size. 2010-11-25 20:44:14 +01:00
Bastian Kleineidam
6fac69cddb Fall back to GET when connection is reset. 2010-11-21 19:50:51 +01:00
Bastian Kleineidam
03034ddc1c Updated copyright 2010-11-21 11:25:07 +01:00
Bastian Kleineidam
04f9c1b854 Use urlparse.parse_qs() instead of cgi.parse_qs() 2010-11-21 10:43:47 +01:00
Bastian Kleineidam
147bf31e1e Check for allowed HTTP GET method before parsing anchors in HTML file contents. 2010-11-17 19:13:26 +01:00
Bastian Kleineidam
17ce930611 Ignore irc:// URLs. 2010-11-10 19:56:31 +01:00
Bastian Kleineidam
2fde5bea8c Updated copyright 2010-11-06 18:02:56 +01:00
Bastian Kleineidam
4f5c957e43 Fix check of external domain after HTTP redirect. 2010-11-06 18:00:49 +01:00
Bastian Kleineidam
57ffa6bf97 Allow both redirection www.example.com -> example.com and vice versa. 2010-11-06 17:55:49 +01:00
Bastian Kleineidam
280b7892ef Remove unused NNTP warning. 2010-11-06 17:39:22 +01:00
Bastian Kleineidam
1188e0be2e Retry NNTP connections on temporary errors. 2010-11-06 17:26:40 +01:00
Bastian Kleineidam
23b20306e9 Remove duplicate HTTP response codes. 2010-11-01 09:27:53 +01:00
Bastian Kleineidam
c5f93a561d Fix debug message formatting. 2010-11-01 05:59:04 +01:00
Bastian Kleineidam
f14340a0a8 Do not check content of already cached URLs. 2010-10-27 19:52:48 +02:00
Bastian Kleineidam
1f81124dfa Fix typo. 2010-10-27 19:23:14 +02:00
Bastian Kleineidam
23403f09bb Do not print warning for HTTP to HTTPS or HTTPS to HTTP redirects. 2010-10-27 14:44:05 +02:00
Bastian Kleineidam
b2cf40151f Improved redirection warning text. 2010-10-27 09:15:46 +02:00
Bastian Kleineidam
d9e981e497 Don't log a warning if commandline URL has been redirected. 2010-10-26 16:24:27 +02:00
Bastian Kleineidam
4375d35328 Add warning about unsupported HTTP authentication, and revert the realm changes. 2010-10-25 22:41:31 +02:00
Bastian Kleineidam
332fa4f8f9 Prepare multi-realm auth configuration. 2010-10-25 22:07:16 +02:00
Bastian Kleineidam
2a7292845c Improved info message about sent cookies; do not report the retrieved cookie information. 2010-10-13 22:32:50 +02:00
Bastian Kleineidam
a8aa3bdb00 Another fix to ensure get_content() is only called when allowed. 2010-10-13 22:14:43 +02:00
Bastian Kleineidam
61e611e4bf Prevent unallowed content read when checking for robots.txt allowance in HTML files. 2010-10-12 00:40:34 +02:00
Bastian Kleineidam
1d0db02192 Refactor getting user and password for a URL. 2010-10-11 20:11:15 +02:00
Bastian Kleineidam
e494d6bbb6 Move MIME type detection into fileutil.py module, and use mimetools for detection. 2010-10-03 08:47:48 +02:00
Bastian Kleineidam
e0f4097eb0 Ensure HttpUrl.set_title_from_content() is only called when the content is allowed to be retrieved. 2010-09-29 19:26:03 +02:00
Bastian Kleineidam
840538d12a Remove unneeded check for HTML content. 2010-09-29 19:25:14 +02:00