Bastian Kleineidam
|
6915e2f989
|
Detect sites not supporting HEAD requests.
|
2012-08-14 18:43:39 +02:00 |
|
Bastian Kleineidam
|
f3b66b102d
|
Fallback to GET when method HEAD is not allowed.
|
2012-08-13 07:07:21 +02:00 |
|
Bastian Kleineidam
|
e65b5c72ce
|
Correct list of schemes requiring host name.
|
2012-08-12 14:21:56 +02:00 |
|
Bastian Kleineidam
|
7b567cc378
|
Make scheme and domain for internal url pattern case insensitive.
|
2012-08-12 14:19:42 +02:00 |
|
Bastian Kleineidam
|
afc0ecd7a6
|
--ignore-url now really ignores URLs.
|
2012-08-12 11:16:29 +02:00 |
|
Bastian Kleineidam
|
6be3e9ddff
|
Cleanup code and improve redirect anchor handling.
|
2012-08-12 11:14:56 +02:00 |
|
Bastian Kleineidam
|
c74690a79a
|
Do not check SSL certificates on HTTPS -> HTTP redirects.
|
2012-08-10 19:43:57 +02:00 |
|
Bastian Kleineidam
|
b0e5c7fc59
|
Ignore feed: URLs.
|
2012-06-27 21:32:03 +02:00 |
|
Bastian Kleineidam
|
0fd1a78378
|
Always compare encoded anchor names.
|
2012-06-27 20:59:53 +02:00 |
|
Bastian Kleineidam
|
5c045fef44
|
Fix UNC path handling on Windows.
|
2012-06-24 10:30:54 +02:00 |
|
Bastian Kleineidam
|
31519f6a01
|
Fix handling of UNC pathnames.
|
2012-06-23 14:30:58 +02:00 |
|
Bastian Kleineidam
|
73b176d7c9
|
Fix URL joining: properly detect absolute URL.
|
2012-06-23 13:33:27 +02:00 |
|
Bastian Kleineidam
|
8d23e2a3c6
|
Add debugging for checker class name.
|
2012-06-23 13:30:13 +02:00 |
|
Bastian Kleineidam
|
dbe57c0f9b
|
Treat Windows UNC paths as absolute paths.
|
2012-06-22 23:42:37 +02:00 |
|
Bastian Kleineidam
|
713b9ebada
|
Only assume local file links for URLs given on the command line.
|
2012-06-22 23:42:05 +02:00 |
|
Bastian Kleineidam
|
9d0cced73c
|
Fix SSL check errors.
|
2012-06-22 07:37:37 +02:00 |
|
Bastian Kleineidam
|
addbcfc54f
|
Updated translation.
|
2012-06-20 20:18:39 +02:00 |
|
Bastian Kleineidam
|
4cce99a77d
|
Test SSL certificate expiration.
|
2012-06-20 20:10:40 +02:00 |
|
Bastian Kleineidam
|
cbb13a8983
|
Add SSL certificate verification.
|
2012-06-18 23:05:44 +02:00 |
|
Bastian Kleineidam
|
f107092a8a
|
Fix handling of user/password info in URLs.
|
2012-06-10 22:07:42 +02:00 |
|
Bastian Kleineidam
|
838095cbd5
|
Updated copyright.
|
2012-06-10 14:58:38 +02:00 |
|
Bastian Kleineidam
|
00aa631267
|
Add localwebroot configuration option.
|
2012-06-10 14:47:27 +02:00 |
|
Bastian Kleineidam
|
98537eea2f
|
Code cleanup: use add_url() function in UrlBase.
|
2012-06-10 14:24:17 +02:00 |
|
Bastian Kleineidam
|
db95fce77e
|
Ignore PHP processing instructions in local files.
|
2012-06-10 14:02:01 +02:00 |
|
Bastian Kleineidam
|
2dee223555
|
Allow memory dumps to be written.
|
2012-06-10 13:18:35 +02:00 |
|
Bastian Kleineidam
|
837ab22d01
|
Syntax cleanup.
|
2012-06-10 11:46:05 +02:00 |
|
Bastian Kleineidam
|
77b8ec0fcd
|
Fix writing temporary Word files.
|
2012-06-10 11:07:35 +02:00 |
|
Bastian Kleineidam
|
54ffb102d8
|
Code cleanup: add function for GET fallback.
|
2012-06-10 09:52:12 +02:00 |
|
Bastian Kleineidam
|
5c94c47901
|
Remove old Squid proxy workaround.
|
2012-06-10 09:45:07 +02:00 |
|
Bastian Kleineidam
|
bcbacec79a
|
Code cleanup.
|
2012-05-10 21:05:33 +02:00 |
|
Bastian Kleineidam
|
61138744e6
|
Always use GET for Zope servers.
|
2012-05-08 20:47:47 +02:00 |
|
Bastian Kleineidam
|
52dcf101e0
|
Remove rest of deprecated options.
|
2012-04-22 17:55:12 +02:00 |
|
Bastian Kleineidam
|
797024c69b
|
Fix URL connection cache key.
|
2012-04-04 22:58:09 +02:00 |
|
Bastian Kleineidam
|
4feea986b4
|
Fix concatenation of multiple cookie values.
|
2012-03-31 08:51:58 +02:00 |
|
Bastian Kleineidam
|
da6d7b0eca
|
Store cookies on redirect.
|
2012-03-31 08:37:18 +02:00 |
|
Bastian Kleineidam
|
6d5e5f9efb
|
Updated copyright.
|
2012-03-30 22:24:10 +02:00 |
|
Bastian Kleineidam
|
b9b8e3f5b2
|
Honor the charset encoding of the Content-Type HTTP header when parsing HTML.
|
2012-03-22 22:45:11 +01:00 |
|
Bastian Kleineidam
|
98b4768419
|
Use timeout when checking email addresses with SMTP.
|
2012-03-16 21:44:18 +01:00 |
|
Bastian Kleineidam
|
4c9fd8d488
|
Cache real url.
|
2012-03-14 21:12:13 +01:00 |
|
Bastian Kleineidam
|
5e13a78f66
|
Fix non-ascii HTTP header debugging.
|
2012-03-09 11:54:18 +01:00 |
|
Bastian Kleineidam
|
3fcff8a4e5
|
Fix non-ascii HTTP header handling.
|
2012-03-09 11:14:18 +01:00 |
|
Bastian Kleineidam
|
24811ac7b0
|
Recheck extern status on HTTP redirects even if domain did not change.
|
2012-03-08 10:07:31 +01:00 |
|
Bastian Kleineidam
|
71f5ee42c8
|
Updated copyright.
|
2012-01-29 17:18:28 +01:00 |
|
Bastian Kleineidam
|
042b0569ec
|
Fall back to W3C checkers.
|
2012-01-22 08:13:27 +01:00 |
|
Bastian Kleineidam
|
51cf55b7a6
|
Remove warning: prefix from warning messages.
|
2012-01-21 00:25:02 +01:00 |
|
Bastian Kleineidam
|
6e1e9148d8
|
Work around a Squid bug resulting in not detecting broken links.
|
2012-01-17 08:36:11 +01:00 |
|
Bastian Kleineidam
|
e99c55f6c4
|
Proper proxy type check.
|
2012-01-16 21:15:53 +01:00 |
|
Bastian Kleineidam
|
4c15fc6a8b
|
Properly handle non-ASCII HTTP header values.
|
2012-01-14 11:01:09 +01:00 |
|
Bastian Kleineidam
|
a0581cc2a1
|
Ignore steam:// URIs.
|
2012-01-10 19:37:19 +01:00 |
|
Bastian Kleineidam
|
f1eb51d885
|
Updated copyright
|
2012-01-06 09:21:30 +01:00 |
|
Bastian Kleineidam
|
033280cfb9
|
Remove workarounds for old Python versions.
|
2012-01-04 20:17:53 +01:00 |
|
Bastian Kleineidam
|
3d9958dfbb
|
Parse Safari bookmark files.
|
2011-12-17 16:38:25 +01:00 |
|
Bastian Kleineidam
|
a2978209e6
|
Ignore errors trying to get FTP feature set.
|
2011-10-18 13:10:49 +02:00 |
|
Bastian Kleineidam
|
27b7b1cb49
|
Fix W3C HTML validation.
|
2011-10-09 21:16:45 +02:00 |
|
Bastian Kleineidam
|
89ec0ee6a1
|
Check multiple matches of warning regex.
|
2011-10-09 19:00:35 +02:00 |
|
Bastian Kleineidam
|
09d9264470
|
Updated copyright.
|
2011-08-04 20:40:49 +02:00 |
|
Bastian Kleineidam
|
cdf91a0321
|
Improve cookie info message and fix cookie test cases.
|
2011-08-04 18:34:56 +02:00 |
|
Bastian Kleineidam
|
48413de418
|
Display warning message for each cookie parsing error.
|
2011-08-03 19:27:36 +02:00 |
|
Bastian Kleineidam
|
c99b75899d
|
Send multiple cookie values in one header.
|
2011-08-02 21:57:16 +02:00 |
|
Bastian Kleineidam
|
c70bd68ef1
|
Refactor sending of cookie data in client into separate function.
|
2011-08-02 20:45:26 +02:00 |
|
Bastian Kleineidam
|
51bcccfdfe
|
Added new option --user-agent to set the User-Agent header.
|
2011-07-25 21:09:49 +02:00 |
|
Bastian Kleineidam
|
552c71a3ca
|
Do not append a stray newline character when encoding authentication information to base64.
|
2011-07-25 20:02:01 +02:00 |
|
Bastian Kleineidam
|
2550e16040
|
Remove query part from file links.
|
2011-05-29 17:49:01 +02:00 |
|
Bastian Kleineidam
|
5515645af6
|
Reset content type setting after loading HTTP headers.
|
2011-05-28 17:59:44 +02:00 |
|
Bastian Kleineidam
|
0f70438a87
|
Updated copyright.
|
2011-05-28 08:44:21 +02:00 |
|
Bastian Kleineidam
|
684a9b5bf6
|
Add includes to dns.rdtypes.IN/ANY in setup.py, not in mailtourl.py module.
|
2011-05-25 21:03:10 +02:00 |
|
Bastian Kleineidam
|
e1f724908d
|
Move dnspython module into third_party directory.
|
2011-05-24 20:18:58 +02:00 |
|
Bastian Kleineidam
|
72b65d94df
|
Only check anchors in HTML pages.
|
2011-05-22 17:33:16 +02:00 |
|
Bastian Kleineidam
|
e5c2271533
|
Only check warning patterns in parseable contents.
|
2011-05-22 17:32:26 +02:00 |
|
Bastian Kleineidam
|
68ea03ee16
|
Support both Chromium and Google Chrome profile dirs to find bookmark files.
|
2011-05-21 11:47:54 +02:00 |
|
Bastian Kleineidam
|
78790d7c8d
|
Improved anchor warning message display.
|
2011-05-20 06:48:06 +02:00 |
|
Bastian Kleineidam
|
03feaeca91
|
Correct warning about unparsable cookies.
|
2011-05-18 20:56:31 +02:00 |
|
Bastian Kleineidam
|
343cf9703d
|
Code cleanup: indentation, unused variables etc.
|
2011-05-15 18:36:30 +02:00 |
|
Bastian Kleineidam
|
a1f0867c74
|
Updated copyright
|
2011-05-06 20:27:36 +02:00 |
|
Bastian Kleineidam
|
10bbb696e8
|
Limit download file size to 5MB.
|
2011-05-05 21:10:55 +02:00 |
|
Bastian Kleineidam
|
1f9cd2f67f
|
Redirection refactoring part 2 of 2.
|
2011-04-27 13:33:01 +02:00 |
|
Bastian Kleineidam
|
dd53c78096
|
Redirection refactoring part 1.
|
2011-04-27 12:02:30 +02:00 |
|
Bastian Kleineidam
|
f566f98fe5
|
Allow redirections for URLs given by the user.
|
2011-04-27 11:21:58 +02:00 |
|
Bastian Kleineidam
|
db7ea6872a
|
Refactor internal URL pattern matcher into function.
|
2011-04-27 08:34:15 +02:00 |
|
Bastian Kleineidam
|
719441cca5
|
Make module detection more robust and use it when possible.
|
2011-04-20 09:08:11 +02:00 |
|
Bastian Kleineidam
|
6a544f2d69
|
Only allow redirections to FTP, HTTP and HTTPS URLs.
|
2011-04-19 07:01:55 +02:00 |
|
Bastian Kleineidam
|
84f6d56a49
|
Print level in loggers xml, csv and sql.
|
2011-04-09 10:51:03 +02:00 |
|
Bastian Kleineidam
|
c0732e3d37
|
Do not print empty country information.
|
2011-04-06 17:22:48 +02:00 |
|
Bastian Kleineidam
|
82e5ba8ce6
|
Add warning tag attribute in XML loggers.
|
2011-03-15 13:42:21 +01:00 |
|
Bastian Kleineidam
|
f4f921384e
|
Updated copyright
|
2011-03-13 07:52:18 +01:00 |
|
Bastian Kleineidam
|
502430489a
|
Add url2pathname workaround for Windows.
|
2011-03-12 16:33:48 +01:00 |
|
Bastian Kleineidam
|
7b33cfac7b
|
Use stripped URL base when constructing absolute URL.
|
2011-03-11 15:17:36 +01:00 |
|
Bastian Kleineidam
|
78ea8d5594
|
Remove unnecessary call to url2pathname().
|
2011-03-11 12:28:33 +01:00 |
|
Bastian Kleineidam
|
ae109ed994
|
Correct conversion between URL and filename paths.
|
2011-03-11 10:38:17 +01:00 |
|
Bastian Kleineidam
|
420c21c2de
|
Strip leading and trailing whitespace from URLs.
|
2011-03-07 12:33:09 +01:00 |
|
Bastian Kleineidam
|
21e4824f65
|
Fix typo calling get_temp_file() function.
|
2011-03-07 09:57:40 +01:00 |
|
Bastian Kleineidam
|
de5d1757f0
|
Add workaround for buggy IIS HEAD support.
|
2011-02-24 11:12:59 +01:00 |
|
Bastian Kleineidam
|
c89bd05651
|
Remove unused variables and imports.
|
2011-02-19 11:46:20 +01:00 |
|
Bastian Kleineidam
|
2ec312301e
|
Fix linkcheck.dns py2exe packaging.
|
2011-02-18 17:26:00 +01:00 |
|
Bastian Kleineidam
|
0d4377d1ba
|
Support Google Chrome Bookmark files.
|
2011-02-15 18:26:00 +01:00 |
|
Bastian Kleineidam
|
25b6dc2e57
|
Refactor bookmark parsing code into own package.
|
2011-02-15 17:31:42 +01:00 |
|
Bastian Kleineidam
|
2dfe62afa2
|
Updated copyright.
|
2011-02-14 21:07:07 +01:00 |
|
Bastian Kleineidam
|
c5884b8d87
|
Add function documentation.
|
2011-02-14 21:06:34 +01:00 |
|
Bastian Kleineidam
|
85f3690068
|
Updated copyright.
|
2011-02-11 14:00:31 +01:00 |
|
Bastian Kleineidam
|
db6a3669b3
|
Correctly detect empty FTP paths as directories.
|
2011-02-11 12:35:53 +01:00 |
|
Bastian Kleineidam
|
362c7a1d9d
|
Preselect filename on save dialog when editing file:// URLs.
|
2011-02-09 08:46:09 +01:00 |
|
Bastian Kleineidam
|
4a0c63aa56
|
Fix joining of URLs when parent URL has CGI parameter.
|
2011-02-08 21:25:55 +01:00 |
|
Bastian Kleineidam
|
71b15b70f4
|
Updated copyright
|
2011-01-06 09:59:57 +01:00 |
|
Bastian Kleineidam
|
5f70b7210f
|
Add tempfile utility function.
|
2011-01-06 09:52:11 +01:00 |
|
Bastian Kleineidam
|
d011d1524c
|
Parse PHP files recursively.
|
2010-12-28 17:11:29 +01:00 |
|
Bastian Kleineidam
|
fd3fe8dcaa
|
Fix missing content types for cached URLs.
|
2010-12-23 07:37:36 +01:00 |
|
Bastian Kleineidam
|
2a4b60de4d
|
Remove unused imports.
|
2010-12-22 13:06:24 +01:00 |
|
Bastian Kleineidam
|
84e4e3b28a
|
Fix regression from last commit in this file.
|
2010-12-22 13:06:10 +01:00 |
|
Bastian Kleineidam
|
0d8a583e39
|
Fix internal pattern for file URLs (regression from commit 90e0f4e)
|
2010-12-21 21:10:31 +01:00 |
|
Bastian Kleineidam
|
6090e1a66c
|
Print anchor in __str__()
|
2010-12-21 20:55:49 +01:00 |
|
Bastian Kleineidam
|
1ebd4d1fc4
|
Simplify code.
|
2010-12-21 20:55:35 +01:00 |
|
Bastian Kleineidam
|
90e0f4e5cc
|
Detect filenames with spaces as internal links.
|
2010-12-21 07:05:12 +01:00 |
|
Bastian Kleineidam
|
9ea35241c0
|
Set correct scheme on file links.
|
2010-12-21 01:23:50 +01:00 |
|
Bastian Kleineidam
|
128f8eb6e4
|
Move firefox routines to firefox module.
|
2010-12-21 00:02:12 +01:00 |
|
Bastian Kleineidam
|
7c08290c44
|
Fix broken anchor checking.
|
2010-12-20 19:55:26 +01:00 |
|
Bastian Kleineidam
|
0b8f8d52b2
|
Check for empty URL before determining content type.
|
2010-12-18 08:26:59 +01:00 |
|
Bastian Kleineidam
|
224061e284
|
Fix to_wire by checking whether URL parts have been initialized.
|
2010-12-15 13:24:12 +01:00 |
|
Bastian Kleineidam
|
7c55351511
|
Add get_content_type methods to subclasses.
|
2010-12-15 07:54:44 +01:00 |
|
Bastian Kleineidam
|
2b2121b9ed
|
Added content type and domain to URL logging info.
|
2010-12-14 20:30:53 +01:00 |
|
Bastian Kleineidam
|
01184784ef
|
Remove warning about Unicode domains which are more widely supported now.
|
2010-12-11 07:58:15 +01:00 |
|
Bastian Kleineidam
|
9e88377584
|
Remove stray raise statement from previous commit.
|
2010-11-26 21:35:49 +01:00 |
|
Bastian Kleineidam
|
c5676f0297
|
Catch socket errors when closing SMTP connections.
|
2010-11-26 19:51:26 +01:00 |
|
Bastian Kleineidam
|
5c9c15071a
|
Limit FTP download file size.
|
2010-11-25 20:44:41 +01:00 |
|
Bastian Kleineidam
|
0cf22e5242
|
Limit FTP download file size.
|
2010-11-25 20:44:14 +01:00 |
|
Bastian Kleineidam
|
6fac69cddb
|
Fall back to GET when connection is reset.
|
2010-11-21 19:50:51 +01:00 |
|
Bastian Kleineidam
|
03034ddc1c
|
Updated copyright
|
2010-11-21 11:25:07 +01:00 |
|
Bastian Kleineidam
|
04f9c1b854
|
Use urlparse.parse_qs() instead of cgi.parse_qs()
|
2010-11-21 10:43:47 +01:00 |
|
Bastian Kleineidam
|
147bf31e1e
|
Check for allowed HTTP GET method before parsing anchors in HTML file contents.
|
2010-11-17 19:13:26 +01:00 |
|
Bastian Kleineidam
|
17ce930611
|
Ignore irc:// URLs.
|
2010-11-10 19:56:31 +01:00 |
|
Bastian Kleineidam
|
2fde5bea8c
|
Updated copyright
|
2010-11-06 18:02:56 +01:00 |
|
Bastian Kleineidam
|
4f5c957e43
|
Fix check of external domain after HTTP redirect.
|
2010-11-06 18:00:49 +01:00 |
|
Bastian Kleineidam
|
57ffa6bf97
|
Allow both redirection www.example.com -> example.com and vice versa.
|
2010-11-06 17:55:49 +01:00 |
|
Bastian Kleineidam
|
280b7892ef
|
Remove unused NNTP warning.
|
2010-11-06 17:39:22 +01:00 |
|
Bastian Kleineidam
|
1188e0be2e
|
Retry NNTP connections on temporary errors.
|
2010-11-06 17:26:40 +01:00 |
|
Bastian Kleineidam
|
23b20306e9
|
Remove duplicate HTTP response codes.
|
2010-11-01 09:27:53 +01:00 |
|
Bastian Kleineidam
|
c5f93a561d
|
Fix debug message formatting.
|
2010-11-01 05:59:04 +01:00 |
|
Bastian Kleineidam
|
f14340a0a8
|
Do not check content of already cached URLs.
|
2010-10-27 19:52:48 +02:00 |
|
Bastian Kleineidam
|
1f81124dfa
|
Fix typo.
|
2010-10-27 19:23:14 +02:00 |
|
Bastian Kleineidam
|
23403f09bb
|
Do not print warning for HTTP to HTTPS or HTTPS to HTTP redirects.
|
2010-10-27 14:44:05 +02:00 |
|
Bastian Kleineidam
|
b2cf40151f
|
Improved redirection warning text.
|
2010-10-27 09:15:46 +02:00 |
|
Bastian Kleineidam
|
d9e981e497
|
Don't log a warning if commandline URL has been redirected.
|
2010-10-26 16:24:27 +02:00 |
|
Bastian Kleineidam
|
4375d35328
|
Add warning about unsupported HTTP authentication, and revert the realm changes.
|
2010-10-25 22:41:31 +02:00 |
|
Bastian Kleineidam
|
332fa4f8f9
|
Prepare multi-realm auth configuration.
|
2010-10-25 22:07:16 +02:00 |
|
Bastian Kleineidam
|
2a7292845c
|
Improved info message about sent cookies; do not report the retrieved cookie information.
|
2010-10-13 22:32:50 +02:00 |
|
Bastian Kleineidam
|
a8aa3bdb00
|
Another fix to ensure get_content() is only called when allowed.
|
2010-10-13 22:14:43 +02:00 |
|
Bastian Kleineidam
|
61e611e4bf
|
Prevent unallowed content read when checking for robots.txt allowance in HTML files.
|
2010-10-12 00:40:34 +02:00 |
|
Bastian Kleineidam
|
1d0db02192
|
Refactor getting user and password for a URL.
|
2010-10-11 20:11:15 +02:00 |
|
Bastian Kleineidam
|
e494d6bbb6
|
Move MIME type detection into fileutil.py module, and use mimetools for detection.
|
2010-10-03 08:47:48 +02:00 |
|
Bastian Kleineidam
|
e0f4097eb0
|
Ensure HttpUrl.set_title_from_content() is only called when the content is allowed to be retrieved.
|
2010-09-29 19:26:03 +02:00 |
|
Bastian Kleineidam
|
840538d12a
|
Remove unneeded check for HTML content.
|
2010-09-29 19:25:14 +02:00 |
|