Commit graph

194 commits

Author SHA1 Message Date
Bastian Kleineidam
797024c69b Fix URL connection cache key. 2012-04-04 22:58:09 +02:00
Bastian Kleineidam
4feea986b4 Fix concatenation of multiple cookie values. 2012-03-31 08:51:58 +02:00
Bastian Kleineidam
da6d7b0eca Store cookies on redirect. 2012-03-31 08:37:18 +02:00
Bastian Kleineidam
b9b8e3f5b2 Honor the charset encoding of the Content-Type HTTP header when parsing HTML. 2012-03-22 22:45:11 +01:00
Bastian Kleineidam
5e13a78f66 Fix non-ascii HTTP header debugging. 2012-03-09 11:54:18 +01:00
Bastian Kleineidam
3fcff8a4e5 Fix non-ascii HTTP header handling. 2012-03-09 11:14:18 +01:00
Bastian Kleineidam
24811ac7b0 Recheck extern status on HTTP redirects even if domain did not change. 2012-03-08 10:07:31 +01:00
Bastian Kleineidam
71f5ee42c8 Updated copyright. 2012-01-29 17:18:28 +01:00
Bastian Kleineidam
6e1e9148d8 Work around a squid bug resulting in not detecting broken links 2012-01-17 08:36:11 +01:00
Bastian Kleineidam
4c15fc6a8b Properly handle non-ASCII HTTP header values. 2012-01-14 11:01:09 +01:00
Bastian Kleineidam
cdf91a0321 Improve cookie info message and fix cookie test cases. 2011-08-04 18:34:56 +02:00
Bastian Kleineidam
48413de418 Display warning message for each cookie parsing error. 2011-08-03 19:27:36 +02:00
Bastian Kleineidam
c99b75899d Send multiple cookie values in one header. 2011-08-02 21:57:16 +02:00
Bastian Kleineidam
c70bd68ef1 Refactor sending of cookie data in client into separate function. 2011-08-02 20:45:26 +02:00
Bastian Kleineidam
51bcccfdfe Added new option --user-agent to set the User-Agent header. 2011-07-25 21:09:49 +02:00
Bastian Kleineidam
552c71a3ca Do not append a stray newline character when encoding authentication information to base64. 2011-07-25 20:02:01 +02:00
Bastian Kleineidam
5515645af6 Reset content type setting after loading HTTP headers. 2011-05-28 17:59:44 +02:00
Bastian Kleineidam
03feaeca91 Correct warning about unparsable cookies. 2011-05-18 20:56:31 +02:00
Bastian Kleineidam
10bbb696e8 Limit download file size to 5MB. 2011-05-05 21:10:55 +02:00
Bastian Kleineidam
1f9cd2f67f Redirection refactoring part 2 of 2. 2011-04-27 13:33:01 +02:00
Bastian Kleineidam
dd53c78096 Redirection refactoring part 1. 2011-04-27 12:02:30 +02:00
Bastian Kleineidam
f566f98fe5 Allow redirections for URLs given by the user. 2011-04-27 11:21:58 +02:00
Bastian Kleineidam
6a544f2d69 Only allow redirections to FTP, HTTP and HTTPS URLs. 2011-04-19 07:01:55 +02:00
Bastian Kleineidam
de5d1757f0 Add workaround for buggy IIS HEAD support. 2011-02-24 11:12:59 +01:00
Bastian Kleineidam
2dfe62afa2 Updated copyright. 2011-02-14 21:07:07 +01:00
Bastian Kleineidam
c5884b8d87 Add function documentation. 2011-02-14 21:06:34 +01:00
Bastian Kleineidam
fd3fe8dcaa Fix missing content types for cached URLs. 2010-12-23 07:37:36 +01:00
Bastian Kleineidam
7c55351511 Add get_content_type methods to subclasses. 2010-12-15 07:54:44 +01:00
Bastian Kleineidam
01184784ef Remove warning about Unicode domains which are more widely supported now. 2010-12-11 07:58:15 +01:00
Bastian Kleineidam
6fac69cddb Fall back to GET when connection is reset. 2010-11-21 19:50:51 +01:00
Bastian Kleineidam
147bf31e1e Check for allowed HTTP GET method before parsing anchors in HTML file contents. 2010-11-17 19:13:26 +01:00
Bastian Kleineidam
4f5c957e43 Fix check of external domain after HTTP redirect. 2010-11-06 18:00:49 +01:00
Bastian Kleineidam
23b20306e9 Remove duplicate HTTP response codes. 2010-11-01 09:27:53 +01:00
Bastian Kleineidam
c5f93a561d Fix debug message formatting. 2010-11-01 05:59:04 +01:00
Bastian Kleineidam
f14340a0a8 Do not check content of already cached URLs. 2010-10-27 19:52:48 +02:00
Bastian Kleineidam
1f81124dfa Fix typo. 2010-10-27 19:23:14 +02:00
Bastian Kleineidam
23403f09bb Do not print warning for HTTP to HTTPS or HTTPS to HTTP redirects. 2010-10-27 14:44:05 +02:00
Bastian Kleineidam
b2cf40151f Improved redirection warning text. 2010-10-27 09:15:46 +02:00
Bastian Kleineidam
d9e981e497 Don't log a warning if commandline URL has been redirected. 2010-10-26 16:24:27 +02:00
Bastian Kleineidam
4375d35328 Add warning about unsupported HTTP authentication, and revert the realm changes. 2010-10-25 22:41:31 +02:00
Bastian Kleineidam
2a7292845c Improved info message about sent cookies; do not report the retrieved cookie information. 2010-10-13 22:32:50 +02:00
Bastian Kleineidam
a8aa3bdb00 Another fix to ensure get_content() is only called when allowed. 2010-10-13 22:14:43 +02:00
Bastian Kleineidam
61e611e4bf Prevent unallowed content read when checking for robots.txt allowance in HTML files. 2010-10-12 00:40:34 +02:00
Bastian Kleineidam
e494d6bbb6 Move MIME type detection into fileutil.py module, and use mimetools for detection. 2010-10-03 08:47:48 +02:00
Bastian Kleineidam
e0f4097eb0 Ensure HttpUrl.set_title_from_content() is only called when the content is allowed to be retrieved. 2010-09-29 19:26:03 +02:00
Bastian Kleineidam
5284017d67 Only fallback to HTTP GET when robots.txt allows it. 2010-09-04 18:09:59 +02:00
Bastian Kleineidam
60f7af4598 Allow redirections to external URLs with same domain. 2010-08-13 01:22:18 +02:00
Bastian Kleineidam
1faedafb33 Fix data size for HTTP requests. 2010-08-04 00:06:25 +02:00
Bastian Kleineidam
7ad4f7c220 Compare size from meta info and content data. 2010-07-29 19:53:41 +02:00
Bastian Kleineidam
7536472797 Send correct host header when using http proxy. 2010-07-29 06:50:35 +02:00