Bastian Kleineidam
|
6fac69cddb
|
Fall back to GET when connection is reset.
|
2010-11-21 19:50:51 +01:00 |
|
Bastian Kleineidam
|
147bf31e1e
|
Check for allowed HTTP GET method before parsing anchors in HTML file contents.
|
2010-11-17 19:13:26 +01:00 |
|
Bastian Kleineidam
|
4f5c957e43
|
Fix check of external domain after HTTP redirect.
|
2010-11-06 18:00:49 +01:00 |
|
Bastian Kleineidam
|
23b20306e9
|
Remove duplicate HTTP response codes.
|
2010-11-01 09:27:53 +01:00 |
|
Bastian Kleineidam
|
c5f93a561d
|
Fix debug message formatting.
|
2010-11-01 05:59:04 +01:00 |
|
Bastian Kleineidam
|
f14340a0a8
|
Do not check content of already cached URLs.
|
2010-10-27 19:52:48 +02:00 |
|
Bastian Kleineidam
|
1f81124dfa
|
Fix typo.
|
2010-10-27 19:23:14 +02:00 |
|
Bastian Kleineidam
|
23403f09bb
|
Do not print warning for HTTP to HTTPS or HTTPS to HTTP redirects.
|
2010-10-27 14:44:05 +02:00 |
|
Bastian Kleineidam
|
b2cf40151f
|
Improved redirection warning text.
|
2010-10-27 09:15:46 +02:00 |
|
Bastian Kleineidam
|
d9e981e497
|
Don't log a warning if commandline URL has been redirected.
|
2010-10-26 16:24:27 +02:00 |
|
Bastian Kleineidam
|
4375d35328
|
Add warning about unsupported HTTP authentication, and revert the realm changes.
|
2010-10-25 22:41:31 +02:00 |
|
Bastian Kleineidam
|
2a7292845c
|
Improved info message about sent cookies; do not report the retrieved cookie information.
|
2010-10-13 22:32:50 +02:00 |
|
Bastian Kleineidam
|
a8aa3bdb00
|
Another fix to ensure get_content() is only called when allowed.
|
2010-10-13 22:14:43 +02:00 |
|
Bastian Kleineidam
|
61e611e4bf
|
Prevent unallowed content read when checking for robots.txt allowance in HTML files.
|
2010-10-12 00:40:34 +02:00 |
|
Bastian Kleineidam
|
e494d6bbb6
|
Move MIME type detection into fileutil.py module, and use mimetools for detection.
|
2010-10-03 08:47:48 +02:00 |
|
Bastian Kleineidam
|
e0f4097eb0
|
Ensure HttpUrl.set_title_from_content() is only called when the content is allowed to be retrieved.
|
2010-09-29 19:26:03 +02:00 |
|
Bastian Kleineidam
|
5284017d67
|
Only fallback to HTTP GET when robots.txt sallows it.
|
2010-09-04 18:09:59 +02:00 |
|
Bastian Kleineidam
|
60f7af4598
|
Allow redirections to external URLs with same domain.
|
2010-08-13 01:22:18 +02:00 |
|
Bastian Kleineidam
|
1faedafb33
|
Fix data size for HTTP requests.
|
2010-08-04 00:06:25 +02:00 |
|
Bastian Kleineidam
|
7ad4f7c220
|
Compare size from meta info and content data.
|
2010-07-29 19:53:41 +02:00 |
|
Bastian Kleineidam
|
7536472797
|
Send correct host header when using http proxy.
|
2010-07-29 06:50:35 +02:00 |
|
Bastian Kleineidam
|
3370ea1562
|
Reflect changes in httplib2.py: use buffered read in httplib response object and use bad status line exception attribute.
|
2010-03-26 20:50:38 +01:00 |
|
Bastian Kleineidam
|
b8b0398dd2
|
Ensure redirected URL is Unicode encoded.
|
2010-03-07 22:11:55 +01:00 |
|
Bastian Kleineidam
|
c8e6995ecd
|
Support HTTPS proxies.
|
2010-03-07 21:06:10 +01:00 |
|
Bastian Kleineidam
|
6a2fcf8ae9
|
Parse links in Word files.
|
2010-03-07 19:20:51 +01:00 |
|
Bastian Kleineidam
|
3d5c114f14
|
Warn on permament redirections even when URL is outside of domain filter.
|
2010-03-07 09:36:21 +01:00 |
|
Bastian Kleineidam
|
2d73b907f1
|
Retry HTTP when server sent empty status line; should fix most of the BadStatusLine errors that are sporadically encountered.
|
2010-03-06 10:23:34 +01:00 |
|
Bastian Kleineidam
|
5e06b6b8d4
|
Updated FSF address in GPL blurb
|
2009-07-24 23:58:20 +02:00 |
|
Bastian Kleineidam
|
7f67027abf
|
ignore the fragment part (ie. the anchor) of URIs when
+ getting and caching content
|
2009-06-26 07:22:36 +02:00 |
|
Bastian Kleineidam
|
897b68ae9b
|
Fix copying of httpurl info
|
2009-03-07 00:17:17 +01:00 |
|
Bastian Kleineidam
|
29adfe92fd
|
Minor syntax fix
|
2009-03-06 20:14:50 +01:00 |
|
Bastian Kleineidam
|
6024f2e43e
|
Add missing reset of self.reused_connection flag
|
2009-03-06 20:10:03 +01:00 |
|
Bastian Kleineidam
|
58925b21d3
|
Improved persistent connection handling by retrying closed connections.
|
2009-03-06 08:15:34 +01:00 |
|
Bastian Kleineidam
|
29599e4c74
|
Make sure persistent connection will not close after reading contents.
|
2009-03-05 19:15:44 +01:00 |
|
Bastian Kleineidam
|
bf9ed8c659
|
Make sure file descriptors are closed after decoding HTTP content.
|
2009-03-05 19:15:03 +01:00 |
|
Bastian Kleineidam
|
7862147ca3
|
Fix showing content size.
|
2009-03-01 23:04:48 +01:00 |
|
calvin
|
e9805dbd8a
|
Updated copyright year to 2009
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3887 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2009-01-08 14:18:03 +00:00 |
|
calvin
|
209d5abc18
|
fix timeouts by testing earlier for persistent connections with HEAD
HEAD requests never have a body; nevertheless the http lib tries to
read() from them. This times out on some servers of course. Fix is
not to let those connections be persistent.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3871 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-11-29 08:14:28 +00:00 |
|
calvin
|
c20e706761
|
Made some format changes on translated strings.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3870 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-11-28 20:22:48 +00:00 |
|
calvin
|
c3b6fc5aa4
|
Readd
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3867 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-11-20 21:30:10 +00:00 |
|
calvin
|
97cf700e04
|
Fixed wrong cookie debugging format line.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3849 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-07-13 12:51:56 +00:00 |
|
calvin
|
b30fb3b09c
|
Remove duplicate code in http checker.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3820 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-06-16 19:52:09 +00:00 |
|
calvin
|
caf8ba6297
|
Really allow parsing of XHTML files; I forgot some places to adjust the MIME checking.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3818 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-06-16 13:03:48 +00:00 |
|
calvin
|
a6deeeb8a5
|
Support parsing of HTML pages served with content type application/xhtml+xml
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3817 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-06-16 09:39:49 +00:00 |
|
calvin
|
a880939c40
|
Initialize variables in reset(), not in subsequent methods
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3796 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-06-08 09:27:13 +00:00 |
|
calvin
|
5f4d61e018
|
Use keyword arguments in translation strings.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3780 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-05-27 19:44:40 +00:00 |
|
calvin
|
bacb59597e
|
Use relative imports from Python 2.5
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3750 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-05-09 06:16:03 +00:00 |
|
calvin
|
92c74ece4d
|
Send HTTP Referer header to both http and https URLs
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3741 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-04-29 13:33:35 +00:00 |
|
calvin
|
5d8bdaaa1f
|
Use generators instead of lists where possible
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3739 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-04-28 00:26:02 +00:00 |
|
calvin
|
3eac1be9ab
|
Require and use Python 2.5
Use Python 2.5 features and get rid of old compat code. Also some
code cleanups have been made.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3737 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-04-27 11:39:21 +00:00 |
|