Commit graph

165 commits

Author SHA1 Message Date
Bastian Kleineidam
6fac69cddb Fall back to GET when connection is reset. 2010-11-21 19:50:51 +01:00
Bastian Kleineidam
147bf31e1e Check for allowed HTTP GET method before parsing anchors in HTML file contents. 2010-11-17 19:13:26 +01:00
Bastian Kleineidam
4f5c957e43 Fix check of external domain after HTTP redirect. 2010-11-06 18:00:49 +01:00
Bastian Kleineidam
23b20306e9 Remove duplicate HTTP response codes. 2010-11-01 09:27:53 +01:00
Bastian Kleineidam
c5f93a561d Fix debug message formatting. 2010-11-01 05:59:04 +01:00
Bastian Kleineidam
f14340a0a8 Do not check content of already cached URLs. 2010-10-27 19:52:48 +02:00
Bastian Kleineidam
1f81124dfa Fix typo. 2010-10-27 19:23:14 +02:00
Bastian Kleineidam
23403f09bb Do not print warning for HTTP to HTTPS or HTTPS to HTTP redirects. 2010-10-27 14:44:05 +02:00
Bastian Kleineidam
b2cf40151f Improved redirection warning text. 2010-10-27 09:15:46 +02:00
Bastian Kleineidam
d9e981e497 Don't log a warning if commandline URL has been redirected. 2010-10-26 16:24:27 +02:00
Bastian Kleineidam
4375d35328 Add warning about unsupported HTTP authentication, and revert the realm changes. 2010-10-25 22:41:31 +02:00
Bastian Kleineidam
2a7292845c Improved info message about sent cookies; do not report the retrieved cookie information. 2010-10-13 22:32:50 +02:00
Bastian Kleineidam
a8aa3bdb00 Another fix to ensure get_content() is only called when allowed. 2010-10-13 22:14:43 +02:00
Bastian Kleineidam
61e611e4bf Prevent unallowed content read when checking for robots.txt allowance in HTML files. 2010-10-12 00:40:34 +02:00
Bastian Kleineidam
e494d6bbb6 Move MIME type detection into fileutil.py module, and use mimetools for detection. 2010-10-03 08:47:48 +02:00
Bastian Kleineidam
e0f4097eb0 Ensure HttpUrl.set_title_from_content() is only called when the content is allowed to be retrieved. 2010-09-29 19:26:03 +02:00
Bastian Kleineidam
5284017d67 Only fallback to HTTP GET when robots.txt sallows it. 2010-09-04 18:09:59 +02:00
Bastian Kleineidam
60f7af4598 Allow redirections to external URLs with same domain. 2010-08-13 01:22:18 +02:00
Bastian Kleineidam
1faedafb33 Fix data size for HTTP requests. 2010-08-04 00:06:25 +02:00
Bastian Kleineidam
7ad4f7c220 Compare size from meta info and content data. 2010-07-29 19:53:41 +02:00
Bastian Kleineidam
7536472797 Send correct host header when using http proxy. 2010-07-29 06:50:35 +02:00
Bastian Kleineidam
3370ea1562 Reflect changes in httplib2.py: use buffered read in httplib response object and use bad status line exception attribute. 2010-03-26 20:50:38 +01:00
Bastian Kleineidam
b8b0398dd2 Ensure redirected URL is Unicode encoded. 2010-03-07 22:11:55 +01:00
Bastian Kleineidam
c8e6995ecd Support HTTPS proxies. 2010-03-07 21:06:10 +01:00
Bastian Kleineidam
6a2fcf8ae9 Parse links in Word files. 2010-03-07 19:20:51 +01:00
Bastian Kleineidam
3d5c114f14 Warn on permament redirections even when URL is outside of domain filter. 2010-03-07 09:36:21 +01:00
Bastian Kleineidam
2d73b907f1 Retry HTTP when server sent empty status line; should fix most of the BadStatusLine errors that are sporadically encountered. 2010-03-06 10:23:34 +01:00
Bastian Kleineidam
5e06b6b8d4 Updated FSF address in GPL blurb 2009-07-24 23:58:20 +02:00
Bastian Kleineidam
7f67027abf ignore the fragment part (ie. the anchor) of URIs when
+  getting and caching content
2009-06-26 07:22:36 +02:00
Bastian Kleineidam
897b68ae9b Fix copying of httpurl info 2009-03-07 00:17:17 +01:00
Bastian Kleineidam
29adfe92fd Minor syntax fix 2009-03-06 20:14:50 +01:00
Bastian Kleineidam
6024f2e43e Add missing reset of self.reused_connection flag 2009-03-06 20:10:03 +01:00
Bastian Kleineidam
58925b21d3 Improved persistent connection handling by retrying closed connections. 2009-03-06 08:15:34 +01:00
Bastian Kleineidam
29599e4c74 Make sure persistent connection will not close after reading contents. 2009-03-05 19:15:44 +01:00
Bastian Kleineidam
bf9ed8c659 Make sure file descriptors are closed after decoding HTTP content. 2009-03-05 19:15:03 +01:00
Bastian Kleineidam
7862147ca3 Fix showing content size. 2009-03-01 23:04:48 +01:00
calvin
e9805dbd8a Updated copyright year to 2009
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3887 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2009-01-08 14:18:03 +00:00
calvin
209d5abc18 fix timeouts by testing earlier for persistent connections with HEAD
HEAD requests never have a body; nevertheless the http lib tries to
read() from them. This times out on some servers of course. Fix is
not to let those connections be persistent.

git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3871 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-11-29 08:14:28 +00:00
calvin
c20e706761 Made some format changes on translated strings.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3870 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-11-28 20:22:48 +00:00
calvin
c3b6fc5aa4 Readd
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3867 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-11-20 21:30:10 +00:00
calvin
97cf700e04 Fixed wrong cookie debugging format line.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3849 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-07-13 12:51:56 +00:00
calvin
b30fb3b09c Remove duplicate code in http checker.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3820 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-06-16 19:52:09 +00:00
calvin
caf8ba6297 Really allow parsing of XHTML files; I forgot some places to adjust the MIME checking.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3818 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-06-16 13:03:48 +00:00
calvin
a6deeeb8a5 Support parsing of HTML pages served with content type application/xhtml+xml
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3817 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-06-16 09:39:49 +00:00
calvin
a880939c40 Initialize variables in reset(), not in subsequent methods
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3796 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-06-08 09:27:13 +00:00
calvin
5f4d61e018 Use keyword arguments in translation strings.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3780 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-05-27 19:44:40 +00:00
calvin
bacb59597e Use relative imports from Python 2.5
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3750 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-05-09 06:16:03 +00:00
calvin
92c74ece4d Send HTTP Referer header to both http and https URLs
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3741 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-04-29 13:33:35 +00:00
calvin
5d8bdaaa1f Use generators instead of lists where possible
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3739 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-04-28 00:26:02 +00:00
calvin
3eac1be9ab Require and use Python 2.5
Use Python 2.5 features and get rid of old compat code. Also some
code cleanups have been made.


git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3737 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-04-27 11:39:21 +00:00