Commit graph

592 commits

Author SHA1 Message Date
Bastian Kleineidam
2fde5bea8c Updated copyright 2010-11-06 18:02:56 +01:00
Bastian Kleineidam
4f5c957e43 Fix check of external domain after HTTP redirect. 2010-11-06 18:00:49 +01:00
Bastian Kleineidam
57ffa6bf97 Allow both redirection www.example.com -> example.com and vice versa. 2010-11-06 17:55:49 +01:00
Bastian Kleineidam
280b7892ef Remove unused NNTP warning. 2010-11-06 17:39:22 +01:00
Bastian Kleineidam
1188e0be2e Retry NNTP connections on temporary errors. 2010-11-06 17:26:40 +01:00
Bastian Kleineidam
23b20306e9 Remove duplicate HTTP response codes. 2010-11-01 09:27:53 +01:00
Bastian Kleineidam
c5f93a561d Fix debug message formatting. 2010-11-01 05:59:04 +01:00
Bastian Kleineidam
f14340a0a8 Do not check content of already cached URLs. 2010-10-27 19:52:48 +02:00
Bastian Kleineidam
1f81124dfa Fix typo. 2010-10-27 19:23:14 +02:00
Bastian Kleineidam
23403f09bb Do not print warning for HTTP to HTTPS or HTTPS to HTTP redirects. 2010-10-27 14:44:05 +02:00
Bastian Kleineidam
b2cf40151f Improved redirection warning text. 2010-10-27 09:15:46 +02:00
Bastian Kleineidam
d9e981e497 Don't log a warning if commandline URL has been redirected. 2010-10-26 16:24:27 +02:00
Bastian Kleineidam
4375d35328 Add warning about unsupported HTTP authentication, and revert the realm changes. 2010-10-25 22:41:31 +02:00
Bastian Kleineidam
332fa4f8f9 Prepare multi-realm auth configuration. 2010-10-25 22:07:16 +02:00
Bastian Kleineidam
2a7292845c Improved info message about sent cookies; do not report the retrieved cookie information. 2010-10-13 22:32:50 +02:00
Bastian Kleineidam
a8aa3bdb00 Another fix to ensure get_content() is only called when allowed. 2010-10-13 22:14:43 +02:00
Bastian Kleineidam
61e611e4bf Prevent disallowed content reads when checking for robots.txt allowance in HTML files. 2010-10-12 00:40:34 +02:00
Bastian Kleineidam
1d0db02192 Refactor getting user and password for a URL. 2010-10-11 20:11:15 +02:00
Bastian Kleineidam
e494d6bbb6 Move MIME type detection into fileutil.py module, and use mimetools for detection. 2010-10-03 08:47:48 +02:00
Bastian Kleineidam
e0f4097eb0 Ensure HttpUrl.set_title_from_content() is only called when the content is allowed to be retrieved. 2010-09-29 19:26:03 +02:00
Bastian Kleineidam
840538d12a Remove unneeded check for HTML content. 2010-09-29 19:25:14 +02:00
Bastian Kleineidam
279a1eae70 Only add geoip info for non-empty hostnames. 2010-09-29 15:59:57 +02:00
Bastian Kleineidam
cc848cdb33 Fix import for moved geoip module. 2010-09-29 15:17:27 +02:00
Bastian Kleineidam
ffcd274087 Updated copyright 2010-09-05 21:02:51 +02:00
Bastian Kleineidam
8a1ac26c85 Warn about obfuscated IP numbers. 2010-09-05 20:11:02 +02:00
Bastian Kleineidam
5284017d67 Only fall back to HTTP GET when robots.txt allows it. 2010-09-04 18:09:59 +02:00
Bastian Kleineidam
8a074aeea9 Work around Python 2.6+ urljoin bug. 2010-08-31 09:16:24 +02:00
Bastian Kleineidam
c3b8ff00b3 Check content and recursion in one try/except to avoid multiple errors when getting page content. 2010-08-31 06:52:08 +02:00
Bastian Kleineidam
60f7af4598 Allow redirections to external URLs with same domain. 2010-08-13 01:22:18 +02:00
Bastian Kleineidam
1faedafb33 Fix data size for HTTP requests. 2010-08-04 00:06:25 +02:00
Bastian Kleineidam
c086f49cea Catch KeyError when quoting URLs of index.html. 2010-07-30 20:12:52 +02:00
Bastian Kleineidam
4678802a81 Do not truncate UNC filepaths 2010-07-30 20:07:11 +02:00
Bastian Kleineidam
761b292e37 Added skype: to list of recognized but ignored URL schemes. 2010-07-29 20:26:04 +02:00
Bastian Kleineidam
0f92b76290 Remove the unnormed URL warning. 2010-07-29 20:20:59 +02:00
Bastian Kleineidam
7ad4f7c220 Compare size from meta info and content data. 2010-07-29 19:53:41 +02:00
Bastian Kleineidam
8413b427e9 Rename some warnings, and add size inequality warning. 2010-07-29 19:53:15 +02:00
Bastian Kleineidam
7536472797 Send correct host header when using http proxy. 2010-07-29 06:50:35 +02:00
Bastian Kleineidam
41e2e1a448 Add new warning to warning list. 2010-07-28 13:47:58 +02:00
Bastian Kleineidam
d9bfd25a68 Add warning if content size is zero 2010-07-28 08:19:55 +02:00
Bluebird75
28f4514b67 Use object with __slots__ for wire-format of UrlBase objects.
Saves memory since UrlBase wire-format objects are used for
logging and thus often created.

Signed-off-by: Bastian Kleineidam <calvin@debian.org>
2010-03-27 00:07:19 +01:00
Bastian Kleineidam
3370ea1562 Reflect changes in httplib2.py: use buffered read in httplib response object and use bad status line exception attribute. 2010-03-26 20:50:38 +01:00
Bastian Kleineidam
c4c098bd83 pep8-ify the source a little more 2010-03-13 08:47:12 +01:00
Bastian Kleineidam
37b4e97012 Revert "Only parse anchors if both --anchors option is given and the current link has an anchor."
This reverts commit b238527d54.
2010-03-10 00:04:02 +01:00
Bastian Kleineidam
b238527d54 Only parse anchors if both --anchors option is given and the current link has an anchor. 2010-03-09 11:45:50 +01:00
Bastian Kleineidam
57397e938b Improved linkname parsing by adding a new peek() HTML parser function. 2010-03-09 11:31:12 +01:00
Bastian Kleineidam
074b5ded32 Support UTF-8 encoded filenames in FTP servers. 2010-03-09 08:15:29 +01:00
Bastian Kleineidam
c88791b815 Fix support for non-standard FTP ports. 2010-03-09 07:49:05 +01:00
Bastian Kleineidam
51a0ef0ad4 Speed up HTML parsing by stopping early and adding callbacks. 2010-03-08 09:04:33 +01:00
Bastian Kleineidam
b8b0398dd2 Ensure redirected URL is Unicode encoded. 2010-03-07 22:11:55 +01:00
Bastian Kleineidam
c8e6995ecd Support HTTPS proxies. 2010-03-07 21:06:10 +01:00
Bastian Kleineidam
1e15e55689 Fix errors in Word file parsing. 2010-03-07 19:43:08 +01:00
Bastian Kleineidam
6a2fcf8ae9 Parse links in Word files. 2010-03-07 19:20:51 +01:00
Bastian Kleineidam
34a2f4a15d Disable and deprecate the --no-proxy-for option. 2010-03-07 17:45:48 +01:00
Bastian Kleineidam
796cf0a7cd Updated copyright year 2010-03-07 11:59:18 +01:00
Bastian Kleineidam
af6cb287d7 Only warn about missing emails in mailto: URLs. 2010-03-07 10:43:29 +01:00
Bastian Kleineidam
3d5c114f14 Warn on permanent redirections even when URL is outside of domain filter. 2010-03-07 09:36:21 +01:00
Bastian Kleineidam
2d73b907f1 Retry HTTP when server sent empty status line; should fix most of the BadStatusLine errors that are sporadically encountered. 2010-03-06 10:23:34 +01:00
Bastian Kleineidam
77daf80e82 Add url encoding parameter 2009-11-28 11:56:35 +01:00
Bastian Kleineidam
5e06b6b8d4 Updated FSF address in GPL blurb 2009-07-24 23:58:20 +02:00
Bastian Kleineidam
e6f43b6822 Fixed the no_proxy handling and added changelog entry 2009-07-24 07:19:49 +02:00
Bastian Kleineidam
7f67027abf Ignore the fragment part (i.e. the anchor) of URIs when getting and caching content
2009-06-26 07:22:36 +02:00
Bastian Kleineidam
c7b7af877f Read Mozilla bookmark titles correctly from places.sqlite. 2009-05-20 07:50:46 +02:00
Bastian Kleineidam
59ffbd43f0 Use AttrDict for transport object in loggers. 2009-03-07 09:43:55 +01:00
Bastian Kleineidam
7a59763508 Remove unused SetList container 2009-03-07 00:42:27 +01:00
Bastian Kleineidam
2351506752 Use plain list for info strings. 2009-03-07 00:19:19 +01:00
Bastian Kleineidam
897b68ae9b Fix copying of httpurl info 2009-03-07 00:17:17 +01:00
Bastian Kleineidam
88dbcb30cd Remove unused url_data.info tags - the tags were always None 2009-03-06 21:20:09 +01:00
Bastian Kleineidam
0b5f525f76 Print NNTP server welcome string as info 2009-03-06 20:57:35 +01:00
Bastian Kleineidam
4ee0fb0181 Add NNTP debugging. 2009-03-06 20:53:12 +01:00
Bastian Kleineidam
0bc2fbb47a Only try 3 times connecting to a busy NNTP server, not 5 times. 2009-03-06 20:52:53 +01:00
Bastian Kleineidam
29adfe92fd Minor syntax fix 2009-03-06 20:14:50 +01:00
Bastian Kleineidam
6024f2e43e Add missing reset of self.reused_connection flag 2009-03-06 20:10:03 +01:00
Bastian Kleineidam
ba160350dd Introduced transport object API for logging. 2009-03-06 19:30:58 +01:00
Bastian Kleineidam
58925b21d3 Improved persistent connection handling by retrying closed connections. 2009-03-06 08:15:34 +01:00
Bastian Kleineidam
29599e4c74 Make sure persistent connection will not close after reading contents. 2009-03-05 19:15:44 +01:00
Bastian Kleineidam
bf9ed8c659 Make sure file descriptors are closed after decoding HTTP content. 2009-03-05 19:15:03 +01:00
Bastian Kleineidam
b8944e493a Use new exception log keyword when logging errors 2009-03-02 13:18:36 +01:00
Bastian Kleineidam
a9335fb3e8 Make file list an iterator, and add missing slash if needed to manually given file URLs. 2009-03-02 08:02:27 +01:00
Bastian Kleineidam
7862147ca3 Fix showing content size. 2009-03-01 23:04:48 +01:00
Bastian Kleineidam
8caa601a7e Python 3.0 compatibility: use exc.args[] instead of exc[] 2009-02-24 12:41:45 +01:00
Bastian Kleineidam
2c9b8d6858 Use slash as path separator in file names 2009-02-24 12:41:28 +01:00
Bastian Kleineidam
323958951c Add name to unnamed file URLs. 2009-02-20 14:03:34 +01:00
calvin
2e918a7b7a Added email syntax check.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3960 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2009-02-18 15:35:23 +00:00
calvin
7214943f38 Remove wrong function return type documentation
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3959 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2009-02-18 15:34:46 +00:00
calvin
7e5a2ea23b Remove unused file
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3930 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2009-01-24 17:35:06 +00:00
calvin
e03df9e709 Removed gopher URL checking.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3929 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2009-01-24 17:34:18 +00:00
calvin
c6cb09c4aa Add missing import
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3900 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2009-01-10 19:41:42 +00:00
calvin
1c50cf288a Ignore DNS MX lookup failures in py2exe.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3899 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2009-01-10 18:50:14 +00:00
calvin
cc25deac12 Only accept MX dns response types when asking for MX servers.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3895 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2009-01-10 17:53:10 +00:00
calvin
979132c9b5 Catch all DNS exceptions when resolving MX hosts.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3894 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2009-01-10 15:13:55 +00:00
calvin
a26ca4c23a Replace C ftpparse module with Python implementation
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3892 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2009-01-10 14:11:17 +00:00
calvin
e9805dbd8a Updated copyright year to 2009
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3887 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2009-01-08 14:18:03 +00:00
calvin
8d5d4827c3 Change ftpparse import to avoid py2exe load error.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3883 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2009-01-08 12:28:39 +00:00
calvin
209d5abc18 fix timeouts by testing earlier for persistent connections with HEAD
HEAD requests never have a body; nevertheless the http lib tries to
read() from them. This times out on some servers of course. Fix is
not to let those connections be persistent.

git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3871 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-11-29 08:14:28 +00:00
calvin
c20e706761 Made some format changes on translated strings.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3870 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-11-28 20:22:48 +00:00
calvin
1abc2c504d Filter invalid Mozilla bookmark URLs from places.sqlite
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3869 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-11-28 10:54:16 +00:00
calvin
c3b6fc5aa4 Re-add
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3867 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-11-20 21:30:10 +00:00
calvin
42c3e71329 Improved and tested Opera bookmark parser
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3863 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-11-20 07:52:02 +00:00
calvin
9ab895751f Support parsing of Firefox 3 bookmark files
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3862 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-11-20 07:51:22 +00:00
calvin
97cf700e04 Fixed wrong cookie debugging format line.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3849 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-07-13 12:51:56 +00:00