Catch UnicodeError for invalid host names.

Bastian Kleineidam 2013-01-23 19:42:29 +01:00
parent c0a0efbd1d
commit e6ad32c028
5 changed files with 23 additions and 7 deletions

@@ -3,9 +3,9 @@
 Features:
 - checking: Support <link rel="dns-prefetch"> URLs.
 - logging: Sending SIGUSR1 signal prints the stack trace of all current
-running threads. This makes it easier to debug deadlocks.
-- gui: Added support of Drag-and-Drop of local files. If the local file is
-a LinkChecker project (.lcp) it is loaded automatically, else the check
+running threads. This makes debugging deadlocks easier.
+- gui: Support Drag-and-Drop of local files. If the local file is
+a LinkChecker project (.lcp) file it is loaded, else the check
 URL is set to the local file URL.
 Changes:
@@ -14,6 +14,8 @@ Changes:
 Fixes:
 - checking: Fix a crash when closing a Word document after scanning failed.
 Closes: GH bug #369
+- checking: Catch UnicodeError from idna.encode() fixing an internal error when
+trying to connect to certain invalid hostnames.
 8.3 "Mahna Mahna Killer" (released 6.1.2013)
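The SIGUSR1 entry above describes a common debugging aid for deadlocks. The following is a minimal Python 3 sketch of the idea, not LinkChecker's own implementation; the names `dump_thread_stacks` and `install_stack_dump_handler` are hypothetical. It relies on the CPython-specific `sys._current_frames()`, which maps each thread id to that thread's current frame.

```python
import signal
import sys
import threading
import traceback


def dump_thread_stacks(signum=None, frame=None, out=None):
    """Print the current stack of every running thread.

    Hypothetical helper; its signature lets it be used directly as a
    signal handler (signal handlers receive signum and frame).
    """
    out = out if out is not None else sys.stderr
    # Map thread ids to readable names where the threading module knows them.
    names = {t.ident: t.name for t in threading.enumerate()}
    for ident, stack in sys._current_frames().items():
        print("Thread %s (id %d):" % (names.get(ident, "<unknown>"), ident), file=out)
        traceback.print_stack(stack, file=out)
        print(file=out)


def install_stack_dump_handler():
    """Register dump_thread_stacks for SIGUSR1 if the platform has it.

    Must be called from the main thread, a restriction of signal.signal().
    Returns True if the handler was installed.
    """
    if hasattr(signal, "SIGUSR1"):  # SIGUSR1 does not exist on Windows
        signal.signal(signal.SIGUSR1, dump_thread_stacks)
        return True
    return False
```

After `install_stack_dump_handler()` has run in the main thread, `kill -USR1 <pid>` prints every thread's stack to stderr. On Python 3.3 and later, `faulthandler.register(signal.SIGUSR1, all_threads=True)` from the standard library produces a similar dump without custom code.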

@@ -53,6 +53,8 @@ ExcCacheList = [
     ftplib.error_temp,
     ftplib.error_perm,
     ftplib.error_proto,
+    # idna.encode(), called from socket.create_connection()
+    UnicodeError,
 ]
 # Exceptions that do not put the URL in the cache so that the URL can
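Exception lists like `ExcCacheList` are turned into a tuple and used in an `except` clause (compare the `except tuple(ExcList)` line further down). The minimal Python 3 sketch below, built around a hypothetical `check_host()` helper and a trimmed-down, hypothetical `EXC_LIST`, shows why `UnicodeError` has to be in such a list. Encoding a host name that has an empty label, such as the `.example.org` host in the new test page, with the `idna` codec raises `UnicodeError`, which would otherwise escape the handler.

```python
from urllib.parse import urlsplit

# Hypothetical, trimmed-down stand-in for an exception list like the one
# above; only the UnicodeError entry mirrors this commit.
EXC_LIST = [
    OSError,       # socket errors (socket.error is an alias of OSError in Python 3)
    UnicodeError,  # raised by the idna codec for malformed host names
]


def check_host(url):
    """Return (valid, detail) for the host name in url.

    Hypothetical helper: it only IDNA-encodes the host, which is the step
    that fails for names such as ".example.org" (empty first label).
    """
    host = urlsplit(url).hostname or ""
    try:
        encoded = host.encode("idna")
    except tuple(EXC_LIST) as exc:
        # Without UnicodeError in the tuple, this exception would propagate
        # and surface as an internal error.
        return False, str(exc)
    return True, encoded.decode("ascii")
```

With this sketch, `check_host("http://.example.org/")` returns `False` together with the codec's error message, while `check_host("http://www.example.org/")` returns `(True, "www.example.org")`.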

@@ -526,14 +526,17 @@ class UrlBase (object):
             self.check_connection()
             self.add_size_info()
             self.add_country_info()
-        except tuple(ExcList):
+        except tuple(ExcList) as exc:
             value = self.handle_exception()
             # make nicer error msg for unknown hosts
-            if isinstance(value, socket.error) and value.args[0] == -2:
+            if isinstance(exc, socket.error) and exc.args[0] == -2:
                 value = _('Hostname not found')
             # make nicer error msg for bad status line
-            if isinstance(value, httplib.BadStatusLine):
+            elif isinstance(exc, httplib.BadStatusLine):
                 value = _('Bad HTTP response %(line)r') % {"line": str(value)}
+            elif isinstance(exc, UnicodeError):
+                # idna.encode(host) failed
+                value = _('Bad hostname %(host)r: %(msg)s') % {'host': self.host, 'msg': str(value)}
             self.set_result(unicode_safe(value), valid=False)
         self.checktime = time.time() - check_start
         if self.do_check_content:
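The hunk above changes two things: the `isinstance()` checks now test the caught exception `exc` rather than `value` (whatever `self.handle_exception()` returns), and the checks form a single `if`/`elif` chain with a new `UnicodeError` branch. Below is a hypothetical, standalone Python 3 sketch of the same dispatch. `describe_check_error()` is not part of LinkChecker, `http.client` stands in for Python 2's `httplib`, and the messages omit the `_()` translation wrapper. The `-2` is kept from the commit; it matches `EAI_NONAME` on glibc-based systems, but other platforms use different values.

```python
import http.client
import socket


def describe_check_error(exc, host, fallback):
    """Map an exception raised while checking a URL to a friendlier message.

    Hypothetical helper modelled on the if/elif chain above: it inspects the
    exception object itself, not an already formatted message.
    """
    # -2 is the value the commit compares against (EAI_NONAME on glibc).
    if isinstance(exc, socket.error) and exc.args and exc.args[0] == -2:
        return "Hostname not found"
    elif isinstance(exc, http.client.BadStatusLine):
        return "Bad HTTP response %r" % str(exc)
    elif isinstance(exc, UnicodeError):
        # The idna codec rejected the host name.
        return "Bad hostname %r: %s" % (host, exc)
    return fallback
```

Using `elif` ensures at most one rewrite applies, and testing `exc` keeps the checks working no matter what form the handler's own error value takes.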

@@ -21,3 +21,6 @@
 <object classid="clsid:12345-67890" codebase="http://example.org/foo/ #a=1,2,3">
 <!-- <a href=http://nocheckin> no check because of comment -->
+<!-- throws UnicodeError from idna.encode() -->
+<a href="http://.example.org/">UnicodeError</a>

@@ -1,7 +1,7 @@
 url http://localhost:%(port)d/%(datadir)s/http.html
 cache key http://localhost:%(port)d/%(datadir)s/http.html
 real url http://localhost:%(port)d/%(datadir)s/http.html
-info 13 URLs parsed.
+info 14 URLs parsed.
 valid
 url dns://www.example.org
@@ -76,3 +76,9 @@ info Redirected to `http://www.iana.org/domains/example'.
 warning Anchor `a%%3D1%%2C2%%2C3' not found. Available anchors: -.
 valid
+url http://.example.org/
+cache key http://.example.org/
+real url http://.example.org/
+name UnicodeError
+warning Access denied by robots.txt, skipping content checks.
+error