fix tag parsing

git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@180 e7d03fd6-7b0d-0410-9947-9c21f3af8025
calvin 2000-10-28 16:15:56 +00:00
parent 5a39f03b60
commit f2bd10e31b
3 changed files with 22 additions and 20 deletions
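The parsing fix in this commit can be illustrated with a minimal sketch. It rebuilds the old and new tails of the `_linkMatcher` pattern and runs both against a tag whose trailing attribute contains a quoted `>`. The helper `compile_matcher`, the constant names, and the sample HTML are hypothetical and only serve the illustration. Modern Python 3 is assumed, and the case-insensitive flag is passed to `re.compile` instead of being written inline.

```python
import re

# Shared front part of the pattern: tag name, attribute name, captured value.
_PREFIX = r"""
    < \s* %s \s+          # open tag and tag name
    [^>]*?                # skip leading attributes
    %s \s* = \s*          # attribute name and equal sign
    (?P<value> ".*?" | '.*?' | [^\s>]+ )   # quoted or unquoted value
"""
OLD_TAIL = r""" [^>]* > """             # before the fix: stops at the first '>'
NEW_TAIL = r""" ([^">]|".*?")* > """    # after the fix: steps over double-quoted values


def compile_matcher(tail, tag, attr):
    """Hypothetical helper: fill in tag and attribute names, then compile."""
    return re.compile((_PREFIX + tail) % (tag, attr), re.IGNORECASE | re.VERBOSE)


html = '<a href="page.html" title="a > b">'
old_match = compile_matcher(OLD_TAIL, "a", "href").search(html)
new_match = compile_matcher(NEW_TAIL, "a", "href").search(html)

print(old_match.group(0))  # match ends at the '>' inside the title value
print(new_match.group(0))  # match covers the whole tag
```

Both versions capture the same `href` value here. The difference is where the match ends, which matters once the text after the tag is scanned for further links. Note that the new tail only steps over double-quoted values, as in the committed pattern.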

4
FAQ

@@ -1,11 +1,11 @@
Q: The link "mailto:john@company.com?subject=Hello John" is reported
as an error.
A: You have to quote special characters (e.g. spaces) in the subject field.
-The correct link should be "mailto:...?subject=Hello%20John!"
+The correct link should be "mailto:...?subject=Hello%20John"
Unfortunately browsers like IE and Netscape do not enforce this.
Q: I have a pretty large site to check. How can I restrict link checking
-to only check my own pages?
+to check only my own pages?
A: Look at the options --intern, --extern, --strict and --recursion-level.
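The quoting rule from the first FAQ answer can be sketched in a few lines. This is an illustration only, assuming modern Python 3 and its standard `urllib.parse.quote` function; it is not code from linkchecker, and the variable names are hypothetical.

```python
from urllib.parse import quote

# Percent-encode the subject so the space becomes %20.
subject = "Hello John"
link = "mailto:john@company.com?subject=" + quote(subject)

print(link)  # mailto:john@company.com?subject=Hello%20John
```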

6
debian/changelog vendored

@@ -3,11 +3,13 @@ linkchecker (1.2.6) unstable; urgency=low
* made a FAQ
* configuration changes: distutils are now required; because of that
we have no more .tmpl files
-* correct db name in create.sql
+* fix db name in create.sql
* added timeoutsocket.py to supply a timeout for socket.connect()
calls
+* fix tag parsing when a quoted tag attribute value contains a >
+character
--- Bastian Kleineidam <calvin@users.sourceforge.net> Mon, 16 Oct 2000 14:55:51 +0200
+-- Bastian Kleineidam <calvin@users.sourceforge.net> Sat, 28 Oct 2000 17:52:53 +0200
linkchecker (1.2.5) unstable; urgency=low


@@ -33,22 +33,22 @@ except ImportError:
pass
_linkMatcher = r"""
-(?i) # case insensitive
-< # open tag
-\s* # whitespace
-%s # tag name
-\s+ # whitespace
-[^>]*? # skip leading attributes
-%s # attrib name
-\s* # whitespace
-= # equal sign
-\s* # whitespace
-(?P<value> # attribute value
-".*?" | # in double quotes
-'.*?' | # in single quotes
-[^\s>]+) # unquoted
-[^>]* # skip trailing attributes
-> # close tag
+(?i) # case insensitive
+< # open tag
+\s* # whitespace
+%s # tag name
+\s+ # whitespace
+[^>]*? # skip leading attributes
+%s # attrib name
+\s* # whitespace
+= # equal sign
+\s* # whitespace
+(?P<value> # attribute value
+".*?" | # in double quotes
+'.*?' | # in single quotes
+[^\s>]+) # unquoted
+([^">]|".*?")* # skip trailing attributes
+> # close tag
"""
LinkPatterns = (