3187 commits

Author SHA1 Message Date
Chris Mayo
a6d6fa0cd4 Tidy linkchecker intro 2020-08-30 18:40:39 +01:00
Chris Mayo
37e4981089
Merge pull request #492 from cjmayo/pass
Assorted tidying including unneeded pass statements
2020-08-29 16:55:39 +01:00
Chris Mayo
7ef599fc20
Merge pull request #491 from cjmayo/sphinx2
Documentation Updates
2020-08-29 16:50:27 +01:00
Chris Mayo
1390c9cd7e
Merge pull request #489 from cjmayo/urlsplit
Replace deprecated urllib.parse.split functions
2020-08-29 16:44:56 +01:00
Chris Mayo
47604e7d34
Merge pull request #481 from cjmayo/failures
Rename blacklist to failures
2020-08-29 16:39:24 +01:00
Chris Mayo
7dfba766a9
Merge pull request #486 from cjmayo/url
Remove unused code from url.py
2020-08-26 19:28:50 +01:00
Chris Mayo
b1d19e5eab Update copyright and version 2020-08-23 17:24:09 +01:00
Chris Mayo
2de25d54fd Rename blacklist to failures
Continue to support blacklist for the time being, with deprecation
warnings.
2020-08-23 17:19:26 +01:00
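A minimal sketch of how an old name can stay usable while emitting a deprecation warning, as the commit above describes. The helper name `resolve_output_name` and the warning text are hypothetical and are not taken from LinkChecker's source.

```python
import warnings


def resolve_output_name(name):
    """Map the deprecated 'blacklist' name to 'failures' (hypothetical helper)."""
    if name == "blacklist":
        # Keep accepting the old name for now, but tell the caller to migrate.
        warnings.warn(
            "'blacklist' is deprecated, use 'failures' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return "failures"
    return name
```

Using `stacklevel=2` makes the warning point at the caller that passed the old name rather than at the helper itself.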
Chris Mayo
dfa1ff05dc Backport tabs to spaces from better_exchook.py 2020-08-22 17:17:02 +01:00
Chris Mayo
2864962c13 Backport bare except changes from better_exchook.py 2020-08-22 17:17:02 +01:00
Chris Mayo
1f58419322 Remove unneeded pass statements 2020-08-22 17:17:02 +01:00
Chris Mayo
8779c39735 Replace deprecated urllib.parse.split functions 2020-08-22 16:28:53 +01:00
Chris Mayo
5a2eda9058
Merge pull request #488 from cjmayo/gschema
Avoid dependency on gsettings-desktop-schemas
2020-08-21 16:56:25 +01:00
Chris Mayo
1b497389b5
Merge pull request #483 from cjmayo/retryafter
Don't translate "Retry-After" server header field
2020-08-21 16:51:17 +01:00
Chris Mayo
4969b6dd0a
Merge pull request #482 from cjmayo/syntaxcheck
Fix CssSyntaxCheck list index out of range
2020-08-21 16:46:37 +01:00
Chris Mayo
e9db151145
Merge pull request #480 from cjmayo/blacklist
Fix blacklist updating
2020-08-20 19:48:59 +01:00
Chris Mayo
b869b8876f Avoid dependency on gsettings-desktop-schemas
Gio.Settings.new() causes LinkChecker to exit if the GNOME proxy schema
cannot be found.
2020-08-20 19:42:44 +01:00
Chris Mayo
cfe5c89eb6
Merge pull request #479 from cjmayo/versions
Add missing essential modules to internal error message
2020-08-20 19:36:45 +01:00
Chris Mayo
d7efa20d33 Remove unused constants from url.py 2020-08-19 19:27:28 +01:00
Chris Mayo
be24836c73 Remove unused url.url_unsplit() 2020-08-18 19:57:46 +01:00
Chris Mayo
d58b3ab285 Remove unused url.url_fix_common_typos() 2020-08-18 19:57:46 +01:00
Chris Mayo
9488e1eb41 Remove unused url.is_safe_x matches 2020-08-18 19:57:46 +01:00
Chris Mayo
71ea78382b Remove unused url.safe_host_pattern() 2020-08-18 19:57:46 +01:00
Chris Mayo
794efd6d44 Remove unused url.is_duplicate_content_url() 2020-08-18 19:57:46 +01:00
Chris Mayo
e372657fb8 Remove unused url.get_content() 2020-08-18 19:57:46 +01:00
Chris Mayo
e4ba9c84ce Remove unused url.match_{host,url}()
Removes deprecation warnings for urllib.parse.split{host,type}() in
url_split()
2020-08-18 19:57:46 +01:00
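For illustration, the scheme and host that `urllib.parse.splittype()` and `splithost()` used to extract can be read from the result of `urllib.parse.urlsplit()`. This is a sketch of the general replacement pattern, not the change made in url.py; the function name `scheme_and_host` is hypothetical.

```python
from urllib.parse import urlsplit


def scheme_and_host(url):
    """Return (scheme, netloc) using the supported urlsplit() API."""
    parts = urlsplit(url)
    # parts.netloc keeps any port; parts.hostname would drop it.
    return parts.scheme, parts.netloc
```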
Chris Mayo
b32fe6f692
Merge pull request #478 from cjmayo/imp
Fix deprecation warning for use of the imp module
2020-08-18 19:56:40 +01:00
Chris Mayo
4ad20d7f03
Merge pull request #477 from cjmayo/sitemap
Detect sitemaps that do not start with an XML declaration
2020-08-18 19:51:32 +01:00
Chris Mayo
5d83e93829
Merge pull request #475 from cjmayo/iana
Update IANA scripts and ignored schemes
2020-08-18 19:40:35 +01:00
Chris Mayo
0086c28b3a
Merge pull request #474 from cjmayo/srcset
Fix problems with trailing commas and data: URIs in srcset values
2020-08-15 16:58:38 +01:00
Chris Mayo
0269fd88b0 Merge pull request #473 from cjmayo/valueerror
Fix critical exception when parsing a URL with a ]
2020-08-15 16:51:17 +01:00
Chris Mayo
88566ad20a
Merge pull request #472 from cjmayo/baseref
Fix CSV logger not recognising base part setting
2020-08-15 16:41:57 +01:00
Chris Mayo
525b6751a9 Merge pull request #468 from cjmayo/interrupter
Rename director/interrupt.py to director/interrupter.py
2020-08-15 16:31:33 +01:00
Chris Mayo
ccaa882d50
Merge pull request #471 from cjmayo/status
Fix status=0 setting being ignored
2020-08-14 20:02:01 +01:00
Chris Mayo
33a5444dea
Merge pull request #469 from cjmayo/checklink
Remove defaults from lc_cgi.checklink()
2020-08-14 19:57:03 +01:00
Chris Mayo
5aa2ddce4d
Merge pull request #461 from cjmayo/docstrings
Fix formatting and typos in docstrings
2020-08-14 19:45:41 +01:00
Chris Mayo
7ee151ebbf Don't translate "Retry-After" server header field
It is defined in RFC 7231.
2020-08-14 19:29:19 +01:00
Chris Mayo
ad71cb4e43 Fix CssSyntaxCheck list index out of range
Errors do not report the column.
2020-08-14 19:25:21 +01:00
Chris Mayo
94dbac1e5e Fix CssSyntaxCheck warning message, CSS not HTML 2020-08-14 19:25:21 +01:00
Chris Mayo
e053b3bc5f HtmlSyntaxCheck disabled because it is broken 2020-08-14 19:25:21 +01:00
Chris Mayo
068a60ee39 SyntaxCheck plugins only work with http
They use a Requests session from url_data.
2020-08-14 19:25:21 +01:00
Chris Mayo
7d950cf848 Fix blacklist updating
A second run creates an additional entry in blacklist rather than
updating the original:
1 '"(\'http://localhost/broken.html\', \'http://localhost/nosuchlink.html\')"'
1 "('http://localhost/broken.html', 'http://localhost/nosuchlink.html')"

Broken since at least 9.3:
1 "(u'http://localhost/broken.html', u'http://localhost/nosuchlink.html')"
1 u'"(u\'http://localhost/broken.html\', u\'http://localhost/nosuchlink.html\')"'

If such an entry is found LinkChecker will now halt. Either remove
the entry or the whole file.
2020-08-13 19:32:21 +01:00
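The two entries quoted in the commit above differ by one extra level of `repr()` quoting. The sketch below reproduces that effect in isolation; it assumes the key is a `(parent URL, URL)` tuple stored as text and does not claim to be LinkChecker's actual read/write code.

```python
import ast

key = ("http://localhost/broken.html", "http://localhost/nosuchlink.html")

# First run: the tuple is converted to text and quoted once.
once = repr(str(key))
# Second run: the already-quoted text is quoted again instead of decoded,
# so it no longer matches the original entry.
twice = repr(once)

print(once)
print(twice)

# Decoding one level of quoting recovers the original entry.
assert ast.literal_eval(twice) == once
```

The two printed lines correspond to the double-quoted and the escaped single-quoted forms shown in the commit message.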
Chris Mayo
682bdbeab4 Add missing essential modules to internal error message 2020-08-12 19:38:40 +01:00
Chris Mayo
8c804c35a5 Detect sitemaps that do not start with an XML declaration 2020-08-11 19:35:56 +01:00
Chris Mayo
658c8051f0 Fix deprecation warning for use of the imp module 2020-08-10 19:32:04 +01:00
Chris Mayo
80763ed1ea Add slack to the list of ignored schemes
slack:// is a way to interact with a local Slack client [1], and is not
something that LinkChecker can check.

[1] https://api.slack.com/reference/deep-linking#client
2020-08-09 17:10:26 +01:00
Chris Mayo
f19fd4f5bc Update IANA scripts and ignored schemes (2020-07-28) 2020-08-09 17:10:26 +01:00
Chris Mayo
d5690203fc Fix critical exception when parsing a URL with a ]
e.g.:
<a href="http://localhost]">square</a>

Causes urllib to raise a ValueError:
  File "/usr/lib/python3.8/site-packages/linkcheck/url.py", line 315, in url_norm
    line: urlparts = list(urllib.parse.urlsplit(url))
    locals:
      urlparts = <not found>
      list = <builtin> <class 'list'>
      urllib = <global> <module 'urllib' from '/usr/lib/python3.8/urllib/__init__.py'>
      urllib.parse = <global> <module 'urllib.parse' from '/usr/lib/python3.8/urllib/parse.py'>
      urllib.parse.urlsplit = <global> <function urlsplit at 0x7f950e699e50>
      url = <local> 'http://localhost]', len = 17
  File "/usr/lib/python3.8/urllib/parse.py", line 440, in urlsplit
    line: raise ValueError("Invalid IPv6 URL")
    locals:
      ValueError = <builtin> <class 'ValueError'>
2020-08-08 16:47:31 +01:00
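The failure in the traceback above can be reproduced with `urllib.parse.urlsplit()` alone. The following sketch shows one way to guard the call; the wrapper name `safe_urlsplit` and the choice to return `None` are assumptions for illustration, not the fix applied in `url_norm`.

```python
from urllib.parse import urlsplit


def safe_urlsplit(url):
    """Split a URL, returning None when urllib rejects it (hypothetical helper)."""
    try:
        return urlsplit(url)
    except ValueError:
        # Raised for an unbalanced bracket in the host, e.g. 'http://localhost]'.
        return None
```

A caller can then treat `None` as an invalid URL instead of letting the exception propagate.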
Chris Mayo
27f22ae17a Fix treating data: URIs in srcset values as links 2020-08-07 20:04:23 +01:00
Chris Mayo
7ba4053710 Fix critical exception if srcset value ends with a comma
Log a debug message, as this is a minor syntax problem that does not
stop LinkChecker parsing the string up to the comma.
2020-08-07 20:04:23 +01:00
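The two srcset commits above concern trailing commas and data: URIs. A simplified sketch of srcset parsing that tolerates both is shown below. It is an independent illustration, not LinkChecker's parser; the function name `srcset_urls` is hypothetical, and descriptors containing parentheses are not handled.

```python
def srcset_urls(value):
    """Return the checkable URLs from a srcset value (simplified sketch)."""
    urls = []
    pos, end = 0, len(value)
    while pos < end:
        # Skip whitespace and stray commas, including a trailing comma.
        while pos < end and (value[pos].isspace() or value[pos] == ","):
            pos += 1
        if pos >= end:
            break
        # A URL runs up to the next whitespace, so commas inside a
        # data: URI do not split it.
        start = pos
        while pos < end and not value[pos].isspace():
            pos += 1
        url = value[start:pos]
        if url.endswith(","):
            # The comma directly after the URL ends this candidate.
            url = url.rstrip(",")
        else:
            # Skip descriptors such as '2x' or '480w' up to the next comma.
            while pos < end and value[pos] != ",":
                pos += 1
        if url and not url.startswith("data:"):
            urls.append(url)
    return urls
```

Splitting on whitespace first, rather than on commas, is what keeps a data: URI in one piece.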