Commit graph

3160 commits

Author SHA1 Message Date
Chris Mayo
d58b3ab285 Remove unused url.url_fix_common_typos() 2020-08-18 19:57:46 +01:00
Chris Mayo
9488e1eb41 Remove unused url.is_safe_x matches 2020-08-18 19:57:46 +01:00
Chris Mayo
71ea78382b Remove unused url.safe_host_pattern() 2020-08-18 19:57:46 +01:00
Chris Mayo
794efd6d44 Remove unused url.is_duplicate_content_url() 2020-08-18 19:57:46 +01:00
Chris Mayo
e372657fb8 Remove unused url.get_content() 2020-08-18 19:57:46 +01:00
Chris Mayo
e4ba9c84ce Remove unused url.match_{host,url}()
Removes deprecation warnings for urllib.parse.split{host,type}() in
url_split()
2020-08-18 19:57:46 +01:00
Chris Mayo
b32fe6f692 Merge pull request #478 from cjmayo/imp
Fix deprecation warning for use of the imp module
2020-08-18 19:56:40 +01:00
Chris Mayo
4ad20d7f03 Merge pull request #477 from cjmayo/sitemap
Detect sitemaps that do not start with an XML declaration
2020-08-18 19:51:32 +01:00
Chris Mayo
5d83e93829 Merge pull request #475 from cjmayo/iana
Update IANA scripts and ignored schemes
2020-08-18 19:40:35 +01:00
Chris Mayo
0086c28b3a Merge pull request #474 from cjmayo/srcset
Fix problems with trailing commas and data: URIs in srcset values
2020-08-15 16:58:38 +01:00
Chris Mayo
0269fd88b0 Merge pull request #473 from cjmayo/valueerror
Fix critical exception when parsing a URL with a ]
2020-08-15 16:51:17 +01:00
Chris Mayo
88566ad20a Merge pull request #472 from cjmayo/baseref
Fix CSV logger not recognising base part setting
2020-08-15 16:41:57 +01:00
Chris Mayo
525b6751a9 Merge pull request #468 from cjmayo/interrupter
Rename director/interrupt.py to director/interrupter.py
2020-08-15 16:31:33 +01:00
Chris Mayo
ccaa882d50 Merge pull request #471 from cjmayo/status
Fix status=0 setting being ignored
2020-08-14 20:02:01 +01:00
Chris Mayo
33a5444dea Merge pull request #469 from cjmayo/checklink
Remove defaults from lc_cgi.checklink()
2020-08-14 19:57:03 +01:00
Chris Mayo
5aa2ddce4d Merge pull request #461 from cjmayo/docstrings
Fix formatting and typos in docstrings
2020-08-14 19:45:41 +01:00
Chris Mayo
8c804c35a5 Detect sitemaps that do not start with an XML declaration 2020-08-11 19:35:56 +01:00
Chris Mayo
658c8051f0 Fix deprecation warning for use of the imp module 2020-08-10 19:32:04 +01:00
Chris Mayo
80763ed1ea Add slack to the list of ignored schemes
slack:// is a way to interact with a local Slack client [1], and is not
something that LinkChecker can check.

[1] https://api.slack.com/reference/deep-linking#client
2020-08-09 17:10:26 +01:00
Chris Mayo
f19fd4f5bc Update IANA scripts and ignored schemes (2020-07-28) 2020-08-09 17:10:26 +01:00
Chris Mayo
d5690203fc Fix critical exception when parsing a URL with a ]
e.g.:
<a href="http://localhost]">square</a>

Causes urllib to raise a ValueError:
  File "/usr/lib/python3.8/site-packages/linkcheck/url.py", line 315, in url_norm
    line: urlparts = list(urllib.parse.urlsplit(url))
    locals:
      urlparts = <not found>
      list = <builtin> <class 'list'>
      urllib = <global> <module 'urllib' from '/usr/lib/python3.8/urllib/__init__.py'>
      urllib.parse = <global> <module 'urllib.parse' from '/usr/lib/python3.8/urllib/parse.py'>
      urllib.parse.urlsplit = <global> <function urlsplit at 0x7f950e699e50>
      url = <local> 'http://localhost]', len = 17
  File "/usr/lib/python3.8/urllib/parse.py", line 440, in urlsplit
    line: raise ValueError("Invalid IPv6 URL")
    locals:
      ValueError = <builtin> <class 'ValueError'>
2020-08-08 16:47:31 +01:00
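The failure described in d5690203fc can be reproduced with the standard library alone. The following is a minimal sketch, assuming a hypothetical helper `split_or_none()` that is not part of LinkChecker; the actual fix in url.py may handle the error differently.

```python
import urllib.parse


def split_or_none(url):
    """Hypothetical helper (not LinkChecker code): return the split URL
    parts as a list, or None if urllib rejects the URL."""
    try:
        # A list is used because the parts may need to be reassigned later.
        return list(urllib.parse.urlsplit(url))
    except ValueError:
        # urlsplit() raises ValueError("Invalid IPv6 URL") when the host
        # part contains an unbalanced square bracket.
        return None


print(split_or_none("http://localhost]"))
print(split_or_none("http://localhost/"))
```

Catching the ValueError at the call site lets a caller report a malformed URL as an ordinary problem instead of aborting the whole check.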
Chris Mayo
27f22ae17a Fix treating data: URIs in srcset values as links 2020-08-07 20:04:23 +01:00
Chris Mayo
7ba4053710 Fix critical exception if srcset value ends with a comma
Log a debug message, as this is a minor syntax problem that won't stop
LinkChecker parsing strings up to the comma.
2020-08-07 20:04:23 +01:00
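The two srcset fixes above (27f22ae17a and 7ba4053710) can be illustrated with a simplified candidate parser. This is a sketch only: `srcset_urls()` and its logger are hypothetical names, the parsing loosely follows the HTML srcset rules, and it is not the code LinkChecker uses.

```python
import logging

log = logging.getLogger(__name__)
WHITESPACE = " \t\n\f\r"


def srcset_urls(value):
    """Hypothetical helper: return the checkable URLs in a srcset value."""
    if value.rstrip(WHITESPACE).endswith(","):
        # Minor syntax problem only; everything before the comma still parses.
        log.debug("srcset value ends with a comma: %r", value)
    urls = []
    pos, end = 0, len(value)
    while pos < end:
        # Skip whitespace and commas separating candidates.
        while pos < end and (value[pos] in WHITESPACE or value[pos] == ","):
            pos += 1
        if pos >= end:
            break
        # The URL is the next run of non-whitespace characters, so commas
        # inside a data: URI stay part of the URL.
        start = pos
        while pos < end and value[pos] not in WHITESPACE:
            pos += 1
        url = value[start:pos]
        if url.endswith(","):
            # The comma terminated the candidate; no descriptors follow.
            url = url.rstrip(",")
        else:
            # Skip descriptors such as "2x" or "480w" up to the next comma.
            while pos < end and value[pos] != ",":
                pos += 1
        # data: URIs carry inline content and are not links to check.
        if url and not url.lower().startswith("data:"):
            urls.append(url)
    return urls
```

For example, `srcset_urls("a.png 1x, b.png 2x,")` yields the two image URLs and only logs a debug message about the trailing comma.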
Chris Mayo
f3a823fb5b Fix CSV logger not recognising base part setting 2020-08-07 19:45:24 +01:00
Chris Mayo
4f3f1ac0d4 Fix status=0 setting being ignored
- Set the correct default for the setting in configuration.Configuration
- Detect when the argument is not passed by setting the default to None
  (store_false sets the default to True)
2020-08-06 19:32:33 +01:00
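The argparse behaviour mentioned in 4f3f1ac0d4 can be sketched as follows. The option name, `build_parser()` and `effective_status()` are assumptions for illustration; only the `store_false` default behaviour comes from argparse itself.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # store_false normally defaults to True, which hides whether the flag
    # was passed. default=None makes "not passed" distinguishable.
    parser.add_argument(
        "--no-status", dest="status", action="store_false", default=None
    )
    return parser


def effective_status(argv, config_status):
    """Use the configured value unless the flag was given on the command line."""
    args = build_parser().parse_args(argv)
    return config_status if args.status is None else args.status


print(effective_status([], False))             # configuration file value is kept
print(effective_status(["--no-status"], True)) # command line overrides it
```

With the default left at True, a configuration value of status=0 would always be overwritten by the parsed arguments, which matches the symptom the commit describes.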
Chris Mayo
40b2ebff8f Remove defaults from lc_cgi.checklink()
Only called from application() with arguments. Causes local environment
to be embedded in documentation when using Sphinx autodoc.
2020-08-05 19:54:56 +01:00
Chris Mayo
46b9e6b169 Rename director/interrupt.py to director/interrupter.py
Avoid a clash with director.interrupt() when automatically documenting.
2020-08-03 19:48:07 +01:00
Chris Mayo
dee21ee9a0 Fix formatting and typos in docstrings 2020-07-25 16:35:48 +01:00
Chris Mayo
500c13e2cb Log a debug message when a cached URL is skipped
Skipping introduced in:
eaa538c8 ("don't check one url multiple times", 2016-11-09)
2020-07-21 19:54:18 +01:00
Chris Mayo
a977e4d712 Merge pull request #444 from cjmayo/isinstance
Remove or replace uses of isinstance()
2020-07-08 19:55:29 +01:00
Chris Mayo
7a0644a234 No need to process an empty string in str_format.ascii_safe() 2020-07-08 19:47:59 +01:00
Chris Mayo
b328520f08 Convert UrlBase syntax Exception to a string
Causes an exception when logging.
2020-07-07 17:25:28 +01:00
Chris Mayo
53bd5c4d21 Remove HttpUrl.getheader() 2020-07-07 17:25:28 +01:00
Chris Mayo
1018b8332b Convert PDF URL to a string 2020-07-07 17:25:28 +01:00
Chris Mayo
3fcee872b6 urlparts need to support assignment 2020-07-07 17:25:28 +01:00
Chris Mayo
d91a328224 Remove strformat.unicode_safe() and strformat.url_unicode_split()
All strings support Unicode in Python 3.
2020-07-07 17:25:28 +01:00
Chris Mayo
4cb5b6f2fa Merge pull request #443 from cjmayo/kde5
Replace KDE 3 proxy support with KDE 5 support
2020-07-07 17:12:53 +01:00
Chris Mayo
18f20d592f Check for KDE 5 proxy first and then KDE 4
Don't look for kde4-config in case a KDE 5 user still has it installed.
2020-07-07 17:06:25 +01:00
Chris Mayo
bd55c2ef8f Compare KDE proxy ReversedException integer value to zero 2020-07-07 17:06:25 +01:00
Chris Mayo
da22d4886b Merge pull request #441 from cjmayo/authentication
Improve documentation of authentication
2020-06-23 17:35:19 +01:00
Chris Mayo
085ae188f7 Remove checks for empty loginpasswordfield and loginuserfield
These have default values and cannot be reset.
2020-06-23 17:28:31 +01:00
Chris Mayo
1ec3848720 Log problem with login form without exception 2020-06-23 17:28:31 +01:00
Chris Mayo
2f51a9dca0 Improve documentation of authentication 2020-06-23 17:28:31 +01:00
Chris Mayo
d66e64460c Remove unused code from strformat.py 2020-06-18 19:31:00 +01:00
Chris Mayo
1f77506c9f Remove isinstance() in url.url_fix_mailto_urlsplit()
urls are strings.
2020-06-18 19:27:06 +01:00
Chris Mayo
8f9f687ed8 Remove isinstance() from fileutil.path_safe()
paths are derived from urls which are strings.
2020-06-18 19:27:06 +01:00
Chris Mayo
f86e506de4 Remove isinstance() from FileUrl.read_content()
get_index_html() returns a string.
2020-06-18 19:27:06 +01:00
Chris Mayo
3231730366 Remove isinstance() from robotparser2.py
Originally for encoding Python 2 Unicode strings [1]. Will not be used
in Python 3 because the variables are strings; if they were bytes,
exceptions would be raised.

[1] c97f68f7 ("accept unicode in robots.txt can_fetch", 2004-11-09)
2020-06-18 19:27:06 +01:00
Chris Mayo
9c9a3d8b14 Remove isinstance() from url.idna_encode()
Was originally used for Python 2 Unicode strings.
f4b73c6d ("Python3: fix unicode in url.py", 2018-01-05)
2020-06-18 19:27:06 +01:00
Chris Mayo
3a6540bc46 Replace isinstance() in strformat.ascii_safe() 2020-06-18 19:27:06 +01:00