Commit graph

3301 commits

Author SHA1 Message Date
Chris Mayo
d6936ceb91 Add warning url-content-type-unparseable 2022-09-02 19:29:11 +01:00
Kian-Meng Ang
a70ea9ea14 Fix typos
Found via `codespell ./linkcheck/ ./tests ./doc/man/en -L bu,noone,fo,pres,shttp`
2022-09-02 17:20:02 +08:00
Chris Mayo
b35036af2b Merge pull request #634 from cjmayo/pyxdg
Remove dependency on pyxdg
2022-08-30 19:28:03 +01:00
Chris Mayo
d72649453c Merge pull request #632 from cjmayo/docs
Assorted documentation updates
2022-08-30 19:27:10 +01:00
Felix Yan
7db1a867ab Correct a typo in i18n.py 2022-08-24 19:10:41 +03:00
Chris Mayo
fbceca5dc9 Remove dependency on pyxdg
Read the environment variables and implement the same fallbacks.
Saves a hardly used dependency and is more explicit.
2022-08-23 19:26:15 +01:00
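The fallbacks described in fbceca5dc9 can be sketched roughly as below. This is a minimal illustration, not LinkChecker's code: it assumes the defaults from the XDG Base Directory specification ($HOME/.config and $HOME/.local/share when the variable is unset or empty), and the function names are hypothetical.

```python
import os


def _xdg_base_dir(env_var, default_subdir):
    """Return $env_var, or ~/default_subdir if it is unset or empty.

    Hypothetical helper: an empty value is treated like a missing one,
    as the XDG Base Directory specification requires.
    """
    return os.environ.get(env_var) or os.path.join(
        os.path.expanduser("~"), default_subdir
    )


def xdg_config_home():
    # Specification default: $HOME/.config
    return _xdg_base_dir("XDG_CONFIG_HOME", ".config")


def xdg_data_home():
    # Specification default: $HOME/.local/share
    return _xdg_base_dir("XDG_DATA_HOME", os.path.join(".local", "share"))
```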
Chris Mayo
10f3d33041 Finish documenting the use of XDG_CONFIG_HOME and XDG_DATA_HOME
Introduced by:
a03e2e4a ("use xdg dirs for config & data", 2017-10-17)
2022-08-23 19:21:53 +01:00
Chris Mayo
94781120ac Correct mention of pdfminer in WordParser comment 2022-05-18 19:29:54 +01:00
Malte Gerth
cc48a09308 Add Telegram and WhatsApp link schemes 2022-02-06 23:41:33 +01:00
Malte Gerth
067dd8edbb Update IANA schemes 2022-02-06 23:40:36 +01:00
Chris Mayo
141a811ba6 Enable creating a binary with PyOxidizer
With PyOxidizer 0.18.0, AppName in setup.py has to be changed to the
all-lowercase "linkchecker".

Application translations do not work.

better_exchook2.fallback_findfile() may still need converting; it first
needs a test.
2021-12-30 19:27:04 +00:00
Chris Mayo
5768b76f6c Use pkgutil to simplify loader.get_package_modules()
Replaces use of __file__.
2021-12-30 19:27:04 +00:00
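As a rough sketch of the approach in 5768b76f6c, pkgutil.iter_modules() can enumerate a package's submodules from its __path__ instead of listing files next to __file__. The function below is a hypothetical stand-in, not loader.get_package_modules() itself; skipping names that start with an underscore is an assumption added for the example.

```python
import importlib
import pkgutil


def iter_package_modules(package_name):
    """Import and yield each direct submodule of the named package."""
    package = importlib.import_module(package_name)
    # iter_modules() asks the import system's finders for the contents of
    # package.__path__, so no filesystem path derived from __file__ is needed.
    for module_info in pkgutil.iter_modules(package.__path__):
        # Assumption for this sketch: skip private modules such as __main__.
        if module_info.name.startswith("_"):
            continue
        yield importlib.import_module(f"{package_name}.{module_info.name}")
```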
Chris Mayo
a55bbc5237 Write RELEASE_DATE to egg-info 2021-12-30 19:27:04 +00:00
Chris Mayo
50b2063a4b Install translation catalogs in the package data
Custom clean command no longer needed because share directory is not
created in build.
2021-12-30 19:27:04 +00:00
Chris Mayo
1d10fffde4 Use package metadata 2021-12-30 19:27:04 +00:00
Chris Mayo
819dacb9bb Install linkcheckerrc in the package data
data/__init__.py needed for Python < 3.10
(namespace packages supported from importlib_resources v3.2)
2021-12-30 19:27:04 +00:00
Chris Mayo
5c0d66dd74 Raise minimum Python requirement to 3.7 2021-12-30 19:27:04 +00:00
Chris Mayo
a9ab4d847b Remove get_share_file()
cacert.pem not used since:
e3ab9024 ("Remove platform-specific installer stuff and ensure a build .whl wheel file can be built.", 2016-01-17)
2021-12-30 19:27:04 +00:00
Chris Mayo
2fa0016ae9 Remove Portable
Building portable removed in:
e3ab9024 ("Remove platform-specific installer stuff and ensure a build .whl wheel file can be built.", 2016-01-17)
2021-12-30 19:27:04 +00:00
Chris Mayo
3359c7364f Remove is_frozen()
Not used since:
e3ab9024 ("Remove platform-specific installer stuff and ensure a build .whl wheel file can be built.", 2016-01-17)
2021-12-30 19:27:04 +00:00
Chris Mayo
271cb59e62 Remove unused code from i18n 2021-12-30 19:27:04 +00:00
Chris Mayo
158c401dae Update copyright to 2022 2021-12-30 19:27:04 +00:00
Chris Mayo
8bc3b39b41 One more proxy documentation update
a2e379a5 ("Remove built-in GNOME and KDE proxy support", 2021-12-13)
2021-12-21 19:23:00 +00:00
Chris Mayo
5fef9a3b60 Generate linkchecker command using an entry point
drop_privileges() is only used by the linkchecker command.
Move installing the SIGUSR1 handler to the linkchecker command only; this fixes
intermittent test failures.
2021-12-20 19:34:58 +00:00
Chris Mayo
efb92fbee8 Create setup_config from linkchecker 2021-12-20 19:34:58 +00:00
Chris Mayo
e501c4ffac Create ArgParser from linkchecker 2021-12-20 19:34:58 +00:00
Chris Mayo
9bc1f4d04e Use relative import for configuration in failures.py 2021-12-20 19:34:58 +00:00
Chris Mayo
4444a87eb9 Update Requests bug link 2021-12-15 19:34:24 +00:00
Chris Mayo
5f3b007934 Merge pull request #591 from cjmayo/robot
Assume robots.txt is UTF-8
2021-12-15 19:31:00 +00:00
Chris Mayo
d70ec6f75b Assume robots.txt is UTF-8
Match the Python standard library and Google's interpretation:
https://developers.google.com/search/docs/advanced/robots/robots_txt#file-format

Avoid an unhandled LookupError.
2021-12-13 19:31:55 +00:00
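A minimal sketch of the idea in d70ec6f75b, using the standard library's urllib.robotparser, whose own read() decodes robots.txt as UTF-8. The helper name is hypothetical, and errors="replace" is an assumption about how to treat invalid byte sequences; the commit does not say.

```python
import urllib.robotparser


def parse_robots_txt(raw: bytes, url: str) -> urllib.robotparser.RobotFileParser:
    # Always decode as UTF-8 rather than trusting a server-declared charset:
    # an unknown charset name would make bytes.decode() raise LookupError.
    # errors="replace" (an assumption) keeps parsing past invalid bytes.
    text = raw.decode("utf-8", errors="replace")
    parser = urllib.robotparser.RobotFileParser(url)
    # parse() also records the check time, so can_fetch() works afterwards.
    parser.parse(text.splitlines())
    return parser
```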
Chris Mayo
76815bcf47 Don't guess the URL for files that end in .html
Fixes `linkchecker ftp.html` failing because it looked for ftp://ftp.html.
2021-12-13 19:31:13 +00:00
Chris Mayo
9504a6dddf Document the curl_ca_bundle environment variable 2021-12-13 19:25:23 +00:00
Chris Mayo
a2e379a595 Remove built-in GNOME and KDE proxy support
Only http_proxy was ever supported.

Requests uses urllib.request.getproxies().

Fedora 35 and Ubuntu 20.04 do set proxy environment variables when
settings are added through the GUI.

GNOME location of proxy settings is subject to change:
https://wiki.gnome.org/Projects/NetworkManager/Proxies
https://gitlab.gnome.org/GNOME/gsettings-desktop-schemas/-/issues/27
2021-12-13 19:25:23 +00:00
Chris Mayo
fe5a34c68f Remove linkcheck.checker.proxysupport
Set up the requests.Session() with the complete proxy configuration
to fix a problem with using an HTTP server as an HTTPS proxy and
potential redirection issues.

Requests handles no_proxy.
2021-12-13 19:25:23 +00:00
Chris Mayo
35ecb7e639 Add https_proxy to internal error message 2021-12-13 19:25:23 +00:00
Chris Mayo
a60648e348 Remove support for ftp_proxy
It was limited to HTTP proxy servers and prevented simplifying and fixing
HTTP proxy support.
2021-12-13 19:25:23 +00:00
Chris Mayo
f2e5a435e3 Remove unused ProxySupport.proxyauth
Not used since:
7b34be590 ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)
2021-12-13 19:25:23 +00:00
Chris Mayo
0b3bdedd6d Merge pull request #583 from cjmayo/newest
Replace "Get the newest version at"
2021-12-13 19:21:32 +00:00
Chris Mayo
945ad903a3 Merge pull request #579 from cjmayo/redirect
Update HttpUrl.encoding after following redirects
2021-12-13 19:20:28 +00:00
Koen Van den Wijngaert
900586dc01 Better handling for link rel dns-prefetch and add preconnect support (#536)
preconnect is only DNS checked.

This is allowed even in the Resource Hints Editor's Draft
https://w3c.github.io/resource-hints/#preconnect
2021-12-09 19:38:30 +00:00
Chris Mayo
d08f6a0730 Replace "Get the newest version at" 2021-12-06 19:36:22 +00:00
Chris Mayo
a04214465a Update HttpUrl.encoding after following redirects 2021-12-06 19:34:31 +00:00
Chris Mayo
0325ecd73f Remove httpurl.HEADER_ENCODING
Unused since:
d91a32822 ("Remove strformat.unicode_safe() and strformat.url_unicode_split()", 2020-07-07)
2021-12-06 19:34:31 +00:00
Chris Mayo
c89c617a58 Ignore an encoding of ISO-8859-1 returned by Requests
ISO-8859-1 is a fallback for Requests and causes us to mangle UTF-8
content.

Requests' utils.py:

def get_encoding_from_headers(headers):
    """Returns encodings from given HTTP Header Dict.

    :param headers: dictionary to extract encoding from.
    :rtype: str
    """

    content_type = headers.get('content-type')

    if not content_type:
        return None

    content_type, params = _parse_content_type_header(content_type)

    if 'charset' in params:
        return params['charset'].strip("'\"")

    if 'text' in content_type:
        return 'ISO-8859-1'

    if 'application/json' in content_type:
        # Assume UTF-8 based on RFC 4627: https://www.ietf.org/rfc/rfc4627.txt since the charset was unset
        return 'utf-8'
2021-11-29 19:52:37 +00:00
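The get_encoding_from_headers() code quoted in c89c617a58 shows that Requests returns 'ISO-8859-1' for any text type without a charset parameter. One hedged way to discard that guess while keeping an explicitly declared ISO-8859-1 is sketched below. The function names are hypothetical, and this refinement is an assumption: per its title, the commit simply ignores an encoding of ISO-8859-1 returned by Requests.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() unquotes the parameter and lower-cases it.
    return msg.get_content_charset()


def effective_encoding(requests_encoding, content_type):
    """Drop Requests' ISO-8859-1 fallback unless the server actually sent it."""
    if (
        requests_encoding
        and requests_encoding.lower() == "iso-8859-1"
        and declared_charset(content_type) != "iso-8859-1"
    ):
        # None means "unknown": the caller can detect the encoding from the
        # content instead of decoding UTF-8 bytes as Latin-1.
        return None
    return requests_encoding
```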
Chris Mayo
a4b14047d6 Make quiet/-q set application logging to warning 2021-11-29 19:48:50 +00:00
Chris Mayo
0356524369 Disable AnchorCheck plugin
Can't be relied on. Multiple reports of expected results not returned.

https://github.com/linkchecker/linkchecker/issues/542
https://github.com/linkchecker/linkchecker/issues/555
https://github.com/linkchecker/linkchecker/issues/568

Previously a fix was needed just to get the tests working:
0912e8a2c ("Don't strip the URL fragment from cache key if using AnchorCheck", 2020-07-27)

After:
eaa538c81 ("don't check one url multiple times", 2016-11-09)
2021-11-29 19:35:34 +00:00
Chris Mayo
2a77e12618 Replace deprecated Thread.getName() and Condition.notifyAll() 2021-11-16 19:45:38 +00:00
Chris Mayo
43507cf80a Make partial and example URLs in docstrings italic
Prevent Sphinx from turning them into broken links.
2021-08-12 19:28:50 +01:00
Chris Mayo
5de3920f6c Fix broken external links in documentation 2021-08-12 19:28:50 +01:00
Paul Haerle
f395c74aac Make ResultCache max_size configurable (#544)
* Make ResultCache max_size configurable

fixes #463

* Add tests and docs.

* fix documentation...

...adapt the source, not the auto-generated man pages themselves as
requested in #544.

* fix typo.
2021-06-21 19:45:19 +01:00
Chris Mayo
c31d233f06 Disable status logging in WSGI application
Not a problem earlier because the default for the CLI is to record
status, but this was not fully implemented until:
4f3f1ac0 ("Fix status=0 setting being ignored", 2020-08-06)
2021-01-28 19:20:24 +00:00
Chris Mayo
09b4da393e Initialise Configuration.status_logger
Fixes failure of the LinkChecker WSGI application which does
not call Configuration.set_status_logger().
2021-01-28 19:20:24 +00:00
Chris Mayo
136e8a3625 Update to version 10.0.1.dev0 2021-01-28 19:20:24 +00:00
Chris Mayo
a3e9c31560 Remove execute bits from parsepdf.py and parseword.py 2021-01-14 19:48:22 +00:00
Chris Mayo
e922dd0224 Stop using biplist
plistlib has supported binary files since Python 3.4.
2020-10-12 19:55:46 +01:00
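The reasoning in e922dd0224 can be checked with a short round trip: the standard library's plistlib writes and reads binary property lists without a third-party package. The helper name and sample data are hypothetical.

```python
import plistlib


def load_plist(data: bytes):
    # With the default fmt=None, plistlib.loads() detects whether the data
    # is an XML or a binary ("bplist00") property list.
    return plistlib.loads(data)


# FMT_BINARY output can be read back with the same call as XML output.
binary = plistlib.dumps({"Title": "Bookmarks", "Children": []}, fmt=plistlib.FMT_BINARY)
```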
Chris Mayo
0920508413 Merge pull request #498 from cjmayo/linkchecker
Tidy linkchecker
2020-09-24 19:31:07 +01:00
Chris Mayo
ca59966cf0 Add a note linking to biplist Python 3.9 compatibility bug 2020-09-23 19:38:17 +01:00
Chris Mayo
26c15c5e67 Fix deprecation warning for resolver.query()
/home/travis/build/linkchecker/linkchecker/linkcheck/checker/mailtourl.py:321: DeprecationWarning: please use dns.resolver.resolve() instead
    answers = resolver.query(domain, 'MX')
2020-09-14 19:55:05 +01:00
Chris Mayo
70d749a967 Drop Python 3.5, add 3.9 2020-09-14 19:55:05 +01:00
Chris Mayo
f268b95cf8 biplist is not compatible with Python 3.9
File ".tox/py39/lib/python3.9/site-packages/biplist/__init__.py", line 143, in readPlist
    line: raise InvalidPlistException(e)
    locals:
      InvalidPlistException = <global> <class 'biplist.InvalidPlistException'>
      e = <not found>

InvalidPlistException: module 'plistlib' has no attribute 'Data'
2020-09-14 19:55:05 +01:00
Chris Mayo
b1faef93c3 Merge pull request #495 from cjmayo/mswindows
MS Windows Python 3.7 and MS Store compatibility
2020-09-01 19:46:44 +01:00
Chris Mayo
314ec085a3 Merge pull request #462 from cjmayo/anchor
Fix anchor checking
2020-09-01 19:39:29 +01:00
Chris Mayo
a6d6fa0cd4 Tidy linkchecker intro 2020-08-30 18:40:39 +01:00
Chris Mayo
2fbd49dd0b Replace os.path.splitunc() with os.path.splitdrive()
os.path.splitunc() removed in Python 3.7.

https://docs.python.org/3/whatsnew/3.7.html#api-and-feature-removals
2020-08-29 16:57:57 +01:00
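For code that still wants the (unc, rest) result of the removed function, the replacement in 2fbd49dd0b can be sketched with ntpath.splitdrive(), which returns a UNC \\server\share prefix as the drive. The function name is hypothetical; using ntpath rather than os.path just lets the example run on any platform.

```python
import ntpath


def splitunc_compat(path):
    """Approximate the removed os.path.splitunc() using splitdrive().

    Returns (unc, rest); unc is empty when the path has no UNC prefix,
    for example a drive-letter path such as C:\\Windows.
    """
    drive, rest = ntpath.splitdrive(path)
    # A UNC drive starts with two separators; a drive letter looks like "C:".
    if drive[:2] in ("\\\\", "//"):
        return drive, rest
    return "", path
```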
Chris Mayo
37e4981089 Merge pull request #492 from cjmayo/pass
Assorted tidying including unneeded pass statements
2020-08-29 16:55:39 +01:00
Chris Mayo
7ef599fc20 Merge pull request #491 from cjmayo/sphinx2
Documentation Updates
2020-08-29 16:50:27 +01:00
Chris Mayo
1390c9cd7e Merge pull request #489 from cjmayo/urlsplit
Replace deprecated urllib.parse.split functions
2020-08-29 16:44:56 +01:00
Chris Mayo
47604e7d34 Merge pull request #481 from cjmayo/failures
Rename blacklist to failures
2020-08-29 16:39:24 +01:00
Chris Mayo
7dfba766a9 Merge pull request #486 from cjmayo/url
Remove unused code from url.py
2020-08-26 19:28:50 +01:00
Chris Mayo
b1d19e5eab Update copyright and version 2020-08-23 17:24:09 +01:00
Chris Mayo
2de25d54fd Rename blacklist to failures
Continue to support blacklist for the time being, with deprecation
warnings.
2020-08-23 17:19:26 +01:00
Chris Mayo
dfa1ff05dc Backport tabs to spaces from better_exchook.py 2020-08-22 17:17:02 +01:00
Chris Mayo
2864962c13 Backport bare except changes from better_exchook.py 2020-08-22 17:17:02 +01:00
Chris Mayo
1f58419322 Remove unneeded pass statements 2020-08-22 17:17:02 +01:00
Chris Mayo
8779c39735 Replace deprecated urllib.parse.split functions 2020-08-22 16:28:53 +01:00
Chris Mayo
5a2eda9058 Merge pull request #488 from cjmayo/gschema
Avoid dependency on gsettings-desktop-schemas
2020-08-21 16:56:25 +01:00
Chris Mayo
1b497389b5 Merge pull request #483 from cjmayo/retryafter
Don't translate "Retry-After" server header field
2020-08-21 16:51:17 +01:00
Chris Mayo
4969b6dd0a Merge pull request #482 from cjmayo/syntaxcheck
Fix CssSyntaxCheck list index out of range
2020-08-21 16:46:37 +01:00
Chris Mayo
e9db151145 Merge pull request #480 from cjmayo/blacklist
Fix blacklist updating
2020-08-20 19:48:59 +01:00
Chris Mayo
b869b8876f Avoid dependency on gsettings-desktop-schemas
Gio.Settings.new() causes LinkChecker to exit if the GNOME proxy schema
cannot be found.
2020-08-20 19:42:44 +01:00
Chris Mayo
cfe5c89eb6 Merge pull request #479 from cjmayo/versions
Add missing essential modules to internal error message
2020-08-20 19:36:45 +01:00
Chris Mayo
d7efa20d33 Remove unused constants from url.py 2020-08-19 19:27:28 +01:00
Chris Mayo
be24836c73 Remove unused url.url_unsplit() 2020-08-18 19:57:46 +01:00
Chris Mayo
d58b3ab285 Remove unused url.url_fix_common_typos() 2020-08-18 19:57:46 +01:00
Chris Mayo
9488e1eb41 Remove unused url.is_safe_x matches 2020-08-18 19:57:46 +01:00
Chris Mayo
71ea78382b Remove unused url.safe_host_pattern() 2020-08-18 19:57:46 +01:00
Chris Mayo
794efd6d44 Remove unused url.is_duplicate_content_url() 2020-08-18 19:57:46 +01:00
Chris Mayo
e372657fb8 Remove unused url.get_content() 2020-08-18 19:57:46 +01:00
Chris Mayo
e4ba9c84ce Remove unused url.match_{host,url}()
Removes deprecation warnings for urllib.parse.split{host,type}() in
url_split()
2020-08-18 19:57:46 +01:00
Chris Mayo
b32fe6f692 Merge pull request #478 from cjmayo/imp
Fix deprecation warning for use of the imp module
2020-08-18 19:56:40 +01:00
Chris Mayo
4ad20d7f03 Merge pull request #477 from cjmayo/sitemap
Detect sitemaps that do not start with an XML declaration
2020-08-18 19:51:32 +01:00
Chris Mayo
5d83e93829 Merge pull request #475 from cjmayo/iana
Update IANA scripts and ignored schemes
2020-08-18 19:40:35 +01:00
Chris Mayo
0086c28b3a Merge pull request #474 from cjmayo/srcset
Fix problems with trailing commas and data: URIs in srcset values
2020-08-15 16:58:38 +01:00
Chris Mayo
0269fd88b0 Merge pull request #473 from cjmayo/valueerror
Fix critical exception when parsing a URL with a ]
2020-08-15 16:51:17 +01:00
Chris Mayo
88566ad20a Merge pull request #472 from cjmayo/baseref
Fix CSV logger not recognising base part setting
2020-08-15 16:41:57 +01:00
Chris Mayo
525b6751a9 Merge pull request #468 from cjmayo/interrupter
Rename director/interrupt.py to director/interrupter.py
2020-08-15 16:31:33 +01:00
Chris Mayo
ccaa882d50 Merge pull request #471 from cjmayo/status
Fix status=0 setting being ignored
2020-08-14 20:02:01 +01:00
Chris Mayo
33a5444dea Merge pull request #469 from cjmayo/checklink
Remove defaults from lc_cgi.checklink()
2020-08-14 19:57:03 +01:00
Chris Mayo
5aa2ddce4d Merge pull request #461 from cjmayo/docstrings
Fix formatting and typos in docstrings
2020-08-14 19:45:41 +01:00
Chris Mayo
7ee151ebbf Don't translate "Retry-After" server header field
It is defined in RFC 7231.
2020-08-14 19:29:19 +01:00