Commit graph

970 commits

Author SHA1 Message Date
Chris Mayo
6970a6c0a6 Add litecoin schemes
https://litecoin.info/docs/key-concepts/uri-scheme
2025-07-21 19:21:33 +01:00
Chris Mayo
45ee206f0c Update IANA schemes 2025-07-21 19:21:33 +01:00
Chris Mayo
7aa97cc632 Update IANA schemes 2024-09-03 19:33:11 +01:00
nodet
28f6743778
Add ignorewarningsforurls to ignore specific warnings (#794)
We want to allow specifying a warning to ignore for
each URL. If no regex is specified for the warning to ignore,
we'll ignore all warnings.

The tests still pass as they are, which means that unknown
values in the configuration file are simply ignored.

* [#782] Add values to configuration file

* [#782] Parse new configuration values

* [#782] Actually ignore a warning

* [#782] Confirm side cases work as expected

* [#782] Add logging when deciding to ignore warnings

* [#782] Documentation for ignorewarningsforurls

* [#782] Update (generated) man pages

* [#782] These tests pass without network, actually

* [#782] Fix copy/paste error in symbol naming

* [#782] The regex matches the name of the warning, not the message

* [#782] Better wording

* [#782] Update (generated) man pages

* [#782] We match the type, not the message
2024-02-13 19:43:29 +00:00
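
Note: the matching rule described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function name should_ignore_warning, the list-of-pairs rule format, the use of re.search and the example URLs are all hypothetical. Only the two behaviours come from the commit message: an empty warning regex ignores every warning for the URL, and the regex is matched against the warning's name rather than its message.

```python
import re


def should_ignore_warning(url, warning_name, rules):
    """Return True if warning_name should be ignored for url.

    rules is a hypothetical list of (url_regex, warning_regex) pairs.
    A warning_regex of None or "" means "ignore every warning".
    """
    for url_regex, warning_regex in rules:
        if not re.search(url_regex, url):
            continue
        # No warning regex given: ignore all warnings for this URL.
        if not warning_regex:
            return True
        # Match against the warning's name (its tag), not its message text.
        if re.search(warning_regex, warning_name):
            return True
    return False


# Hypothetical rules, for illustration only.
rules = [
    (r"^https://example\.com/old/", r"^http-redirected$"),
    (r"^https://example\.org/", None),
]
```

With these rules, an http-redirected warning on https://example.com/old/page is ignored, while an http-rate-limited warning on the same URL is still reported.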
Chris Mayo
042aa2c915 Update IANA schemes 2023-12-04 19:25:01 +00:00
Chris Mayo
187ded1d9b Add ms-windows-store scheme
https://learn.microsoft.com/en-us/windows/uwp/launch-resume/launch-store-app
2023-10-30 19:23:46 +00:00
Chris Mayo
630de40660
Merge pull request #753 from cjmayo/deprecated
Minor deprecation fixes
2023-09-04 19:23:21 +01:00
Chris Mayo
0faccf2ab3
Merge pull request #752 from cjmayo/deprecated_modules
Remove support for nntp and telnet
2023-09-04 19:22:38 +01:00
Chris Mayo
7763704067 Replace deprecated datetime.utcfromtimestamp() 2023-08-28 19:26:25 +01:00
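
Note: datetime.utcfromtimestamp() is deprecated as of Python 3.12. The log does not show which replacement this commit chose, so the following is only a sketch of the commonly recommended form; the helper name utc_from_timestamp is hypothetical.

```python
from datetime import datetime, timezone


def utc_from_timestamp(ts):
    """Return a timezone-aware UTC datetime for a POSIX timestamp."""
    # Unlike utcfromtimestamp(), this returns an aware datetime.
    return datetime.fromtimestamp(ts, tz=timezone.utc)


epoch = utc_from_timestamp(0)
```

Code that previously relied on a naive result can call .replace(tzinfo=None) on the returned value.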
Chris Mayo
ce4bb7557b Update IANA schemes
telnet was included in:
ccd0d4ead ("Updated the list of unknown or ignored URI schemes.", 2014-03-12)
2023-08-28 19:24:57 +01:00
Chris Mayo
b3429c4759 Remove support for nntp and telnet
Python is dropping nntplib and telnetlib.
2023-08-28 19:24:57 +01:00
Chris Mayo
4d9749c5ba Log ignored warning messages as info 2023-08-28 19:22:24 +01:00
Chris Mayo
beaf9399f8 Elevate redirection to a warning tagged http-redirected
Include the HTTP status code and reason in the message.
2023-08-28 19:22:24 +01:00
Chris Mayo
e6da68b7f6 Add linting with Pylint to build workflow 2023-05-03 19:24:53 +01:00
Chris Mayo
4b06485a05 Fix FTP checker
In Python 2 StringIO could accept either Unicode or 8-bit strings.
Similar change made for HttpUrl:
06fdd78f9 ("Python3: fix TypeError in HttpUrl.read_content()",
2019-09-15)

Non-existent FtpUrl.max_size introduced in:
7b34be590 ("Introduce check plugins, use Python requests for http/s
connections, and some code cleanups and improvements.", 2014-03-01)

Additional self.direct() not added in:
f107092a8 ("Fix handling of user/password info in URLs.", 2012-06-10)
2023-04-17 19:24:22 +01:00
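
Note: the Python 2 to 3 difference behind this fix can be shown with the standard io module. This is a generic sketch, not the FtpUrl code: in Python 3, io.BytesIO accepts only bytes and io.StringIO accepts only str, so a buffer that collects downloaded chunks has to be a BytesIO.

```python
from io import BytesIO, StringIO

# Collect binary chunks, as a download callback might deliver them.
buf = BytesIO()
buf.write(b"chunk one, ")
buf.write(b"chunk two")
data = buf.getvalue()

# A text buffer rejects bytes in Python 3.
try:
    StringIO().write(b"bytes")
    text_buffer_accepts_bytes = True
except TypeError:
    text_buffer_accepts_bytes = False
```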
Chris Mayo
4433556915 Make checker.get_index_html() return bytes
Shared with FtpUrl.read_content().
2023-04-17 19:24:22 +01:00
Chris Mayo
b87d26f992 Fix translatability of AnchorCheck directory warning 2022-11-08 19:21:29 +00:00
Chris Mayo
8065c75c4e Convert some printf-style strings 2022-11-08 19:21:29 +00:00
Chris Mayo
b6bc366af0 Run pyupgrade --py37-plus x 2 2022-11-08 19:21:29 +00:00
Chris Mayo
55c13f0834 Remove deprecated aliases for OSError 2022-11-08 19:21:29 +00:00
Chris Mayo
0bb1576887 Run pyupgrade --py37-plus --keep-percent-format 2022-11-08 19:21:29 +00:00
Chris Mayo
0a8c29ffcc Add docstring for AnchorCheckFileUrl 2022-11-02 19:24:35 +00:00
Chris Mayo
16bee50068 Move AnchorCheck local file handling into a new class
When checking local files with AnchorCheck, anchors in URLs
like "example/#anchor" are not supported.

Without AnchorCheck enabled, the Real URL reported for such URLs
was changed to include the anchor when local file checking was added to
AnchorCheck, but it is the directory that is checked.
The same URL was also then used as the Parent URL for the check of each
of the contents of that directory.

For FileUrl this is a revert of:
c221afda ("Enable AnchorCheck to be used with local files", 2022-10-03)
2022-10-24 19:30:56 +01:00
Chris Mayo
b6eea83f63
Merge pull request #676 from cjmayo/robotmap
Document sitemaps in linkchecker(1)
2022-10-17 19:25:57 +01:00
Chris Mayo
96c3336013
Merge pull request #677 from cjmayo/maxrate
Enable average HTTP request rate to be above 4 per second
2022-10-17 19:24:49 +01:00
Chris Mayo
689557d9af Add logging of MIME types and improve docstrings 2022-10-17 19:21:03 +01:00
Chris Mayo
eab2fa410e Log robots.txt as the sitemap parent URL
This is the location the sitemap URL was found in. The line being
reported is the line in robots.txt.
2022-10-17 19:21:03 +01:00
Chris Mayo
e88cf49c8f Enable average HTTP request rate to be above 4 per second 2022-10-05 19:28:01 +01:00
Chris Mayo
f2be98b8ad Replace deprecated dns.resolver.query()
Missed in:
26c15c5e ("Fix deprecation warning for resolver.query()", 2020-09-14)
2022-10-05 19:27:13 +01:00
Nathan Arthur
33036803b0 Fix a difference in anchor quoting between http and file
"I added a test for file:// processing, and it was showing different
results for when the URL anchor was and wasn't quoted. I tracked it down
to code in fileurl.py that was calling url_norm, and I'm pretty sure the
code is unnecessary at this point. But I made a minimally-invasive
change, to be as safe as possible."

UrlBase.build_url() in line 174 also calls url_norm()
2022-10-03 19:33:05 +01:00
Nathan Arthur
c221afdab5 Enable AnchorCheck to be used with local files
[I] discovered that fileurl.py was stripping the anchors from url_data,
which breaks AnchorCheck. So I stopped it from doing that, and
tried to fix up all the places that were assuming the url would map to a
filesystem file. The tests all pass, but I'm not 100% sure I caught all
the cases, or fixed them correctly.
2022-10-03 19:33:05 +01:00
Nathan Arthur
a29750c57f Fix anchor comments in UrlBase
Parent url query not stripped since:
4a0c63aa ("Fix joining of URLs when parent URL has CGI parameter.", 2011-02-08)
2022-10-03 19:33:05 +01:00
Chris Mayo
52b9881820 Separate URL encoding and content encoding
Ensure users of url_data.encoding are using the URL encoding.

Combined since:
5fc01455 ("Decode content when retrieved, use bs4 to detect encoding if non-Unicode", 2019-09-30)
2022-09-29 19:21:11 +01:00
Chris Mayo
61071fc5dc
Merge pull request #668 from cjmayo/defaults
Clarify default values in initial linkcheckerrc and elsewhere
2022-09-28 19:36:44 +01:00
Lukas Pirl
8c959589c3
add option to ignore specific errors for specific URLs 2022-09-25 22:52:04 +02:00
Chris Mayo
130347f223 Remove unused WARN_IGNORE_URL
URL ignored was changed to an info message in:
7b34be59 ("Introduce check plugins, use Python requests for http/s
connections, and some code cleanups and improvements.", 2014-03-01)
2022-09-22 19:24:55 +01:00
Chris Mayo
ed8e17137c Add gemini scheme 2022-09-16 19:21:32 +01:00
Chris Mayo
25ce4b854c Update IANA schemes 2022-09-16 19:21:32 +01:00
Chris Mayo
a0b28cc0ff Rename url-rate-limited to http-rate-limited
Make consistent with the other warnings:

- The first part of the name represents the checker class in which the
  warning is raised

- Update initial comment
2022-09-06 19:32:24 +01:00
Chris Mayo
3c7fb5b571 Fix checking directory containing Unicode filenames
Non-Unicode filenames are not supported.

sys.platform has not returned "linux2" since Python 3.3.
2022-09-05 19:28:40 +01:00
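
Note: since Python 3.3, sys.platform is "linux" rather than "linux2", so an equality test against "linux2" never matches on a current interpreter. The log does not show how the commit changed the check; the helper below is a hypothetical sketch of a version-independent test.

```python
import sys


def is_linux(platform=sys.platform):
    """Return True for "linux" and for the legacy "linux2" value."""
    return platform.startswith("linux")
```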
Chris Mayo
c79bc07cee Add MIME type application/vnd.adobe.flash.movie 2022-09-02 19:29:11 +01:00
Chris Mayo
d6936ceb91 Add warning url-content-type-unparseable 2022-09-02 19:29:11 +01:00
Kian-Meng Ang
a70ea9ea14 Fix typos
Found via `codespell ./linkcheck/ ./tests ./doc/man/en -L bu,noone,fo,pres,shttp`
2022-09-02 17:20:02 +08:00
Malte Gerth
cc48a09308 Add Telegram and WhatsApp link schemes 2022-02-06 23:41:33 +01:00
Malte Gerth
067dd8edbb Update IANA schemes 2022-02-06 23:40:36 +01:00
Chris Mayo
4444a87eb9 Update Requests bug link 2021-12-15 19:34:24 +00:00
Chris Mayo
76815bcf47 Don't guess the URL for files that end in .html
Fixes:
linkchecker ftp.html
failing looking for ftp://ftp.html
2021-12-13 19:31:13 +00:00
Chris Mayo
fe5a34c68f Remove linkcheck.checker.proxysupport
Set up the requests.Session() with the complete proxy configuration
to fix a problem with using an HTTP server as an HTTPS proxy and
potential redirection issues.

Requests handles no_proxy.
2021-12-13 19:25:23 +00:00
Chris Mayo
a60648e348 Remove support for ftp_proxy
Was limited to HTTP proxy servers and prevents simplifying and fixing
HTTP proxy support.
2021-12-13 19:25:23 +00:00
Chris Mayo
f2e5a435e3 Remove unused ProxySupport.proxyauth
Not used since:
7b34be590 ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)
2021-12-13 19:25:23 +00:00