Chris Mayo
bbb8096df5
Add @need_network to test_no_error() in test_ignoreerrors.py
...
Needs network access for DNS:
warning No MX mail host for example.com found.
2022-10-05 19:27:13 +01:00
Chris Mayo
354ea933ca
Merge pull request #673 from cjmayo/sitemap
...
Fix sitemap output with multiple threads
2022-10-05 19:20:40 +01:00
Chris Mayo
d9265bb71c
Merge pull request #669 from cjmayo/anchorcheck
...
Re-enable AnchorCheck plugin
2022-10-03 19:36:08 +01:00
Nathan Arthur
2d1bf6ef98
Add tests for encoded anchors for file: and http:
...
I started with a test of urlencoded anchors, assuming that the URL might
have a urlencoded anchor, but the actual anchor in the HTML would NOT be
urlencoded.
2022-10-03 19:33:05 +01:00
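The case described in 2d1bf6ef98 (a urlencoded anchor in the URL, an unencoded anchor in the HTML) can be illustrated with a minimal sketch. This is not linkchecker's code: anchor_matches() and its arguments are hypothetical names, and only the standard library is used.

```python
from urllib.parse import unquote, urlsplit


def anchor_matches(url, html_anchors):
    """Return True if the URL's fragment, once decoded, names an anchor in the page.

    Hypothetical helper for illustration; html_anchors is assumed to hold the
    anchor names exactly as they appear (unencoded) in the HTML.
    """
    fragment = urlsplit(url).fragment
    if not fragment:
        return True  # no anchor requested, nothing to check
    # Decode the percent-encoding before comparing with the HTML anchor names.
    return unquote(fragment) in html_anchors
```

The same comparison applies to file: and http: URLs, since urlsplit() extracts the fragment for both schemes.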
Nathan Arthur
33036803b0
Fix a difference in anchor quoting between http and file
...
"I added a test for file:// processing, and it was showing different
results for when the URL anchor was and wasn't quoted. I tracked it down
to code in fileurl.py that was calling url_norm, and I'm pretty sure the
code is unnecessary at this point. But I made a minimally-invasive
change, to be as safe as possible."
UrlBase.build_url() in line 174 also calls url_norm()
2022-10-03 19:33:05 +01:00
Nathan Arthur
4cdaa59fcc
Fix AnchorCheck mismatching encoded anchors
...
Problem identified by Christian Kirchhof.
2022-10-03 19:33:05 +01:00
Nathan Arthur
6499b7b233
Fix a major thread-safety bug in AnchorCheck
...
The threading issue has been there for years, but I didn't notice it
until after I thought I was done, while I was doing manual testing
(with threads re-enabled).
The problem was with storing URL-specific state (.anchors) on the
AnchorCheck object itself, because there's only one global AnchorCheck
object, so all the threads are competing to use that one single variable
(self.anchors).
The solution was to create a new object to hold .anchors, for each
processed URL.
2022-10-03 19:33:05 +01:00
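The fix described in 6499b7b233 (moving URL-specific state off the single shared plugin object and into a new object per processed URL) can be sketched as follows. The class and function names are hypothetical and do not reflect the actual AnchorCheck code; the sketch only shows the pattern.

```python
from concurrent.futures import ThreadPoolExecutor


class UrlAnchorState:
    """Hypothetical per-URL container; one instance per processed URL."""

    def __init__(self, url, anchors):
        self.url = url
        self.anchors = list(anchors)

    def has_anchor(self, name):
        return name in self.anchors


class AnchorPlugin:
    """Hypothetical plugin: one shared instance that holds no URL-specific state."""

    def check(self, url, page_anchors, wanted):
        # The state lives in a local object, so concurrent calls from other
        # threads cannot overwrite it.
        state = UrlAnchorState(url, page_anchors)
        return url, state.has_anchor(wanted)


def run_checks(jobs, workers=4):
    """Run all jobs through a single plugin object on a thread pool."""
    plugin = AnchorPlugin()  # one global plugin object shared by all threads
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(plugin.check, *job) for job in jobs]
        return dict(future.result() for future in futures)
```

Had check() stored the anchors on self instead, two threads handling different URLs could each replace the other's list between the store and the lookup.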
Nathan Arthur
5398fd2406
Add an anchor test for multiple inter-connected files
2022-10-03 19:33:05 +01:00
Nathan Arthur
c221afdab5
Enable AnchorCheck to be used with local files
...
[I] discovered that fileurl.py was stripping the anchors from url_data,
which breaks AnchorCheck. So I stopped it from doing that, and
tried to fix up all the places that were assuming the url would map to a
filesystem file. The tests all pass, but I'm not 100% sure I caught all
the cases, or fixed them correctly.
2022-10-03 19:33:05 +01:00
Nathan Arthur
a29750c57f
Fix anchor comments in UrlBase
...
Parent url query not stripped since:
4a0c63aa ("Fix joining of URLs when parent URL has CGI parameter.", 2011-02-08)
2022-10-03 19:33:05 +01:00
Chris Mayo
2cbff49221
Fix http tests failing with pytest due to missing _()
...
TypeError: 'NoneType' object is not callable
Ensure LinkCheckTest.setUp() is called to initialise translations.
2022-10-03 19:33:05 +01:00
Chris Mayo
8b2fb86895
Remove AnchorCheck disabled note in linkcheckerrc(5)
...
A partial revert of:
fe6dea12 ("Update documentation for disabled plugins", 2021-11-29)
2022-10-03 19:33:05 +01:00
Chris Mayo
54bcefd7d7
Revert "Disable AnchorCheck plugin"
...
This reverts commit 0356524369.
2022-10-03 19:33:05 +01:00
Chris Mayo
033dcf89f9
Merge pull request #671 from cjmayo/example
...
Fix formatting of ignoreerrors example in linkcheckerrc(5)
2022-10-03 19:22:36 +01:00
Chris Mayo
d6d5e918dc
Merge pull request #672 from cjmayo/encoding
...
Separate URL encoding and content encoding
2022-10-03 19:22:03 +01:00
Chris Mayo
e6763f8516
Fix sitemap output with multiple threads
...
SitemapXmlLogger assumes the first result logged is for the root of the
website being mapped. Ensure results are logged before content is
checked.
2022-09-30 19:22:17 +01:00
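Why logging order matters in e6763f8516 can be shown with a small sketch of a logger that treats the first result it receives as the site root. FirstResultPrefixLogger is a hypothetical stand-in, not the real SitemapXmlLogger, and the prefix check is an assumed simplification.

```python
class FirstResultPrefixLogger:
    """Hypothetical logger that treats the first logged URL as the site root."""

    def __init__(self):
        self.prefix = None
        self.entries = []

    def log_url(self, url):
        if self.prefix is None:
            # Assumption being illustrated: the first result is the root URL.
            self.prefix = url
        # Only URLs below the assumed root are kept in the sitemap.
        if url.startswith(self.prefix):
            self.entries.append(url)


def build_sitemap(urls_in_log_order):
    """Feed URLs to the logger in the given order and return the kept entries."""
    logger = FirstResultPrefixLogger()
    for url in urls_in_log_order:
        logger.log_url(url)
    return logger.entries
```

If a child page is logged before the root, as can happen when several threads finish in a different order, the child becomes the prefix and the rest of the site is dropped.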
Chris Mayo
b3967f75c4
Correct documentation of --debug in linkchecker(1)
...
dns logger was removed in:
e1f72490 ("Move dnspython module into third_party directory.", 2011-05-24)
Threading has not been disabled with --debug since:
eaa8a963 ("Refactor logging configuration.", 2014-05-10)
2022-09-30 19:22:17 +01:00
Chris Mayo
009f22e9b6
Remove outdated comment in TestLogger
...
Configuration.init_logging() removed in:
eaa8a963 ("Refactor logging configuration.", 2014-05-10)
2022-09-30 19:22:17 +01:00
Chris Mayo
52b9881820
Separate URL encoding and content encoding
...
Ensure users of url_data.encoding are using the URL encoding.
Combined since:
5fc01455 ("Decode content when retrieved, use bs4 to detect encoding if non-Unicode", 2019-09-30)
2022-09-29 19:21:11 +01:00
Chris Mayo
61071fc5dc
Merge pull request #668 from cjmayo/defaults
...
Clarify default values in initial linkcheckerrc and elsewhere
2022-09-28 19:36:44 +01:00
Chris Mayo
001212b915
Fix formatting of ignoreerrors example in linkcheckerrc(5)
...
Introduced in:
8c959589 ("add option to ignore specific errors for specific URLs", 2022-07-21)
2022-09-28 19:23:04 +01:00
Chris Mayo
2c3aa5ebb9
Merge pull request #629 from lpirl/ignoreerrors
...
add option to ignore specific errors for specific URLs
2022-09-27 19:43:57 +01:00
Lukas Pirl
8c959589c3
add option to ignore specific errors for specific URLs
2022-09-25 22:52:04 +02:00
Chris Mayo
e5168f44ea
Clarify defaults and examples in initial linkcheckerrc
2022-09-22 19:24:55 +01:00
Chris Mayo
4962a302b3
Document default frequency of sitemap logger
2022-09-22 19:24:55 +01:00
Chris Mayo
b8d0928969
Document dialect option of csv logger
2022-09-22 19:24:55 +01:00
Chris Mayo
130347f223
Remove unused WARN_IGNORE_URL
...
URL ignored was changed to an info message in:
7b34be59 ("Introduce check plugins, use Python requests for http/s
connections, and some code cleanups and improvements.", 2014-03-01)
2022-09-22 19:24:55 +01:00
Chris Mayo
36a45b0f96
Merge pull request #666 from cjmayo/gemini
...
Add gemini scheme
2022-09-22 19:23:20 +01:00
Chris Mayo
61792cb879
Merge pull request #667 from cjmayo/resultcachesize
...
Fixed a bug where the resultcachesize setting was ignored.
2022-09-22 19:23:03 +01:00
Chris Mayo
0c59cd5c1e
Don't use default values in configuration tests
2022-09-20 19:36:42 +01:00
Nathan Arthur
6dc5ade29d
Fixed a bug where the resultcachesize setting was ignored.
2022-09-20 19:36:23 +01:00
Chris Mayo
0d36b69536
Merge pull request #650 from cjmayo/metadata
...
Write all metadata used to _release.py
2022-09-20 19:24:15 +01:00
Chris Mayo
ed8e17137c
Add gemini scheme
2022-09-16 19:21:32 +01:00
Chris Mayo
25ce4b854c
Update IANA schemes
2022-09-16 19:21:32 +01:00
Chris Mayo
29807ed832
Add yamllint to make check
2022-09-13 19:32:06 +01:00
Chris Mayo
16fa5beda8
Install create.sql to examples
...
Creates database table for SQL logger output.
Fixes Ubuntu bug:
https://bugs.launchpad.net/ubuntu/+source/linkchecker/+bug/323123
2022-09-13 19:32:06 +01:00
Chris Mayo
38dea6b7f4
Fix install with pip git+https
...
pip only uses the wheel target.
Save building the metadata twice as a result in GitHub workflows and
update documentation.md.
2022-09-13 19:32:06 +01:00
Chris Mayo
b6a7f2d313
Don't need hatch to build documentation
...
This is a partial revert of:
47d1015e ("Replace setuptools and setup.py with hatch and pyproject.toml", 2022-09-05)
Also hatch is an option to run tests.
2022-09-13 19:32:06 +01:00
Chris Mayo
af265f3d52
Write all metadata used to _release.py
...
Enables running without installing.
Removes use of importlib.metadata.
2022-09-13 19:32:06 +01:00
Chris Mayo
30e8cfad77
Merge pull request #651 from cjmayo/rate
...
Rename url-rate-limited to http-rate-limited
2022-09-12 19:25:52 +01:00
Chris Mayo
84443b14cf
Merge pull request #647 from cjmayo/devdocs
...
Bring Developer Documentation and Tools up to date
2022-09-12 19:25:10 +01:00
Chris Mayo
fd6f4a0160
Merge pull request #657 from pacenathan/MacTimingTest
...
Make timing test more tolerant
2022-09-10 17:24:56 +01:00
Chris Mayo
d60b52658c
Merge pull request #654 from stefanfisk/issue/631-minimal
...
Fix srcset parsing
2022-09-10 17:13:03 +01:00
Nathan Arthur
47a83cbb27
Make timing test more tolerant
...
On my M1 Mac, this was taking 1.01 seconds rather than the expected 1.00
seconds. This is OK, because sleep() is not guaranteed to be precise.
2022-09-08 09:15:29 -04:00
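The tolerance described in 47a83cbb27 can be sketched as a timing check that accepts a small deviation from the expected duration. The helper names and the default tolerance are assumptions for illustration, not the values used in the actual test.

```python
import time


def measure(func):
    """Return the wall-clock seconds func() takes, using a monotonic clock."""
    start = time.monotonic()
    func()
    return time.monotonic() - start


def is_close_duration(elapsed, expected, tolerance=0.05):
    """Return True if elapsed is within tolerance seconds of expected.

    The tolerance is a hypothetical value; sleep() may overshoot slightly,
    so an exact comparison would be fragile.
    """
    return abs(elapsed - expected) <= tolerance
```

A test could then assert is_close_duration(measure(lambda: time.sleep(1)), 1.0) rather than comparing against exactly 1.00 seconds.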
Stefan Fisk
d2b9723612
Fix srcset parsing
...
Resolves #631
2022-09-07 21:24:23 +02:00
Chris Mayo
de40321b57
Merge pull request #652 from cjmayo/issue_xdg
...
Update path to linkcheckerrc in ISSUE_TEMPLATE.md
2022-09-06 19:42:42 +01:00
Chris Mayo
1b9b276d3a
Update path to linkcheckerrc in ISSUE_TEMPLATE.md
...
Changed in:
a03e2e4a ("use xdg dirs for config & data", 2017-10-17)
2022-09-06 19:34:53 +01:00
Chris Mayo
a0b28cc0ff
Rename url-rate-limited to http-rate-limited
...
Make consistent with the other warnings:
- The first part of the name represents the checker class in which the
warning is raised
- Update initial comment
2022-09-06 19:32:24 +01:00
Chris Mayo
579c767d28
.gitignore cannot be excluded from sdist
2022-09-06 19:23:52 +01:00
Chris Mayo
595ce32e55
Merge pull request #646 from cjmayo/unidir
...
Fix checking directory containing Unicode filenames
2022-09-06 19:23:07 +01:00