Commit graph

6930 commits

Author SHA1 Message Date
Chris Mayo
b66ca30e84
Merge pull request #680 from cjmayo/misc
Collection of independent small improvements
2022-10-24 19:26:13 +01:00
Chris Mayo
e32c76aa5c Make text logger outro "checked" translatable 2022-10-18 19:24:08 +01:00
Chris Mayo
9631c314dd Use \d in regexp in TestDecorators.test_timeit2() 2022-10-18 19:24:08 +01:00
Chris Mayo
deac09d2c1 Clarify note in TestConfig 2022-10-18 19:24:08 +01:00
Chris Mayo
ef2d571761 Support building wheel from sdist
Build hook is also called for the wheel since:
38dea6b7 ("Fix install with pip git+https", 2022-09-13)
2022-10-18 19:24:08 +01:00
Chris Mayo
a0eb6d5187 Align documentation of debug in man pages
Linked to:
b3967f75 ("Correct documentation of --debug in linkchecker(1)", 2022-09-30)
2022-10-18 19:24:08 +01:00
Chris Mayo
0f36153f69
Merge pull request #679 from cjmayo/pytest
Fix tests failing when run with pytest
2022-10-18 19:22:09 +01:00
Chris Mayo
78536c578a Fix tests failing when run with pytest
TypeError: 'NoneType' object is not callable

As per:
2cbff492 ("Fix http tests failing with pytest due to missing _()", 2022-10-03)
2022-10-17 19:26:53 +01:00
Chris Mayo
b6eea83f63
Merge pull request #676 from cjmayo/robotmap
Document sitemaps in linkchecker(1)
2022-10-17 19:25:57 +01:00
Chris Mayo
96c3336013
Merge pull request #677 from cjmayo/maxrate
Enable average HTTP request rate to be above 4 per second
2022-10-17 19:24:49 +01:00
Chris Mayo
afccdb9608
Merge pull request #675 from cjmayo/mx
Replace deprecated dns.resolver.query()
2022-10-17 19:23:33 +01:00
Chris Mayo
93f1d3f4ac Document sitemaps in linkchecker(1) 2022-10-17 19:21:03 +01:00
Chris Mayo
689557d9af Add logging of MIME types and improve docstrings 2022-10-17 19:21:03 +01:00
Chris Mayo
eab2fa410e Log robots.txt as the sitemap parent URL
This is the location the sitemap URL was found in. The line being
reported is the line in robots.txt.
2022-10-17 19:21:03 +01:00
Chris Mayo
7367e6e865 Skip incomplete Sitemap in robots.txt and warn
Sitemap values should be fully qualified URLs; LinkChecker may not
resolve relative paths correctly.
2022-10-17 19:21:03 +01:00
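A minimal sketch of the kind of check this commit message describes, assuming "fully qualified" means the Sitemap value carries both a scheme and a host. The helper names is_fully_qualified() and usable_sitemaps() are hypothetical and are not taken from the LinkChecker source.

```python
from urllib.parse import urlsplit


def is_fully_qualified(url):
    """Return True if url has both a scheme and a network location."""
    parts = urlsplit(url.strip())
    return bool(parts.scheme and parts.netloc)


def usable_sitemaps(values):
    """Split Sitemap values into (kept, skipped) lists.

    Skipped values are the ones a caller would warn about
    instead of trying to resolve them as relative paths.
    """
    kept, skipped = [], []
    for value in values:
        (kept if is_fully_qualified(value) else skipped).append(value)
    return kept, skipped
```

With this sketch, a value such as /sitemap.xml lands in the skipped list, while https://example.com/sitemap.xml is kept.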
Chris Mayo
8bc849dfde Make --cookiefile description in linkchecker(1) a bit clearer 2022-10-17 19:21:03 +01:00
Chris Mayo
0c5db040c8 Support maxrequestspersecond less than one 2022-10-05 19:28:01 +01:00
Chris Mayo
e88cf49c8f Enable average HTTP request rate to be above 4 per second 2022-10-05 19:28:01 +01:00
Chris Mayo
f2be98b8ad Replace deprecated dns.resolver.query()
Missed in:
26c15c5e ("Fix deprecation warning for resolver.query()", 2020-09-14)
2022-10-05 19:27:13 +01:00
Chris Mayo
bbb8096df5 Add @need_network to test_no_error() in test_ignoreerrors.py
Needs network access for DNS:

warning No MX mail host for example.com found.
2022-10-05 19:27:13 +01:00
Chris Mayo
354ea933ca
Merge pull request #673 from cjmayo/sitemap
Fix sitemap output with multiple threads
2022-10-05 19:20:40 +01:00
Chris Mayo
d9265bb71c
Merge pull request #669 from cjmayo/anchorcheck
Re-enable AnchorCheck plugin
2022-10-03 19:36:08 +01:00
Nathan Arthur
2d1bf6ef98 Add tests for encoded anchors for file: and http:
I started with a test of urlencoded anchors, assuming that the URL might
have a urlencoded anchor, but the actual anchor in the HTML would NOT be
urlencoded.
2022-10-03 19:33:05 +01:00
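The situation described above (a urlencoded anchor in the URL, an unencoded anchor in the HTML) can be sketched as follows. This is an illustration only, assuming the comparison is done on the decoded fragment; anchor_matches() is a hypothetical helper and not the AnchorCheck implementation.

```python
from urllib.parse import unquote, urlsplit


def anchor_matches(url, html_anchors):
    """Return True if the URL fragment names one of the anchors in the HTML.

    The fragment is tried both as written and after percent-decoding,
    because the HTML anchor is assumed NOT to be urlencoded.
    """
    fragment = urlsplit(url).fragment
    return fragment in html_anchors or unquote(fragment) in html_anchors
```

The same function works for file: and http: URLs, since urlsplit() extracts the fragment in both cases.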
Nathan Arthur
33036803b0 Fix a difference in anchor quoting between http and file
"I added a test for file:// processing, and it was showing different
results for when the URL anchor was and wasn't quoted. I tracked it down
to code in fileurl.py that was calling url_norm, and I'm pretty sure the
code is unnecessary at this point. But I made a minimally-invasive
change, to be as safe as possible."

UrlBase.build_url() in line 174 also calls url_norm()
2022-10-03 19:33:05 +01:00
Nathan Arthur
4cdaa59fcc Fix AnchorCheck mismatching encoded anchors
Problem identified by Christian Kirchhof.
2022-10-03 19:33:05 +01:00
Nathan Arthur
6499b7b233 Fix a major thread-safety bug in AnchorCheck
The threading issue has been there for years, but I didn't notice it
until after I thought I was done, while I was doing manual testing
(with threads re-enabled).

The problem was with storing URL-specific state (.anchors) on the
AnchorCheck object itself, because there's only one global AnchorCheck
object, so all the threads are competing to use that one single variable
(self.anchors).

The solution was to create a new object to hold .anchors, for each
processed URL.
2022-10-03 19:33:05 +01:00
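A minimal sketch of the pattern the commit message describes: the single shared plugin object keeps no per-URL state on itself, and a new holder object is created for each processed URL. The class names UrlAnchors and PluginSketch are hypothetical and only illustrate the idea; they are not the AnchorCheck code.

```python
import threading


class UrlAnchors:
    """Holds the anchors found for one processed URL (hypothetical name)."""

    def __init__(self, found):
        self.anchors = list(found)

    def has(self, name):
        return name in self.anchors


class PluginSketch:
    """One shared instance; nothing URL-specific is stored on self."""

    def check(self, wanted, found):
        state = UrlAnchors(found)  # fresh object per processed URL
        return state.has(wanted)


plugin = PluginSketch()  # single global object, as in the commit message
results = {}


def worker(index):
    # Each thread looks up its own anchor in its own anchor list.
    results[index] = plugin.check(f"a{index}", [f"a{index}"])


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Because each call builds its own UrlAnchors object, threads cannot overwrite each other's anchor list, which is the failure mode of keeping self.anchors on the shared object.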
Nathan Arthur
5398fd2406 Add an anchor test for multiple inter-connected files 2022-10-03 19:33:05 +01:00
Nathan Arthur
c221afdab5 Enable AnchorCheck to be used with local files
[I] discovered that fileurl.py was stripping the anchors from url_data,
which breaks AnchorCheck. So I stopped it from doing that, and
tried to fix up all the places that were assuming the url would map to a
filesystem file. The tests all pass, but I'm not 100% sure I caught all
the cases, or fixed them correctly.
2022-10-03 19:33:05 +01:00
Nathan Arthur
a29750c57f Fix anchor comments in UrlBase
Parent url query not stripped since:
4a0c63aa ("Fix joining of URLs when parent URL has CGI parameter.", 2011-02-08)
2022-10-03 19:33:05 +01:00
Chris Mayo
2cbff49221 Fix http tests failing with pytest due to missing _()
TypeError: 'NoneType' object is not callable

Ensure LinkCheckTest.setUp() is called to initialise translations.
2022-10-03 19:33:05 +01:00
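A minimal sketch of how this kind of failure can arise and be avoided, assuming the base class installs the translation function in setUp() and a subclass overrides setUp(). BaseTest, HttpTest and the translate attribute are hypothetical stand-ins for LinkCheckTest and _(); the actual fix in the commit may differ.

```python
import unittest


class BaseTest(unittest.TestCase):
    """Stand-in for a base test class that initialises translations."""

    translate = None  # calling this before setUp() raises the TypeError

    def setUp(self):
        # Stand-in for installing _(); here it just returns its input.
        self.translate = lambda text: text


class HttpTest(BaseTest):
    def setUp(self):
        super().setUp()  # without this call, translate stays None
        self.port = 8080  # hypothetical subclass-specific setup

    def test_message(self):
        self.assertEqual(self.translate("checked"), "checked")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(HttpTest)
result = unittest.TestResult()
suite.run(result)
```

Dropping the super().setUp() call in HttpTest reproduces a "'NoneType' object is not callable" error in this sketch.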
Chris Mayo
8b2fb86895 Remove AnchorCheck disabled note in linkcheckerrc(5)
A partial revert of:
fe6dea12 ("Update documentation for disabled plugins", 2021-11-29)
2022-10-03 19:33:05 +01:00
Chris Mayo
54bcefd7d7 Revert "Disable AnchorCheck plugin"
This reverts commit 0356524369.
2022-10-03 19:33:05 +01:00
Chris Mayo
033dcf89f9
Merge pull request #671 from cjmayo/example
Fix formatting of ignoreerrors example in linkcheckerrc(5)
2022-10-03 19:22:36 +01:00
Chris Mayo
d6d5e918dc
Merge pull request #672 from cjmayo/encoding
Separate URL encoding and content encoding
2022-10-03 19:22:03 +01:00
Chris Mayo
e6763f8516 Fix sitemap output with multiple threads
SitemapXmlLogger assumes the first result logged is for the root of the
website being mapped. Ensure results are logged before content is
checked.
2022-09-30 19:22:17 +01:00
Chris Mayo
b3967f75c4 Correct documentation of --debug in linkchecker(1)
dns logger was removed in:
e1f72490 ("Move dnspython module into third_party directory.", 2011-05-24)

Threading has not been disabled with --debug since:
eaa8a963 ("Refactor logging configuration.", 2014-05-10)
2022-09-30 19:22:17 +01:00
Chris Mayo
009f22e9b6 Remove outdated comment in TestLogger
Configuration.init_logging() removed in:
eaa8a963 ("Refactor logging configuration.", 2014-05-10)
2022-09-30 19:22:17 +01:00
Chris Mayo
52b9881820 Separate URL encoding and content encoding
Ensure users of url_data.encoding are using the URL encoding.

Combined since:
5fc01455 ("Decode content when retrieved, use bs4 to detect encoding if non-Unicode", 2019-09-30)
2022-09-29 19:21:11 +01:00
Chris Mayo
61071fc5dc
Merge pull request #668 from cjmayo/defaults
Clarify default values in initial linkcheckerrc and elsewhere
2022-09-28 19:36:44 +01:00
Chris Mayo
001212b915 Fix formatting of ignoreerrors example in linkcheckerrc(5)
Introduced in:
8c959589 ("add option to ignore specific errors for specific URLs", 2022-07-21)
2022-09-28 19:23:04 +01:00
Chris Mayo
2c3aa5ebb9
Merge pull request #629 from lpirl/ignoreerrors
add option to ignore specific errors for specific URLs
2022-09-27 19:43:57 +01:00
Lukas Pirl
8c959589c3
add option to ignore specific errors for specific URLs 2022-09-25 22:52:04 +02:00
Chris Mayo
e5168f44ea Clarify defaults and examples in initial linkcheckerrc 2022-09-22 19:24:55 +01:00
Chris Mayo
4962a302b3 Document default frequency of sitemap logger 2022-09-22 19:24:55 +01:00
Chris Mayo
b8d0928969 Document dialect option of csv logger 2022-09-22 19:24:55 +01:00
Chris Mayo
130347f223 Remove unused WARN_IGNORE_URL
URL ignored was changed to an info message in:
7b34be59 ("Introduce check plugins, use Python requests for http/s
connections, and some code cleanups and improvements.", 2014-03-01)
2022-09-22 19:24:55 +01:00
Chris Mayo
36a45b0f96
Merge pull request #666 from cjmayo/gemini
Add gemini scheme
2022-09-22 19:23:20 +01:00
Chris Mayo
61792cb879
Merge pull request #667 from cjmayo/resultcachesize
Fixed a bug where the resultcachesize setting was ignored.
2022-09-22 19:23:03 +01:00
Chris Mayo
0c59cd5c1e Don't use default values in configuration tests 2022-09-20 19:36:42 +01:00
Nathan Arthur
6dc5ade29d Fixed a bug where the resultcachesize setting was ignored. 2022-09-20 19:36:23 +01:00