200 commits

Author SHA1 Message Date
Chris Mayo
b6bc366af0 Run pyupgrade --py37-plus x 2 2022-11-08 19:21:29 +00:00
Chris Mayo
e88cf49c8f Enable average HTTP request rate to be above 4 per second 2022-10-05 19:28:01 +01:00
Chris Mayo
e6763f8516 Fix sitemap output with multiple threads
SitemapXmlLogger assumes the first result logged is for the root of the
website being mapped. Ensure results are logged before content is
checked.
2022-09-30 19:22:17 +01:00
Kian-Meng Ang
a70ea9ea14 Fix typos
Found via `codespell ./linkcheck/ ./tests ./doc/man/en -L bu,noone,fo,pres,shttp`
2022-09-02 17:20:02 +08:00
Chris Mayo
9504a6dddf Document the curl_ca_bundle environment variable 2021-12-13 19:25:23 +00:00
Chris Mayo
a2e379a595 Remove built-in GNOME and KDE proxy support
Only http_proxy was ever supported.

Requests uses urllib.request.getproxies().

Fedora 35 and Ubuntu 20.04 do set proxy environment variables when
settings are added through the GUI.

GNOME location of proxy settings is subject to change:
https://wiki.gnome.org/Projects/NetworkManager/Proxies
https://gitlab.gnome.org/GNOME/gsettings-desktop-schemas/-/issues/27
2021-12-13 19:25:23 +00:00
Chris Mayo
fe5a34c68f Remove linkcheck.checker.proxysupport
Set up the requests.Session() with the complete proxy configuration
to fix a problem with using an HTTP server as an HTTPS proxy and
potential redirection issues.

Requests handles no_proxy.
2021-12-13 19:25:23 +00:00
Chris Mayo
35ecb7e639 Add https_proxy to internal error message 2021-12-13 19:25:23 +00:00
Chris Mayo
a60648e348 Remove support for ftp_proxy
Was limited to HTTP proxy servers and prevents simplifying and fixing
HTTP proxy support.
2021-12-13 19:25:23 +00:00
Chris Mayo
2a77e12618 Replace deprecated Thread.getName() and Condition.notifyAll() 2021-11-16 19:45:38 +00:00
Paul Haerle
f395c74aac Make ResultCache max_size configurable (#544)
* Make ResultCache max_size configurable

fixes #463

* Add tests and docs.

* fix documentation...

...adapt the source, not the auto-generated man pages themselves as
requested in #544.

* fix typo.
2021-06-21 19:45:19 +01:00
Chris Mayo
525b6751a9 Merge pull request #468 from cjmayo/interrupter
Rename director/interrupt.py to director/interrupter.py
2020-08-15 16:31:33 +01:00
Chris Mayo
46b9e6b169 Rename director/interrupt.py to director/interrupter.py
Avoid a clash with director.interrupt() when automatically documenting.
2020-08-03 19:48:07 +01:00
Chris Mayo
dee21ee9a0 Fix formatting and typos in docstrings 2020-07-25 16:35:48 +01:00
Chris Mayo
1ec3848720 Log problem with login form without exception 2020-06-23 17:28:31 +01:00
Chris Mayo
a6b1eb45b1 Convert to Python 3 super() 2020-06-03 20:06:36 +01:00
Chris Mayo
cec9b78f5e Additional review comments on black linkcheck/ 2020-06-03 20:06:36 +01:00
Chris Mayo
b974ec3262 Review comments on black linkcheck/ 2020-06-01 16:07:21 +01:00
Chris Mayo
ac0967e251 Fix remaining flake8 violations in linkcheck/
linkcheck/better_exchook2.py:28:89: E501 line too long (90 > 88 characters)
linkcheck/better_exchook2.py:155:9: E722 do not use bare 'except'
linkcheck/better_exchook2.py:166:9: E722 do not use bare 'except'
linkcheck/better_exchook2.py:289:13: E741 ambiguous variable name 'l'
linkcheck/better_exchook2.py:299:9: E722 do not use bare 'except'
linkcheck/containers.py:48:13: E731 do not assign a lambda expression, use a def
linkcheck/ftpparse.py:123:89: E501 line too long (93 > 88 characters)
linkcheck/loader.py:46:47: E203 whitespace before ':'
linkcheck/logconf.py:45:29: E231 missing whitespace after ','
linkcheck/robotparser2.py:157:89: E501 line too long (95 > 88 characters)
linkcheck/robotparser2.py:182:89: E501 line too long (89 > 88 characters)
linkcheck/strformat.py:181:16: E203 whitespace before ':'
linkcheck/strformat.py:181:43: E203 whitespace before ':'
linkcheck/strformat.py:253:9: E731 do not assign a lambda expression, use a def
linkcheck/strformat.py:254:9: E731 do not assign a lambda expression, use a def
linkcheck/strformat.py:341:89: E501 line too long (111 > 88 characters)
linkcheck/url.py:102:32: E203 whitespace before ':'
linkcheck/url.py:277:5: E741 ambiguous variable name 'l'
linkcheck/url.py:402:5: E741 ambiguous variable name 'l'
linkcheck/checker/__init__.py:203:1: E402 module level import not at top of file
linkcheck/checker/fileurl.py:200:89: E501 line too long (103 > 88 characters)
linkcheck/checker/mailtourl.py:122:60: E203 whitespace before ':'
linkcheck/checker/mailtourl.py:157:89: E501 line too long (96 > 88 characters)
linkcheck/checker/mailtourl.py:190:89: E501 line too long (109 > 88 characters)
linkcheck/checker/mailtourl.py:200:89: E501 line too long (111 > 88 characters)
linkcheck/checker/mailtourl.py:249:89: E501 line too long (106 > 88 characters)
linkcheck/checker/unknownurl.py:226:23: W291 trailing whitespace
linkcheck/checker/urlbase.py:245:89: E501 line too long (101 > 88 characters)
linkcheck/configuration/confparse.py:236:89: E501 line too long (186 > 88 characters)
linkcheck/configuration/confparse.py:247:89: E501 line too long (111 > 88 characters)
linkcheck/configuration/__init__.py:164:9: E266 too many leading '#' for block comment
linkcheck/configuration/__init__.py:184:9: E266 too many leading '#' for block comment
linkcheck/configuration/__init__.py:190:9: E266 too many leading '#' for block comment
linkcheck/configuration/__init__.py:195:9: E266 too many leading '#' for block comment
linkcheck/configuration/__init__.py:198:9: E266 too many leading '#' for block comment
linkcheck/configuration/__init__.py:435:89: E501 line too long (90 > 88 characters)
linkcheck/director/aggregator.py:45:43: E231 missing whitespace after ','
linkcheck/director/aggregator.py:178:89: E501 line too long (106 > 88 characters)
linkcheck/logger/__init__.py:29:1: E731 do not assign a lambda expression, use a def
linkcheck/logger/__init__.py:108:13: E741 ambiguous variable name 'l'
linkcheck/logger/__init__.py:275:19: F821 undefined name '_'
linkcheck/logger/__init__.py:342:16: F821 undefined name '_'
linkcheck/logger/__init__.py:380:13: F821 undefined name '_'
linkcheck/logger/__init__.py:384:13: F821 undefined name '_'
linkcheck/logger/__init__.py:387:13: F821 undefined name '_'
linkcheck/logger/__init__.py:396:13: F821 undefined name '_'
linkcheck/network/__init__.py:1:1: W391 blank line at end of file
linkcheck/plugins/locationinfo.py:89:9: E731 do not assign a lambda expression, use a def
linkcheck/plugins/locationinfo.py:91:9: E731 do not assign a lambda expression, use a def
linkcheck/plugins/markdowncheck.py:112:89: E501 line too long (111 > 88 characters)
linkcheck/plugins/markdowncheck.py:141:9: E741 ambiguous variable name 'l'
linkcheck/plugins/markdowncheck.py:165:23: E203 whitespace before ':'
linkcheck/plugins/viruscheck.py:95:42: E203 whitespace before ':'
2020-05-30 17:01:36 +01:00
Chris Mayo
a92a684ac4 Run black on linkcheck/ 2020-05-30 17:01:36 +01:00
Marius Gedminas
b0435b3d47 Make sure login form fetching uses a timeout
Also resolve an XXX comment about the User-Agent header (which is
configured in new_request_session), but add a couple of XXX comments
about using proxy and possibly disabling TLS certificate checking.
2020-05-22 11:19:51 +03:00
Chris Mayo
6cfc8eeb49 Replace threading.Thread.setName() with setting the name property
As recommended in:

https://docs.python.org/3.5/library/threading.html#threading.Thread.setName
2020-05-20 19:58:44 +01:00
Chris Mayo
42eba19a7d No need to encode url in Checker.check_url_data()
Was causing b'' in log messages e.g. CheckThread-b'http:...
2020-05-20 19:58:44 +01:00
Chris Mayo
a15a2833ca Remove spaces after names in class method definitions
And also nested functions.

This is a PEP 8 convention, E211.
2020-05-16 20:19:42 +01:00
Chris Mayo
1663e10fe7 Remove spaces after names in function definitions
This is a PEP 8 convention, E211.
2020-05-16 20:19:42 +01:00
Chris Mayo
fc11d08968 Remove spaces after names in class definitions 2020-05-16 20:19:42 +01:00
Chris Mayo
1e277444f4 Remove Python 2 thread import 2020-05-16 16:26:34 +01:00
Chris Mayo
42de609f8e Make urllib imports Python 3 only 2020-05-14 20:15:28 +01:00
Chris Mayo
08ddf658bc Merge pull request #366 from cjmayo/userorpwd
Support login forms with user and/or password
2020-05-13 19:37:44 +01:00
Chris Mayo
736c893707 Merge pull request #377 from cjmayo/tidyten3
Remove u string prefixes
2020-05-13 19:36:54 +01:00
Chris Mayo
3ace021264 Support login forms with user and/or password 2020-05-13 19:32:25 +01:00
Chris Mayo
44e81d27dd Remove inheriting object
All Python 3 classes are new-style.
2020-05-08 10:45:31 +01:00
Chris Mayo
b0ea72e8c1 Remove # -*- coding: lines
Except for tests that include non-unicode characters:

tests/test_po.py
tests/test_strformat.py
tests/test_url.py
tests/checker/test_error.py
tests/checker/test_news.py
2020-05-08 10:45:31 +01:00
Chris Mayo
4d3e5abcfa Remove u string prefixes 2020-04-30 20:11:59 +01:00
Chris Mayo
a51f02cf66 Improve error handling and debugging for login form 2020-04-27 18:06:29 +01:00
Chris Mayo
9a33c2a659 Make requesting login form password work on Python 3 2020-04-27 18:06:29 +01:00
Chris Mayo
7a6ef938cc Rename htmlutil.formsearch to htmlutil.loginformsearch
Make it clear that this module has only one specific use.
2020-04-27 18:06:29 +01:00
Chris Mayo
a83fbb56c0 Remove from __future__ imports 2020-04-15 19:49:16 +01:00
Marius Gedminas
e274d74be2 Wait for threads to exit after stopping them
This fixes a race condition where the main thread would check if any
internal errors happened and get back a 0 while a worker thread was
still busy printing the internal error message before incrementing the
counter.

Fixes #320.

My experiments show that this adds no perceptible delay to the script
runtime (on Linux).  More specifically, there already is an annoying
perceptible delay of about 1 second, but it's not caused by this change.
2019-10-21 18:23:58 +03:00
anarcat
f73ba54a2a Merge pull request #308 from cjmayo/decode
Decode content when retrieved
2019-10-10 09:46:32 -04:00
Chris Mayo
c6a06d99ac Remove unnecessary unicode() from StatusLogger.writeln() 2019-09-30 20:06:48 +01:00
Chris Mayo
9460064084 Use requests to decode the content of login form 2019-09-30 19:46:24 +01:00
Marius Gedminas
6f55f446ae Load cookies from the --cookiefile correctly
requests.cookies.merge_cookies() requires a dict or a CookieJar as the second argument.
We've been passing lists of Cookie objects instead.

Fixes #62, harder this time.
2018-03-16 13:23:26 +02:00
Marius Gedminas
6becc08284 Fix internal error when using cookies
There was some kind of confusion between a module and a function argument,
introduced in commit 90257a1b5e.

Fixes #62.
2018-03-15 23:30:41 +02:00
Petr Dlouhý
a1b300c892 Python3: fix imports 2018-01-19 09:52:43 +01:00
Bastian Kleineidam
0ef00eea56 Move GUI files to separate project 2016-01-23 13:28:15 +01:00
Bastian Kleineidam
228bce1ba2 Add to instead of replace the HTTP client headers. 2014-09-20 12:17:42 +02:00
Bastian Kleineidam
35eb30432e Added some Python3 fixes. 2014-09-12 19:36:30 +02:00
Bastian Kleineidam
29193bbcc9 Fix login URL cookies and don't sanitize after config reading. 2014-07-15 22:23:38 +02:00
Bastian Kleineidam
90257a1b5e Replace twill with custom code. 2014-07-15 18:37:05 +02:00
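
Several commits in this log (2a77e12618, 6cfc8eeb49, e274d74be2) concern the standard-library threading API: replacing the deprecated Thread.setName(), Thread.getName() and Condition.notifyAll(), and waiting for worker threads to exit before reading shared state. The sketch below illustrates those replacements in isolation. It is not taken from the linkcheck sources; the names worker, seen_names and "CheckThread-example" are hypothetical and chosen only for illustration.

```python
import threading

# Shared state written by the worker and read by the main thread.
seen_names = []


def worker():
    # The name property replaces the deprecated Thread.getName().
    seen_names.append(threading.current_thread().name)


thread = threading.Thread(target=worker)
# Assigning the name property replaces the deprecated Thread.setName().
thread.name = "CheckThread-example"  # hypothetical name
thread.start()
# Wait for the worker to exit before inspecting shared state, so the
# main thread never reads a result the worker has not written yet.
thread.join()

condition = threading.Condition()
with condition:
    # notify_all() replaces the deprecated notifyAll(); with no waiting
    # threads it simply returns.
    condition.notify_all()
```

Joining the thread before reading seen_names is the same idea as the race-condition fix described in e274d74be2: the main thread only looks at the result once the worker has finished.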