linkchecker

mirror of https://github.com/Hopiu/linkchecker.git synced 2026-05-02 20:04:43 +00:00

Author	SHA1	Message	Date
Chris Mayo	f2be98b8ad	Replace deprecated dns.resolver.query() Missed in: `26c15c5e` ("Fix deprecation warning for resolver.query()", 2020-09-14)	2022-10-05 19:27:13 +01:00
Nathan Arthur	33036803b0	Fix a difference in anchor quoting between http and file "I added a test for file:// processing, and it was showing different results for when the URL anchor was and wasn't quoted. I tracked it down to code in fileurl.py that was calling url_norm, and I'm pretty sure the code is unnecessary at this point. But I made a minimally-invasive change, to be as safe as possible." UrlBase.build_url() in line 174 also calls url_norm()	2022-10-03 19:33:05 +01:00
Nathan Arthur	c221afdab5	Enable AnchorCheck to be used with local files [I] discovered that fileurl.py was stripping the anchors from url_data, which breaks AnchorCheck. So I stopped it from doing that, and tried to fix up all the places that were assuming the url would map to a filesystem file. The tests all pass, but I'm not 100% sure I caught all the cases, or fixed them correctly.	2022-10-03 19:33:05 +01:00
Nathan Arthur	a29750c57f	Fix anchor comments in UrlBase Parent url query not stripped since: `4a0c63aa` ("Fix joining of URLs when parent URL has CGI parameter.", 2011-02-08)	2022-10-03 19:33:05 +01:00
Chris Mayo	52b9881820	Separate URL encoding and content encoding Ensure users of url_data.encoding are using the URL encoding. Combined since: `5fc01455` ("Decode content when retrieved, use bs4 to detect encoding if non-Unicode", 2019-09-30)	2022-09-29 19:21:11 +01:00
Chris Mayo	61071fc5dc	Merge pull request #668 from cjmayo/defaults Clarify default values in initial linkcheckerrc and elsewhere	2022-09-28 19:36:44 +01:00
Lukas Pirl	8c959589c3	add option to ignore specific errors for specific URLs	2022-09-25 22:52:04 +02:00
Chris Mayo	130347f223	Remove unused WARN_IGNORE_URL URL ignored was changed to an info message in: `7b34be59` ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)	2022-09-22 19:24:55 +01:00
Chris Mayo	ed8e17137c	Add gemini scheme	2022-09-16 19:21:32 +01:00
Chris Mayo	25ce4b854c	Update IANA schemes	2022-09-16 19:21:32 +01:00
Chris Mayo	a0b28cc0ff	Rename url-rate-limited to http-rate-limited Make consistent with the other warnings: - The first part of the name represents the checker class in which the warning is raised - Update initial comment	2022-09-06 19:32:24 +01:00
Chris Mayo	3c7fb5b571	Fix checking directory containing Unicode filenames Non-Unicode filenames are not supported. sys.platform has not returned "linux2" since Python 3.3.	2022-09-05 19:28:40 +01:00
Chris Mayo	c79bc07cee	Add MIME type application/vnd.adobe.flash.movie	2022-09-02 19:29:11 +01:00
Chris Mayo	d6936ceb91	Add warning url-content-type-unparseable	2022-09-02 19:29:11 +01:00
Kian-Meng Ang	a70ea9ea14	Fix typos Found via `codespell ./linkcheck/ ./tests ./doc/man/en -L bu,noone,fo,pres,shttp`	2022-09-02 17:20:02 +08:00
Malte Gerth	cc48a09308	Add Telegram and WhatsApp link schemes	2022-02-06 23:41:33 +01:00
Malte Gerth	067dd8edbb	Update IANA schemes	2022-02-06 23:40:36 +01:00
Chris Mayo	4444a87eb9	Update Requests bug link	2021-12-15 19:34:24 +00:00
Chris Mayo	76815bcf47	Don't guess the URL for files that end in .html Fixes: linkchecker ftp.html failing looking for ftp://ftp.html	2021-12-13 19:31:13 +00:00
Chris Mayo	fe5a34c68f	Remove linkcheck.checker.proxysupport Set up the requests.Session() with the complete proxy configuration to fix a problem with using an HTTP server as an HTTPS proxy and potential redirection issues. Requests handles no_proxy.	2021-12-13 19:25:23 +00:00
Chris Mayo	a60648e348	Remove support for ftp_proxy Was limited to HTTP proxy servers and prevents simplifying and fixing HTTP proxy support.	2021-12-13 19:25:23 +00:00
Chris Mayo	f2e5a435e3	Remove unused ProxySupport.proxyauth Not used since: `7b34be590` ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)	2021-12-13 19:25:23 +00:00
Chris Mayo	a04214465a	Update HttpUrl.encoding after following redirects	2021-12-06 19:34:31 +00:00
Chris Mayo	0325ecd73f	Remove httpurl.HEADER_ENCODING Unused since: `d91a32822` ("Remove strformat.unicode_safe() and strformat.url_unicode_split()", 2020-07-07)	2021-12-06 19:34:31 +00:00
Chris Mayo	c89c617a58	Ignore an encoding of ISO-8859-1 returned by Requests ISO-8859-1 is a fallback for Requests and causes us to mangle UTF-8 content. Requests' utils.py: def get_encoding_from_headers(headers): """Returns encodings from given HTTP Header Dict. :param headers: dictionary to extract encoding from. :rtype: str """ content_type = headers.get('content-type') if not content_type: return None content_type, params = _parse_content_type_header(content_type) if 'charset' in params: return params['charset'].strip("'\"") if 'text' in content_type: return 'ISO-8859-1' if 'application/json' in content_type: # Assume UTF-8 based on RFC 4627: https://www.ietf.org/rfc/rfc4627.txt since the charset was unset return 'utf-8'	2021-11-29 19:52:37 +00:00
Chris Mayo	43507cf80a	Make partial and example URLs in docstrings italic Prevent Sphinx from turning them into broken links.	2021-08-12 19:28:50 +01:00
Chris Mayo	26c15c5e67	Fix deprecation warning for resolver.query() /home/travis/build/linkchecker/linkchecker/linkcheck/checker/mailtourl.py:321: DeprecationWarning: please use dns.resolver.resolve() instead answers = resolver.query(domain, 'MX')	2020-09-14 19:55:05 +01:00
Chris Mayo	b1faef93c3	Merge pull request #495 from cjmayo/mswindows MS Windows Python 3.7 and MS Store compatibility	2020-09-01 19:46:44 +01:00
Chris Mayo	314ec085a3	Merge pull request #462 from cjmayo/anchor Fix anchor checking	2020-09-01 19:39:29 +01:00
Chris Mayo	2fbd49dd0b	Replace os.path.splitunc() with os.path.splitdrive() os.path.splitunc() removed in Python 3.7. https://docs.python.org/3/whatsnew/3.7.html#api-and-feature-removals	2020-08-29 16:57:57 +01:00
Chris Mayo	37e4981089	Merge pull request #492 from cjmayo/pass Assorted tidying including unneeded pass statements	2020-08-29 16:55:39 +01:00
Chris Mayo	1f58419322	Remove unneeded pass statements	2020-08-22 17:17:02 +01:00
Chris Mayo	8779c39735	Replace deprecated urllib.parse.split functions	2020-08-22 16:28:53 +01:00
Chris Mayo	1b497389b5	Merge pull request #483 from cjmayo/retryafter Don't translate "Retry-After" server header field	2020-08-21 16:51:17 +01:00
Chris Mayo	5d83e93829	Merge pull request #475 from cjmayo/iana Update IANA scripts and ignored schemes	2020-08-18 19:40:35 +01:00
Chris Mayo	0269fd88b0	Merge pull request #473 from cjmayo/valueerror Fix critical exception when parsing a URL with a ]	2020-08-15 16:51:17 +01:00
Chris Mayo	7ee151ebbf	Don't translate "Retry-After" server header field It is defined in RFC 7231.	2020-08-14 19:29:19 +01:00
Chris Mayo	80763ed1ea	Add slack to the list of ignored schemes slack:// is a way to interact with a local Slack client [1], and is not something that LinkChecker can check. [1] https://api.slack.com/reference/deep-linking#client	2020-08-09 17:10:26 +01:00
Chris Mayo	f19fd4f5bc	Update IANA scripts and ignored schemes (2020-07-28)	2020-08-09 17:10:26 +01:00
Chris Mayo	d5690203fc	Fix critical exception when parsing a URL with a ] e.g.: <a href="http://localhost]">square</a> Causes urllib to raise a ValueError: File "/usr/lib/python3.8/site-packages/linkcheck/url.py", line 315, in url_norm line: urlparts = list(urllib.parse.urlsplit(url)) locals: urlparts = <not found> list = <builtin> <class 'list'> urllib = <global> <module 'urllib' from '/usr/lib/python3.8/urllib/__init__.py'> urllib.parse = <global> <module 'urllib.parse' from '/usr/lib/python3.8/urllib/parse.py'> urllib.parse.urlsplit = <global> <function urlsplit at 0x7f950e699e50> url = <local> 'http://localhost]', len = 17 File "/usr/lib/python3.8/urllib/parse.py", line 440, in urlsplit line: raise ValueError("Invalid IPv6 URL") locals: ValueError = <builtin> <class 'ValueError'>	2020-08-08 16:47:31 +01:00
Chris Mayo	0912e8a2c1	Don't strip the URL fragment from cache key if using AnchorCheck Else once one URL for a page has been checked, URLs with different fragments are skipped and not passed to AnchorCheck. `eaa538c` ("don't check one url multiple times", 2016-11-09)	2020-07-27 19:25:30 +01:00
Chris Mayo	dee21ee9a0	Fix formatting and typos in docstrings	2020-07-25 16:35:48 +01:00
Chris Mayo	a977e4d712	Merge pull request #444 from cjmayo/isinstance Remove or replace uses of isinstance()	2020-07-08 19:55:29 +01:00
Chris Mayo	b328520f08	Convert UrlBase syntax Exception to a string Causes an exception when logging.	2020-07-07 17:25:28 +01:00
Chris Mayo	53bd5c4d21	Remove HttpUrl.getheader()	2020-07-07 17:25:28 +01:00
Chris Mayo	3fcee872b6	urlparts need to support assignment	2020-07-07 17:25:28 +01:00
Chris Mayo	d91a328224	Remove strformat.unicode_safe() and strformat.url_unicode_split() All strings support Unicode in Python 3.	2020-07-07 17:25:28 +01:00
Chris Mayo	f86e506de4	Remove isinstance() from FileUrl.read_content() get_index_html() returns a string.	2020-06-18 19:27:06 +01:00
Chris Mayo	36246c15ac	Update various comments to https	2020-06-05 16:59:46 +01:00
Chris Mayo	a6b1eb45b1	Convert to Python 3 super()	2020-06-03 20:06:36 +01:00
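
Commit d5690203fc above reports that urllib.parse.urlsplit() raises ValueError("Invalid IPv6 URL") for a URL such as http://localhost]. The sketch below shows one way a caller could guard against that exception. It is not the actual LinkChecker fix, and the helper name split_url_or_none is hypothetical.

```python
from urllib.parse import urlsplit


def split_url_or_none(url):
    """Return the split URL parts as a list, or None if urllib rejects the URL.

    Hypothetical helper for illustration only; not part of LinkChecker.
    """
    try:
        # A list (unlike the SplitResult tuple) supports item assignment.
        return list(urlsplit(url))
    except ValueError:
        # Raised for unbalanced brackets in the host, e.g. "http://localhost]".
        return None


if __name__ == "__main__":
    print(split_url_or_none("http://localhost]"))
    print(split_url_or_none("http://example.com/page#anchor"))
```

Returning None keeps the failure explicit, so the caller can report a syntax error for the link instead of aborting the whole check.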


942 commits
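
Commit 2fbd49dd0b in the log replaces os.path.splitunc(), removed in Python 3.7, with os.path.splitdrive(). As a minimal sketch of why the replacement works, the example below uses ntpath (the Windows flavour of os.path) so that it behaves the same on any platform. The wrapper name split_unc and the sample paths are assumptions made for illustration, not code from LinkChecker.

```python
import ntpath


def split_unc(path):
    """Split a Windows path into (drive or UNC share, rest).

    Hypothetical wrapper for illustration; ntpath.splitdrive() handles
    both drive letters and UNC shares, which is why it can stand in for
    the removed splitunc().
    """
    return ntpath.splitdrive(path)


if __name__ == "__main__":
    print(split_unc(r"\\server\share\docs\index.html"))
    print(split_unc(r"C:\docs\index.html"))
```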