linkchecker

mirror of https://github.com/Hopiu/linkchecker.git synced 2026-03-16 22:10:26 +00:00

Author	SHA1	Message	Date
Chris Mayo	e6da68b7f6	Add linting with Pylint to build workflow	2023-05-03 19:24:53 +01:00
Chris Mayo	8065c75c4e	Convert some printf-style strings	2022-11-08 19:21:29 +00:00
Chris Mayo	b6bc366af0	Run pyupgrade --py37-plus x 2	2022-11-08 19:21:29 +00:00
Chris Mayo	0bb1576887	Run pyupgrade --py37-plus --keep-percent-format	2022-11-08 19:21:29 +00:00
Chris Mayo	eab2fa410e	Log robots.txt as the sitemap parent URL This is the location the sitemap URL was found in. The line being reported is the line in robots.txt.	2022-10-17 19:21:03 +01:00
Nathan Arthur	a29750c57f	Fix anchor comments in UrlBase Parent url query not stripped since: `4a0c63aa` ("Fix joining of URLs when parent URL has CGI parameter.", 2011-02-08)	2022-10-03 19:33:05 +01:00
Chris Mayo	52b9881820	Separate URL encoding and content encoding Ensure users of url_data.encoding are using the URL encoding. Combined since: `5fc01455` ("Decode content when retrieved, use bs4 to detect encoding if non-Unicode", 2019-09-30)	2022-09-29 19:21:11 +01:00
Lukas Pirl	8c959589c3	add option to ignore specific errors for specific URLs	2022-09-25 22:52:04 +02:00
Chris Mayo	c79bc07cee	Add MIME type application/vnd.adobe.flash.movie	2022-09-02 19:29:11 +01:00
Chris Mayo	d6936ceb91	Add warning url-content-type-unparseable	2022-09-02 19:29:11 +01:00
Kian-Meng Ang	a70ea9ea14	Fix typos Found via `codespell ./linkcheck/ ./tests ./doc/man/en -L bu,noone,fo,pres,shttp`	2022-09-02 17:20:02 +08:00
Chris Mayo	43507cf80a	Make partial and example URLs in docstrings italic Prevent Sphinx from turning them into broken links.	2021-08-12 19:28:50 +01:00
Chris Mayo	314ec085a3	Merge pull request #462 from cjmayo/anchor Fix anchor checking	2020-09-01 19:39:29 +01:00
Chris Mayo	8779c39735	Replace deprecated urllib.parse.split functions	2020-08-22 16:28:53 +01:00
Chris Mayo	0912e8a2c1	Don't strip the URL fragment from cache key if using AnchorCheck Else once one URL for a page has been checked, URLs with different fragments are skipped and not passed to AnchorCheck. `eaa538c` ("don't check one url multiple times", 2016-11-09)	2020-07-27 19:25:30 +01:00
Chris Mayo	dee21ee9a0	Fix formatting and typos in docstrings	2020-07-25 16:35:48 +01:00
Chris Mayo	b328520f08	Convert UrlBase syntax Exception to a string Causes an exception when logging.	2020-07-07 17:25:28 +01:00
Chris Mayo	3fcee872b6	urlparts need to support assignment	2020-07-07 17:25:28 +01:00
Chris Mayo	d91a328224	Remove strformat.unicode_safe() and strformat.url_unicode_split() All strings support Unicode in Python 3.	2020-07-07 17:25:28 +01:00
Chris Mayo	b974ec3262	Review comments on black linkcheck/	2020-06-01 16:07:21 +01:00
Chris Mayo	ac0967e251	Fix remaining flake8 violations in linkcheck/ linkcheck/better_exchook2.py:28:89: E501 line too long (90 > 88 characters) linkcheck/better_exchook2.py:155:9: E722 do not use bare 'except' linkcheck/better_exchook2.py:166:9: E722 do not use bare 'except' linkcheck/better_exchook2.py:289:13: E741 ambiguous variable name 'l' linkcheck/better_exchook2.py:299:9: E722 do not use bare 'except' linkcheck/containers.py:48:13: E731 do not assign a lambda expression, use a def linkcheck/ftpparse.py:123:89: E501 line too long (93 > 88 characters) linkcheck/loader.py:46:47: E203 whitespace before ':' linkcheck/logconf.py:45:29: E231 missing whitespace after ',' linkcheck/robotparser2.py:157:89: E501 line too long (95 > 88 characters) linkcheck/robotparser2.py:182:89: E501 line too long (89 > 88 characters) linkcheck/strformat.py:181:16: E203 whitespace before ':' linkcheck/strformat.py:181:43: E203 whitespace before ':' linkcheck/strformat.py:253:9: E731 do not assign a lambda expression, use a def linkcheck/strformat.py:254:9: E731 do not assign a lambda expression, use a def linkcheck/strformat.py:341:89: E501 line too long (111 > 88 characters) linkcheck/url.py:102:32: E203 whitespace before ':' linkcheck/url.py:277:5: E741 ambiguous variable name 'l' linkcheck/url.py:402:5: E741 ambiguous variable name 'l' linkcheck/checker/__init__.py:203:1: E402 module level import not at top of file linkcheck/checker/fileurl.py:200:89: E501 line too long (103 > 88 characters) linkcheck/checker/mailtourl.py:122:60: E203 whitespace before ':' linkcheck/checker/mailtourl.py:157:89: E501 line too long (96 > 88 characters) linkcheck/checker/mailtourl.py:190:89: E501 line too long (109 > 88 characters) linkcheck/checker/mailtourl.py:200:89: E501 line too long (111 > 88 characters) linkcheck/checker/mailtourl.py:249:89: E501 line too long (106 > 88 characters) linkcheck/checker/unknownurl.py:226:23: W291 trailing whitespace linkcheck/checker/urlbase.py:245:89: E501 line too long (101 > 88 characters) linkcheck/configuration/confparse.py:236:89: E501 line too long (186 > 88 characters) linkcheck/configuration/confparse.py:247:89: E501 line too long (111 > 88 characters) linkcheck/configuration/__init__.py:164:9: E266 too many leading '#' for block comment linkcheck/configuration/__init__.py:184:9: E266 too many leading '#' for block comment linkcheck/configuration/__init__.py:190:9: E266 too many leading '#' for block comment linkcheck/configuration/__init__.py:195:9: E266 too many leading '#' for block comment linkcheck/configuration/__init__.py:198:9: E266 too many leading '#' for block comment linkcheck/configuration/__init__.py:435:89: E501 line too long (90 > 88 characters) linkcheck/director/aggregator.py:45:43: E231 missing whitespace after ',' linkcheck/director/aggregator.py:178:89: E501 line too long (106 > 88 characters) linkcheck/logger/__init__.py:29:1: E731 do not assign a lambda expression, use a def linkcheck/logger/__init__.py:108:13: E741 ambiguous variable name 'l' linkcheck/logger/__init__.py:275:19: F821 undefined name '_' linkcheck/logger/__init__.py:342:16: F821 undefined name '_' linkcheck/logger/__init__.py:380:13: F821 undefined name '_' linkcheck/logger/__init__.py:384:13: F821 undefined name '_' linkcheck/logger/__init__.py:387:13: F821 undefined name '_' linkcheck/logger/__init__.py:396:13: F821 undefined name '_' linkcheck/network/__init__.py:1:1: W391 blank line at end of file linkcheck/plugins/locationinfo.py:89:9: E731 do not assign a lambda expression, use a def linkcheck/plugins/locationinfo.py:91:9: E731 do not assign a lambda expression, use a def linkcheck/plugins/markdowncheck.py:112:89: E501 line too long (111 > 88 characters) linkcheck/plugins/markdowncheck.py:141:9: E741 ambiguous variable name 'l' linkcheck/plugins/markdowncheck.py:165:23: E203 whitespace before ':' linkcheck/plugins/viruscheck.py:95:42: E203 whitespace before ':'	2020-05-30 17:01:36 +01:00
Chris Mayo	8dc2f12b94	Address space-separated strings in linkcheck/	2020-05-30 17:01:36 +01:00
Chris Mayo	a92a684ac4	Run black on linkcheck/	2020-05-30 17:01:36 +01:00
Chris Mayo	03b1c4919d	Record encoding in debug log messages	2020-05-23 20:01:24 +01:00
Chris Mayo	f7337f55e8	Fix error due to an empty html file accessed over http Use the already fixed [1] UrlBase.get_content() in HttpUrl. [1] `5bd1fb4` ("Fix internal error on empty HTML files", 2020-05-21)	2020-05-23 20:01:24 +01:00
Marius Gedminas	c60d7c66e4	Clarify the decision to fall back to Latin-1	2020-05-21 19:35:39 +03:00
Marius Gedminas	5bd1fb4e36	Fix internal error on empty HTML files When BeautifulSoup finds an empty file on disk, it sets original_encoding to None. It doesn't matter what encoding we pick for empty files, so let's just pick one. I don't know if there are any circumstances where BeautifulSoup might set the encoding to None for a non-empty file. Closes #392.	2020-05-21 19:01:33 +03:00
Chris Mayo	6bddd4ac60	Remove str_text from checker/	2020-05-19 19:56:42 +01:00
Chris Mayo	a127902607	Replace str_text in asserts	2020-05-19 19:56:42 +01:00
Chris Mayo	a15a2833ca	Remove spaces after names in class method definitions And also nested functions. This is a PEP 8 convention, E211.	2020-05-16 20:19:42 +01:00
Chris Mayo	1663e10fe7	Remove spaces after names in function definitions This is a PEP 8 convention, E211.	2020-05-16 20:19:42 +01:00
Chris Mayo	42de609f8e	Make urllib imports Python 3 only	2020-05-14 20:15:28 +01:00
Chris Mayo	736c893707	Merge pull request #377 from cjmayo/tidyten3 Remove u string prefixes	2020-05-13 19:36:54 +01:00
Chris Mayo	44e81d27dd	Remove inheriting object All Python 3 classes are new-style.	2020-05-08 10:45:31 +01:00
Chris Mayo	b0ea72e8c1	Remove # -*- coding: lines Except for tests that include non-unicode characters: tests/test_po.py tests/test_strformat.py tests/test_url.py tests/checker/test_error.py tests/checker/test_news.py	2020-05-08 10:45:31 +01:00
Chris Mayo	4d3e5abcfa	Remove u string prefixes	2020-04-30 20:11:59 +01:00
anarcat	183d483074	Merge pull request #365 from cjmayo/tidyten1 Remove use of the future package	2020-04-26 12:02:30 -04:00
Chris Mayo	ee6628a831	Move HtmlParser/htmlsax.py to htmlutil/htmlsoup.py Remove one subpackage and some import lines where htmlutil.linkparse is also being used.	2020-04-18 20:30:45 +01:00
Chris Mayo	f5e7f3a382	Remove use of the future package It was providing Python 2 compatibility.	2020-04-15 19:49:16 +01:00
Chris Mayo	40f43ae41c	Create one function to make soup objects	2020-04-08 20:03:35 +01:00
Chris Mayo	3ff3d72492	Use BeautifulSoup element attrs directly	2020-04-03 19:24:08 +01:00
Chris Mayo	5b66964afa	Remove unused .charset from checker classes Unused since: `4f8c2954` ("Don't set parser.encoding", 2019-10-05)	2020-03-30 19:32:30 +01:00
Chris Mayo	646e138166	Pass encoding when unquoting Else non-UTF-8 codes are misinterpreted: >>> from urllib import parse >>> parse.unquote("%FF") '�' >>> parse.unquote("%FF", "latin1") 'ÿ'	2019-10-05 19:38:57 +01:00
Chris Mayo	153e53ba03	Reuse soup object used for detecting encoding in the HTML parser	2019-10-05 19:38:57 +01:00
Chris Mayo	607328d5c5	Support Beautiful Soup line numbers	2019-10-05 19:38:57 +01:00
Chris Mayo	5fc01455b7	Decode content when retrieved, use bs4 to detect encoding if non-Unicode UrlBase has been modified as follows: - the "data" variable now holds bytes - decoded content is stored in a new variable "text" - functionality from get_content() has been split out into get_raw_content() which returns "data" and download_content() which calls read_content() and sets the download related variables. This allows for subclasses to do their own decoding and parsers to use bytes.	2019-09-30 19:46:24 +01:00
Petr Dlouhý	c2af88ad2e	Python3: fix for test_telnet in urlbase.py	2019-09-15 19:49:26 +01:00
Petr Dlouhý	e10f25b968	fixes for Python 3: fix running problems in Python 3	2019-09-10 19:30:09 +01:00
Petr Dlouhý	e92b0a9f7b	Python3: fix unicode in urlbase	2019-04-25 19:57:45 +01:00
Petr Dlouhý	b3881ce3b5	Python3: fix urlbase, strformat and others	2019-04-25 19:57:45 +01:00
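
Commit 646e138166 ("Pass encoding when unquoting") in the list above notes that non-UTF-8 percent-escapes are misinterpreted unless an encoding is passed. The following is a minimal sketch of that behaviour using only the standard library; the helper name `unquote_with_encoding` is hypothetical and is not taken from linkchecker's code.

```python
from urllib import parse


def unquote_with_encoding(quoted: str, encoding: str = "utf-8") -> str:
    """Percent-decode `quoted`, interpreting the decoded bytes in `encoding`.

    Hypothetical helper for illustration only; not linkchecker's API.
    """
    return parse.unquote(quoted, encoding=encoding)


# With the default UTF-8 decoding, the lone byte 0xFF is invalid and is
# replaced by U+FFFD (the Unicode replacement character).
default_result = parse.unquote("%FF")

# Decoded as Latin-1, the same byte maps to U+00FF ("ÿ").
latin1_result = unquote_with_encoding("%FF", "latin1")

print(repr(default_result), repr(latin1_result))
```

The point of the sketch is that `urllib.parse.unquote` defaults to `encoding="utf-8"` with `errors="replace"`, so a caller that knows the URL's encoding has to pass it explicitly to recover the intended character.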


329 commits