linkchecker

mirror of https://github.com/Hopiu/linkchecker.git synced 2026-04-21 06:41:00 +00:00

Author	SHA1	Message	Date
nodet	28f6743778	Add ignorewarningsforurls to ignore specific warnings (#794) We want to allow specifying a warning to ignore for each URL. If no regex is specified for the warning to ignore, we'll ignore all warnings. The tests still pass as they are, which means that unknown values in the configuration file are simply ignored. * [#782] Add values to configuration file * [#782] Parse new configuration values * [#782] Actually ignore a warning * [#782] Confirm side cases work as expected * [#782] Add logging when deciding to ignore warnings * [#782] Documentation for ignorewarningsforurls * [#782] Update (generated) man pages * [#782] These tests pass without network, actually * [#782] Fix copy/paste error in symbol naming * [#782] The regex matches the name of the warning, not the message * [#782] Better wording * [#782] Update (generated) man pages * [#782] We match the type, not the message	2024-02-13 19:43:29 +00:00
Chris Mayo	0faccf2ab3	Merge pull request #752 from cjmayo/deprecated_modules Remove support for nntp and telnet	2023-09-04 19:22:38 +01:00
Chris Mayo	b3429c4759	Remove support for nntp and telnet Python is dropping nntplib and telnetlib.	2023-08-28 19:24:57 +01:00
Chris Mayo	4d9749c5ba	Log ignored warning messages as info	2023-08-28 19:22:24 +01:00
Chris Mayo	e6da68b7f6	Add linting with Pylint to build workflow	2023-05-03 19:24:53 +01:00
Chris Mayo	8065c75c4e	Convert some printf-style strings	2022-11-08 19:21:29 +00:00
Chris Mayo	b6bc366af0	Run pyupgrade --py37-plus x 2	2022-11-08 19:21:29 +00:00
Chris Mayo	0bb1576887	Run pyupgrade --py37-plus --keep-percent-format	2022-11-08 19:21:29 +00:00
Chris Mayo	eab2fa410e	Log robots.txt as the sitemap parent URL This is the location the sitemap URL was found in. The line being reported is the line in robots.txt.	2022-10-17 19:21:03 +01:00
Nathan Arthur	a29750c57f	Fix anchor comments in UrlBase Parent url query not stripped since: `4a0c63aa` ("Fix joining of URLs when parent URL has CGI parameter.", 2011-02-08)	2022-10-03 19:33:05 +01:00
Chris Mayo	52b9881820	Separate URL encoding and content encoding Ensure users of url_data.encoding are using the URL encoding. Combined since: `5fc01455` ("Decode content when retrieved, use bs4 to detect encoding if non-Unicode", 2019-09-30)	2022-09-29 19:21:11 +01:00
Lukas Pirl	8c959589c3	add option to ignore specific errors for specific URLs	2022-09-25 22:52:04 +02:00
Chris Mayo	c79bc07cee	Add MIME type application/vnd.adobe.flash.movie	2022-09-02 19:29:11 +01:00
Chris Mayo	d6936ceb91	Add warning url-content-type-unparseable	2022-09-02 19:29:11 +01:00
Kian-Meng Ang	a70ea9ea14	Fix typos Found via `codespell ./linkcheck/ ./tests ./doc/man/en -L bu,noone,fo,pres,shttp`	2022-09-02 17:20:02 +08:00
Chris Mayo	43507cf80a	Make partial and example URLs in docstrings italic Prevent Sphinx from turning them into broken links.	2021-08-12 19:28:50 +01:00
Chris Mayo	314ec085a3	Merge pull request #462 from cjmayo/anchor Fix anchor checking	2020-09-01 19:39:29 +01:00
Chris Mayo	8779c39735	Replace deprecated urllib.parse.split functions	2020-08-22 16:28:53 +01:00
Chris Mayo	0912e8a2c1	Don't strip the URL fragment from cache key if using AnchorCheck Else once one URL for a page has been checked, URLs with different fragments are skipped and not passed to AnchorCheck. `eaa538c` ("don't check one url multiple times", 2016-11-09)	2020-07-27 19:25:30 +01:00
Chris Mayo	dee21ee9a0	Fix formatting and typos in docstrings	2020-07-25 16:35:48 +01:00
Chris Mayo	b328520f08	Convert UrlBase syntax Exception to a string Causes an exception when logging.	2020-07-07 17:25:28 +01:00
Chris Mayo	3fcee872b6	urlparts need to support assignment	2020-07-07 17:25:28 +01:00
Chris Mayo	d91a328224	Remove strformat.unicode_safe() and strformat.url_unicode_split() All strings support Unicode in Python 3.	2020-07-07 17:25:28 +01:00
Chris Mayo	b974ec3262	Review comments on black linkcheck/	2020-06-01 16:07:21 +01:00
Chris Mayo	ac0967e251	Fix remaining flake8 violations in linkcheck/ linkcheck/better_exchook2.py:28:89: E501 line too long (90 > 88 characters) linkcheck/better_exchook2.py:155:9: E722 do not use bare 'except' linkcheck/better_exchook2.py:166:9: E722 do not use bare 'except' linkcheck/better_exchook2.py:289:13: E741 ambiguous variable name 'l' linkcheck/better_exchook2.py:299:9: E722 do not use bare 'except' linkcheck/containers.py:48:13: E731 do not assign a lambda expression, use a def linkcheck/ftpparse.py:123:89: E501 line too long (93 > 88 characters) linkcheck/loader.py:46:47: E203 whitespace before ':' linkcheck/logconf.py:45:29: E231 missing whitespace after ',' linkcheck/robotparser2.py:157:89: E501 line too long (95 > 88 characters) linkcheck/robotparser2.py:182:89: E501 line too long (89 > 88 characters) linkcheck/strformat.py:181:16: E203 whitespace before ':' linkcheck/strformat.py:181:43: E203 whitespace before ':' linkcheck/strformat.py:253:9: E731 do not assign a lambda expression, use a def linkcheck/strformat.py:254:9: E731 do not assign a lambda expression, use a def linkcheck/strformat.py:341:89: E501 line too long (111 > 88 characters) linkcheck/url.py:102:32: E203 whitespace before ':' linkcheck/url.py:277:5: E741 ambiguous variable name 'l' linkcheck/url.py:402:5: E741 ambiguous variable name 'l' linkcheck/checker/__init__.py:203:1: E402 module level import not at top of file linkcheck/checker/fileurl.py:200:89: E501 line too long (103 > 88 characters) linkcheck/checker/mailtourl.py:122:60: E203 whitespace before ':' linkcheck/checker/mailtourl.py:157:89: E501 line too long (96 > 88 characters) linkcheck/checker/mailtourl.py:190:89: E501 line too long (109 > 88 characters) linkcheck/checker/mailtourl.py:200:89: E501 line too long (111 > 88 characters) linkcheck/checker/mailtourl.py:249:89: E501 line too long (106 > 88 characters) linkcheck/checker/unknownurl.py:226:23: W291 trailing whitespace linkcheck/checker/urlbase.py:245:89: E501 line too long (101 > 88 characters) linkcheck/configuration/confparse.py:236:89: E501 line too long (186 > 88 characters) linkcheck/configuration/confparse.py:247:89: E501 line too long (111 > 88 characters) linkcheck/configuration/__init__.py:164:9: E266 too many leading '#' for block comment linkcheck/configuration/__init__.py:184:9: E266 too many leading '#' for block comment linkcheck/configuration/__init__.py:190:9: E266 too many leading '#' for block comment linkcheck/configuration/__init__.py:195:9: E266 too many leading '#' for block comment linkcheck/configuration/__init__.py:198:9: E266 too many leading '#' for block comment linkcheck/configuration/__init__.py:435:89: E501 line too long (90 > 88 characters) linkcheck/director/aggregator.py:45:43: E231 missing whitespace after ',' linkcheck/director/aggregator.py:178:89: E501 line too long (106 > 88 characters) linkcheck/logger/__init__.py:29:1: E731 do not assign a lambda expression, use a def linkcheck/logger/__init__.py:108:13: E741 ambiguous variable name 'l' linkcheck/logger/__init__.py:275:19: F821 undefined name '_' linkcheck/logger/__init__.py:342:16: F821 undefined name '_' linkcheck/logger/__init__.py:380:13: F821 undefined name '_' linkcheck/logger/__init__.py:384:13: F821 undefined name '_' linkcheck/logger/__init__.py:387:13: F821 undefined name '_' linkcheck/logger/__init__.py:396:13: F821 undefined name '_' linkcheck/network/__init__.py:1:1: W391 blank line at end of file linkcheck/plugins/locationinfo.py:89:9: E731 do not assign a lambda expression, use a def linkcheck/plugins/locationinfo.py:91:9: E731 do not assign a lambda expression, use a def linkcheck/plugins/markdowncheck.py:112:89: E501 line too long (111 > 88 characters) linkcheck/plugins/markdowncheck.py:141:9: E741 ambiguous variable name 'l' linkcheck/plugins/markdowncheck.py:165:23: E203 whitespace before ':' linkcheck/plugins/viruscheck.py:95:42: E203 whitespace before ':'	2020-05-30 17:01:36 +01:00
Chris Mayo	8dc2f12b94	Address space-separated strings in linkcheck/	2020-05-30 17:01:36 +01:00
Chris Mayo	a92a684ac4	Run black on linkcheck/	2020-05-30 17:01:36 +01:00
Chris Mayo	03b1c4919d	Record encoding in debug log messages	2020-05-23 20:01:24 +01:00
Chris Mayo	f7337f55e8	Fix error due to an empty html file accessed over http Use the already fixed [1] UrlBase.get_content() in HttpUrl. [1] `5bd1fb4` ("Fix internal error on empty HTML files", 2020-05-21)	2020-05-23 20:01:24 +01:00
Marius Gedminas	c60d7c66e4	Clarify the decision to fall back to Latin-1	2020-05-21 19:35:39 +03:00
Marius Gedminas	5bd1fb4e36	Fix internal error on empty HTML files When BeautifulSoup finds an empty file on disk, it sets original_encoding to None. It doesn't matter what encoding we pick for empty files, so let's just pick one. I don't know if there are any circumstances where BeautifulSoup might set the encoding to None for a non-empty file. Closes #392.	2020-05-21 19:01:33 +03:00
Chris Mayo	6bddd4ac60	Remove str_text from checker/	2020-05-19 19:56:42 +01:00
Chris Mayo	a127902607	Replace str_text in asserts	2020-05-19 19:56:42 +01:00
Chris Mayo	a15a2833ca	Remove spaces after names in class method definitions And also nested functions. This is a PEP 8 convention, E211.	2020-05-16 20:19:42 +01:00
Chris Mayo	1663e10fe7	Remove spaces after names in function definitions This is a PEP 8 convention, E211.	2020-05-16 20:19:42 +01:00
Chris Mayo	42de609f8e	Make urllib imports Python 3 only	2020-05-14 20:15:28 +01:00
Chris Mayo	736c893707	Merge pull request #377 from cjmayo/tidyten3 Remove u string prefixes	2020-05-13 19:36:54 +01:00
Chris Mayo	44e81d27dd	Remove inheriting object All Python 3 classes are new-style.	2020-05-08 10:45:31 +01:00
Chris Mayo	b0ea72e8c1	Remove # -*- coding: lines Except for tests that include non-unicode characters: tests/test_po.py tests/test_strformat.py tests/test_url.py tests/checker/test_error.py tests/checker/test_news.py	2020-05-08 10:45:31 +01:00
Chris Mayo	4d3e5abcfa	Remove u string prefixes	2020-04-30 20:11:59 +01:00
anarcat	183d483074	Merge pull request #365 from cjmayo/tidyten1 Remove use of the future package	2020-04-26 12:02:30 -04:00
Chris Mayo	ee6628a831	Move HtmlParser/htmlsax.py to htmlutil/htmlsoup.py Remove one subpackage and some import lines where htmlutil.linkparse is also being used.	2020-04-18 20:30:45 +01:00
Chris Mayo	f5e7f3a382	Remove use of the future package It was providing Python 2 compatibility.	2020-04-15 19:49:16 +01:00
Chris Mayo	40f43ae41c	Create one function to make soup objects	2020-04-08 20:03:35 +01:00
Chris Mayo	3ff3d72492	Use BeautifulSoup element attrs directly	2020-04-03 19:24:08 +01:00
Chris Mayo	5b66964afa	Remove unused .charset from checker classes Unused since: `4f8c2954` ("Don't set parser.encoding", 2019-10-05)	2020-03-30 19:32:30 +01:00
Chris Mayo	646e138166	Pass encoding when unquoting Else non-UTF-8 codes are misinterpreted: >>> from urllib import parse >>> parse.unquote("%FF") '�' >>> parse.unquote("%FF", "latin1") 'ÿ'	2019-10-05 19:38:57 +01:00
Chris Mayo	153e53ba03	Reuse soup object used for detecting encoding in the HTML parser	2019-10-05 19:38:57 +01:00
Chris Mayo	607328d5c5	Support Beautiful Soup line numbers	2019-10-05 19:38:57 +01:00
Chris Mayo	5fc01455b7	Decode content when retrieved, use bs4 to detect encoding if non-Unicode UrlBase has been modified as follows: - the "data" variable now holds bytes - decoded content is stored in a new variable "text" - functionality from get_content() has been split out into get_raw_content() which returns "data" and download_content() which calls read_content() and sets the download related variables. This allows for subclasses to do their own decoding and parsers to use bytes.	2019-09-30 19:46:24 +01:00
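
Commit 646e138166 in the list above passes an explicit encoding when unquoting percent-encoded strings. The following is a minimal sketch of the behaviour that commit message describes, using only the standard library. The helper name `unquote_with` is hypothetical and is not taken from linkchecker.

```python
from urllib import parse


def unquote_with(value, encoding="utf-8"):
    """Percent-decode value with the given encoding (hypothetical helper)."""
    # errors="replace" is already the standard-library default for unquote();
    # it is spelled out here to make the fallback behaviour visible.
    return parse.unquote(value, encoding=encoding, errors="replace")


# %FF is not a valid UTF-8 sequence, so it decodes to U+FFFD,
# the Unicode replacement character.
print(repr(unquote_with("%FF")))

# With Latin-1, the same byte decodes to U+00FF.
print(repr(unquote_with("%FF", "latin1")))
```

The point of the sketch is that the same percent-encoded byte yields different text depending on the encoding, so the caller has to supply the encoding of the URL rather than rely on the UTF-8 default.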

333 commits
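
Commit 28f6743778 describes ignoring specific warnings per URL: a rule with no warning regex ignores all warnings for matching URLs, and the regex is matched against the name of the warning, not its message. The following is a rough sketch of that decision, not linkchecker's actual implementation. The `RULES` structure, the `should_ignore` function, the example URLs, and the use of `re.search` are all assumptions made for illustration; only the warning name `url-content-type-unparseable` comes from the commit list.

```python
import re

# Hypothetical rules: (URL regex, warning-name regex or None).
# None stands for "no warning regex given", meaning ignore every warning.
RULES = [
    (re.compile(r"^https://example\.com/"),
     re.compile(r"^url-content-type-unparseable$")),
    (re.compile(r"^https://example\.org/"), None),
]


def should_ignore(url, warning_name, rules=RULES):
    """Return True if a warning of this name should be ignored for url."""
    for url_re, warning_re in rules:
        if not url_re.search(url):
            continue  # this rule does not apply to the URL
        # The pattern is tested against the warning's name, not its message.
        if warning_re is None or warning_re.search(warning_name):
            return True
    return False


print(should_ignore("https://example.com/a", "url-content-type-unparseable"))
print(should_ignore("https://example.com/a", "some-other-warning"))
print(should_ignore("https://example.org/b", "some-other-warning"))
```

Keeping the "no regex" case as `None` rather than a catch-all pattern makes the ignore-everything behaviour explicit in the rule data.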