linkchecker

mirror of https://github.com/Hopiu/linkchecker.git synced 2026-05-14 09:33:09 +00:00

Author	SHA1	Message	Date
Chris Mayo	18f20d592f	Check for KDE 5 proxy first and then KDE 4 Don't look for kde4-config in case a KDE 5 user still has it installed.	2020-07-07 17:06:25 +01:00
Chris Mayo	bd55c2ef8f	Compare KDE proxy ReversedException integer value to zero	2020-07-07 17:06:25 +01:00
Chris Mayo	da22d4886b	Merge pull request #441 from cjmayo/authentication Improve documentation of authentication	2020-06-23 17:35:19 +01:00
Chris Mayo	085ae188f7	Remove checks for empty loginpasswordfield and loginuserfield These have default values and cannot be reset.	2020-06-23 17:28:31 +01:00
Chris Mayo	1ec3848720	Log problem with login form without exception	2020-06-23 17:28:31 +01:00
Chris Mayo	2f51a9dca0	Improve documentation of authentication	2020-06-23 17:28:31 +01:00
Chris Mayo	d66e64460c	Remove unused code from strformat.py	2020-06-18 19:31:00 +01:00
Chris Mayo	1f77506c9f	Remove isinstance() in url.url_fix_mailto_urlsplit() urls are strings.	2020-06-18 19:27:06 +01:00
Chris Mayo	8f9f687ed8	Remove isinstance() from fileutil.path_safe() paths are derived from urls which are strings.	2020-06-18 19:27:06 +01:00
Chris Mayo	f86e506de4	Remove isinstance() from FileUrl.read_content() get_index_html() returns a string.	2020-06-18 19:27:06 +01:00
Chris Mayo	3231730366	Remove isinstance() from robotparser2.py Originally for encoding Python 2 Unicode strings [1]. Will not be used in Python 3 because the variables are strings, if they were bytes exceptions would be raised. [1] `c97f68f7` ("accept unicode in robots.txt can_fetch", 2004-11-09)	2020-06-18 19:27:06 +01:00
Chris Mayo	9c9a3d8b14	Remove isinstance() from url.idna_encode() Was originally used for Python 2 Unicode strings. `f4b73c6d` ("Python3: fix unicode in url.py", 2018-01-05)	2020-06-18 19:27:06 +01:00
Chris Mayo	3a6540bc46	Replace isinstance() in strformat.ascii_safe()	2020-06-18 19:27:06 +01:00
Chris Mayo	4009039158	Merge pull request #420 from cjmayo/dconf Update GNOME proxy support for GNOME 3 and Python 3	2020-06-14 18:56:19 +01:00
Chris Mayo	b6004fb6b1	Simplify and add debug messages to KDE proxy retrieval	2020-06-08 17:00:10 +01:00
Chris Mayo	29b292c90f	Replace KDE 3 proxy support with KDE 5 support KDE 3 was superseded in 2008. KDE 4 uses: ${HOME}/.kde4/share/config/kioslaverc KDE 5 (Kubuntu) uses: ${HOME}/.config/kioslaverc Default ReversedException is false	2020-06-08 17:00:10 +01:00
Chris Mayo	9108afeee5	Add html.escape on URLs in logger/html.py	2020-06-05 16:59:46 +01:00
Chris Mayo	eeb5fa48ca	Update configuration/confparse.py log message to https	2020-06-05 16:59:46 +01:00
Chris Mayo	0191b021f4	Make configuration/confparse.py log message translatable	2020-06-05 16:59:46 +01:00
Chris Mayo	36246c15ac	Update various comments to https	2020-06-05 16:59:46 +01:00
Chris Mayo	3bd790c22d	Update W3C validator links to use https	2020-06-05 16:59:46 +01:00
Chris Mayo	b987d6f3ca	Fix indent in plugins/locationinfo.py	2020-06-05 16:59:46 +01:00
Chris Mayo	4330b8a59e	Replace codecs.open() with open()	2020-06-05 16:59:46 +01:00
Chris Mayo	b9c8e33878	Update GNOME proxy support for GNOME 3 and Python 3 GConf is replaced by dconf and the GSettings API in GNOME 3.	2020-06-05 16:29:45 +01:00
Chris Mayo	e207ac54ce	Merge pull request #437 from cjmayo/translate Update man page translation and fixes for application translation process	2020-06-05 16:17:06 +01:00
Chris Mayo	1632a1ce26	Fix xgettext Non-ASCII error when translating xgettext: Non-ASCII character at ../linkcheck/plugins/markdowncheck.py:2. Please specify the source encoding through --from-code or through a comment as specified in https://www.python.org/peps/pep-0263.html. make: *** [Makefile:25: linkchecker.pot] Error 1	2020-06-05 16:06:01 +01:00
Chris Mayo	d591fedb60	Remove unused updater code that supports linkchecker-gui pip provides update support for linkchecker.	2020-06-05 16:05:25 +01:00
Chris Mayo	a6b1eb45b1	Convert to Python 3 super()	2020-06-03 20:06:36 +01:00
Chris Mayo	cec9b78f5e	Additional review comments on black linkcheck/	2020-06-03 20:06:36 +01:00
Chris Mayo	6b3cb18546	Restore better_exchook2.py and colorama.py to pre-Black state These files are based on published packages. better_exchook2.py was derived from better_exchook.py in: https://pypi.org/project/better_exchook/ colorama.py was derived from win32.py in: https://pypi.org/project/colorama/ Files modified in: `a92a684a` ("Run black on linkcheck/", 2020-05-30)	2020-06-03 20:06:36 +01:00
Chris Mayo	b974ec3262	Review comments on black linkcheck/	2020-06-01 16:07:21 +01:00
Chris Mayo	ac0967e251	Fix remaining flake8 violations in linkcheck/ linkcheck/better_exchook2.py:28:89: E501 line too long (90 > 88 characters) linkcheck/better_exchook2.py:155:9: E722 do not use bare 'except' linkcheck/better_exchook2.py:166:9: E722 do not use bare 'except' linkcheck/better_exchook2.py:289:13: E741 ambiguous variable name 'l' linkcheck/better_exchook2.py:299:9: E722 do not use bare 'except' linkcheck/containers.py:48:13: E731 do not assign a lambda expression, use a def linkcheck/ftpparse.py:123:89: E501 line too long (93 > 88 characters) linkcheck/loader.py:46:47: E203 whitespace before ':' linkcheck/logconf.py:45:29: E231 missing whitespace after ',' linkcheck/robotparser2.py:157:89: E501 line too long (95 > 88 characters) linkcheck/robotparser2.py:182:89: E501 line too long (89 > 88 characters) linkcheck/strformat.py:181:16: E203 whitespace before ':' linkcheck/strformat.py:181:43: E203 whitespace before ':' linkcheck/strformat.py:253:9: E731 do not assign a lambda expression, use a def linkcheck/strformat.py:254:9: E731 do not assign a lambda expression, use a def linkcheck/strformat.py:341:89: E501 line too long (111 > 88 characters) linkcheck/url.py:102:32: E203 whitespace before ':' linkcheck/url.py:277:5: E741 ambiguous variable name 'l' linkcheck/url.py:402:5: E741 ambiguous variable name 'l' linkcheck/checker/__init__.py:203:1: E402 module level import not at top of file linkcheck/checker/fileurl.py:200:89: E501 line too long (103 > 88 characters) linkcheck/checker/mailtourl.py:122:60: E203 whitespace before ':' linkcheck/checker/mailtourl.py:157:89: E501 line too long (96 > 88 characters) linkcheck/checker/mailtourl.py:190:89: E501 line too long (109 > 88 characters) linkcheck/checker/mailtourl.py:200:89: E501 line too long (111 > 88 characters) linkcheck/checker/mailtourl.py:249:89: E501 line too long (106 > 88 characters) linkcheck/checker/unknownurl.py:226:23: W291 trailing whitespace linkcheck/checker/urlbase.py:245:89: E501 line too long (101 > 88 characters) linkcheck/configuration/confparse.py:236:89: E501 line too long (186 > 88 characters) linkcheck/configuration/confparse.py:247:89: E501 line too long (111 > 88 characters) linkcheck/configuration/__init__.py:164:9: E266 too many leading '#' for block comment linkcheck/configuration/__init__.py:184:9: E266 too many leading '#' for block comment linkcheck/configuration/__init__.py:190:9: E266 too many leading '#' for block comment linkcheck/configuration/__init__.py:195:9: E266 too many leading '#' for block comment linkcheck/configuration/__init__.py:198:9: E266 too many leading '#' for block comment linkcheck/configuration/__init__.py:435:89: E501 line too long (90 > 88 characters) linkcheck/director/aggregator.py:45:43: E231 missing whitespace after ',' linkcheck/director/aggregator.py:178:89: E501 line too long (106 > 88 characters) linkcheck/logger/__init__.py:29:1: E731 do not assign a lambda expression, use a def linkcheck/logger/__init__.py:108:13: E741 ambiguous variable name 'l' linkcheck/logger/__init__.py:275:19: F821 undefined name '_' linkcheck/logger/__init__.py:342:16: F821 undefined name '_' linkcheck/logger/__init__.py:380:13: F821 undefined name '_' linkcheck/logger/__init__.py:384:13: F821 undefined name '_' linkcheck/logger/__init__.py:387:13: F821 undefined name '_' linkcheck/logger/__init__.py:396:13: F821 undefined name '_' linkcheck/network/__init__.py:1:1: W391 blank line at end of file linkcheck/plugins/locationinfo.py:89:9: E731 do not assign a lambda expression, use a def linkcheck/plugins/locationinfo.py:91:9: E731 do not assign a lambda expression, use a def linkcheck/plugins/markdowncheck.py:112:89: E501 line too long (111 > 88 characters) linkcheck/plugins/markdowncheck.py:141:9: E741 ambiguous variable name 'l' linkcheck/plugins/markdowncheck.py:165:23: E203 whitespace before ':' linkcheck/plugins/viruscheck.py:95:42: E203 whitespace before ':'	2020-05-30 17:01:36 +01:00
Chris Mayo	8dc2f12b94	Address space-separated strings in linkcheck/	2020-05-30 17:01:36 +01:00
Chris Mayo	b9f4864d9e	Remove unnecessary commas before closing brackets in linkcheck/	2020-05-30 17:01:36 +01:00
Chris Mayo	a92a684ac4	Run black on linkcheck/	2020-05-30 17:01:36 +01:00
Chris Mayo	abdb160413	Remove unused bookmarks code that supports linkcheck-gui linkchecker does not need to find a bookmark file, it is given the URL. Most bookmarks are detected by their MIME type, Firefox is different because it uses a SQLite database.	2020-05-28 19:44:53 +01:00
Chris Mayo	e204182acb	Remove unused httputil.has_header_value()	2020-05-28 19:44:53 +01:00
Chris Mayo	4d2449bb13	Merge pull request #425 from cjmayo/xdg_config_home Fix xdg_config_home import in bookmarks/chrome.py	2020-05-28 19:18:21 +01:00
Chris Mayo	75349e4dc9	Fix xdg_config_home import in bookmarks/chrome.py	2020-05-27 20:02:07 +01:00
Chris Mayo	a49f42b617	Remove unused mem.py	2020-05-27 20:01:57 +01:00
Chris Mayo	488e72c81f	Ignore imports providing aliases in subpackages	2020-05-26 19:49:59 +01:00
Chris Mayo	97f50e8be1	Remove unused import htmlsoup from checker/httpurl.py Unused since: `f7337f55` ("Fix error due to an empty html file accessed over http", 2020-05-23)	2020-05-25 19:50:57 +01:00
Chris Mayo	3473656fe1	Replace import of distutils.spawn.find_executable with shutil.which	2020-05-25 19:50:57 +01:00
Chris Mayo	6dda2f9669	Move imports to the top of files to resolve flake8 E402	2020-05-25 19:50:57 +01:00
Chris Mayo	0f3444e906	Drop run-time requests version check Requests 2.4.0 was released in 2014.	2020-05-25 19:50:57 +01:00
Chris Mayo	89c7c74bcf	Remove unused set_linecache() from better_exchook2.py	2020-05-25 19:50:57 +01:00
Chris Mayo	7257e5e1a0	Remove unused imports in parser/__init__.py	2020-05-25 19:50:57 +01:00
Chris Mayo	313a14ff0d	Remove instances of Python 2 unicode	2020-05-24 19:14:47 +01:00
Marius Gedminas	d0169c46d4	Merge pull request #348 from weshaggard/HandleRateLimiting Turn status code 429 into warning instead of failure	2020-05-24 16:16:56 +03:00
Marius Gedminas	dcafa2df75	Avoid u-prefixed strings linkchecker is Python 3 only, all strings are unicode.	2020-05-24 14:50:07 +03:00
Chris Mayo	03b1c4919d	Record encoding in debug log messages	2020-05-23 20:01:24 +01:00
Chris Mayo	f7337f55e8	Fix error due to an empty html file accessed over http Use the already fixed [1] UrlBase.get_content() in HttpUrl. [1] `5bd1fb4` ("Fix internal error on empty HTML files", 2020-05-21)	2020-05-23 20:01:24 +01:00
Marius Gedminas	f268a90cfb	Merge branch 'master' into HandleRateLimiting	2020-05-23 14:15:52 +03:00
Marius Gedminas	6dffacf17f	Merge pull request #409 from linkchecker/fix-login-timeouts Make sure login form fetching uses a timeout and sends User-Agent	2020-05-22 21:40:48 +03:00
Marius Gedminas	b0435b3d47	Make sure login form fetching uses a timeout Also resolve an XXX comment about the User-Agent header (which is configured in new_request_session), but add a couple of XXX comments about using proxy and possibly disabling TLS certificate checking.	2020-05-22 11:19:51 +03:00
Marius Gedminas	4f3fe5e1c3	Make sure fetching robots.txt uses the configured timeout Closes #396.	2020-05-22 10:53:33 +03:00
Marius Gedminas	c60d7c66e4	Clarify the decision to fall back to Latin-1	2020-05-21 19:35:39 +03:00
Marius Gedminas	5bd1fb4e36	Fix internal error on empty HTML files When BeautifulSoup finds an empty file on disk, it sets original_encoding to None. It doesn't matter what encoding we pick for empty files, so let's just pick one. I don't know if there are any circumstances where BeautifulSoup might set the encoding to None for a non-empty file. Closes #392.	2020-05-21 19:01:33 +03:00
Chris Mayo	6cfc8eeb49	Replace threading.Thread.setName() with setting the name property As recommended in: https://docs.python.org/3.5/library/threading.html#threading.Thread.setName	2020-05-20 19:58:44 +01:00
Chris Mayo	42eba19a7d	No need to encode url in Checker.check_url_data() Was causing b'' in log messages e.g. CheckThread-b'http:...	2020-05-20 19:58:44 +01:00
Chris Mayo	28f4587dfa	Remove str_text from fileutil.py, strformat.py and url.py	2020-05-19 19:56:42 +01:00
Chris Mayo	ebcc3c4961	Remove str_text from plugins/	2020-05-19 19:56:42 +01:00
Chris Mayo	1c14583535	Remove str_text from logger/	2020-05-19 19:56:42 +01:00
Chris Mayo	6bddd4ac60	Remove str_text from checker/	2020-05-19 19:56:42 +01:00
Chris Mayo	a127902607	Replace str_text in asserts	2020-05-19 19:56:42 +01:00
Chris Mayo	7490804e2c	Merge pull request #395 from cjmayo/tidyten11 Remove unused code from linkcheck/fileutil.py	2020-05-19 19:45:08 +01:00
Marius Gedminas	e6e969f975	Merge pull request #391 from linkchecker/dev-version Bump version in git to 10.0.0.dev0	2020-05-19 18:49:34 +03:00
Chris Mayo	690605c519	Remove unused code from linkcheck/fileutil.py	2020-05-18 19:29:55 +01:00
Marius Gedminas	5317347e54	Avoid distutils.version.StrictVersion distutils.version is old code that predates PEP 440. We could add a dependency on https://packaging.pypa.io/en/latest/version/, but meh.	2020-05-17 21:12:43 +03:00
Marius Gedminas	bb53aaa621	Fix viruscheck plugin The clamav interface needs bytes, not unicode. It would be nice if we had tests for this code.	2020-05-17 17:50:11 +01:00
Chris Mayo	a15a2833ca	Remove spaces after names in class method definitions And also nested functions. This is a PEP 8 convention, E211.	2020-05-16 20:19:42 +01:00
Chris Mayo	1663e10fe7	Remove spaces after names in function definitions This is a PEP 8 convention, E211.	2020-05-16 20:19:42 +01:00
Chris Mayo	fc11d08968	Remove spaces after names in class definitions	2020-05-16 20:19:42 +01:00
Chris Mayo	1416a08119	On Python 3 no need to convert os.linesep to a string	2020-05-16 17:02:01 +01:00
Chris Mayo	0752408a44	Remove Python 2 use of sys.stdout in i18n.get_encoded_writer()	2020-05-16 17:02:00 +01:00
Chris Mayo	2c2e7e55ac	Remove CSVLogger.encode_row_s() Introduced during Python 3 conversion to maintain Python 2 support: `55a7973b` ("Python3: fix csvlog", 2016-12-04)	2020-05-16 17:02:00 +01:00
Chris Mayo	ed13a926d3	Remove setting Python 2 xmlparser.returns_unicode	2020-05-16 17:02:00 +01:00
Chris Mayo	025637b08d	Remove Python 2 cookielib import	2020-05-16 16:26:38 +01:00
Chris Mayo	1e277444f4	Remove Python 2 thread import	2020-05-16 16:26:34 +01:00
Chris Mayo	dcbddfe045	Remove Python 2 ConfigParser import	2020-05-15 19:37:04 +01:00
Chris Mayo	f8c9faec1b	Remove Python 2 cStringIO imports	2020-05-15 19:37:04 +01:00
Chris Mayo	bda9612273	Make html.escape Python 3 only	2020-05-14 20:15:28 +01:00
Chris Mayo	42de609f8e	Make urllib imports Python 3 only	2020-05-14 20:15:28 +01:00
Chris Mayo	3c661a83d0	Replace parse_host_port() in checker.proxysupport with url.splitport()	2020-05-14 20:15:28 +01:00
Chris Mayo	c80002437e	Update run-time version check	2020-05-13 19:50:19 +01:00
Chris Mayo	08ddf658bc	Merge pull request #366 from cjmayo/userorpwd Support login forms with user and/or password	2020-05-13 19:37:44 +01:00
Chris Mayo	736c893707	Merge pull request #377 from cjmayo/tidyten3 Remove u string prefixes	2020-05-13 19:36:54 +01:00
Chris Mayo	3ace021264	Support login forms with user and/or password	2020-05-13 19:32:25 +01:00
Chris Mayo	44e81d27dd	Remove inheriting object All Python 3 classes are new-style.	2020-05-08 10:45:31 +01:00
Chris Mayo	b0ea72e8c1	Remove # -*- coding: lines Except for tests that include non-unicode characters: tests/test_po.py tests/test_strformat.py tests/test_url.py tests/checker/test_error.py tests/checker/test_news.py	2020-05-08 10:45:31 +01:00
Marius Gedminas	22b0165b72	Make _Logger an abstract base class The __metaclass__ syntax is a Python-2-ism. It was replaced with class _Logger (object, metaclass=abc.ABCMeta): in Python 3. And then Python 3.4 introduced abc.ABC which is an empty class that has ABCMeta as the metaclass, making it simpler to define abstract base classes.	2020-04-30 23:09:42 +03:00
Chris Mayo	4d3e5abcfa	Remove u string prefixes	2020-04-30 20:11:59 +01:00
anarcat	ab476fa4bf	Merge pull request #364 from cjmayo/parser5 Stop using HTML handlers and improve login form error handling	2020-04-30 09:28:48 -04:00
Chris Mayo	12a948894b	Fix space style in linkcheck/htmlutil/linkparse.py	2020-04-29 20:07:00 +01:00
Chris Mayo	9eed070a73	Stop using HTML handlers LinkFinder is the only remaining HTML handler therefore no need for htmlsoup.process_soup() as an independent function or TagFinder as a base class.	2020-04-29 20:07:00 +01:00
Chris Mayo	4ffdbf2406	Replace MetaRobotsFinder using BeautifulSoup.find()	2020-04-29 20:07:00 +01:00
Chris Mayo	a51f02cf66	Improve error handling and debugging for login form	2020-04-27 18:06:29 +01:00
Chris Mayo	9a33c2a659	Make requesting login form password work on Python 3	2020-04-27 18:06:29 +01:00
Chris Mayo	8fc0dcc055	Make matching login form credentials case-sensitive The keys of the form.data dictionary are case-sensitive and therefore a KeyError was possible if the configured values are not identical to the input element name attributes.	2020-04-27 18:06:29 +01:00
Chris Mayo	7a6ef938cc	Rename htmlutil.formsearch to htmlutil.loginformsearch Make it clear that this module has only one specific use.	2020-04-27 18:06:29 +01:00
anarcat	350f8bfef9	Merge pull request #373 from linkchecker/fix-swf-parsing SWF files are binary data	2020-04-27 09:39:52 -04:00
Marius Gedminas	680783b1ff	SWF files are binary data Should fix #372.	2020-04-27 11:25:37 +03:00
anarcat	183d483074	Merge pull request #365 from cjmayo/tidyten1 Remove use of the future package	2020-04-26 12:02:30 -04:00
Chris Mayo	d189445a8e	LinkFinder does not raise StopParse	2020-04-18 20:30:46 +01:00
Chris Mayo	ee6628a831	Move HtmlParser/htmlsax.py to htmlutil/htmlsoup.py Remove one subpackage and some import lines where htmlutil.linkparse is also being used.	2020-04-18 20:30:45 +01:00
Chris Mayo	384e1e196d	Remove Python 2 gettext builtin installation	2020-04-15 19:49:16 +01:00
Chris Mayo	a83fbb56c0	Remove from __future__ imports	2020-04-15 19:49:16 +01:00
Chris Mayo	f5e7f3a382	Remove use of the future package It was providing Python 2 compatibility.	2020-04-15 19:49:16 +01:00
Chris Mayo	0795e3c1b4	Replace Parser class using BeautifulSoup.find_all()	2020-04-10 13:51:09 +01:00
Chris Mayo	eb3cf28baa	Remove support for start_end_element() callback The LinkFinder handler start_end_element() callback does nothing apart from call start_element().	2020-04-10 13:51:09 +01:00
Chris Mayo	c9f17e92b9	Remove support for end_element() callback	2020-04-10 13:51:09 +01:00
Chris Mayo	48b590cf8b	Replace FormFinder using BeautifulSoup.find_all() FormFinder was the only handler that used an end_element() callback and was therefore a blocker to moving the Parser class to use BeautifulSoup.find_all() FormFinder was a specialised handler used to parse a login form at the start of a session if the user had configured authentication credentials.	2020-04-10 13:51:05 +01:00
Chris Mayo	974915cc4f	Remove encoding from Parser Only used by the test and an attribute of the soup object.	2020-04-08 20:03:35 +01:00
Chris Mayo	02e1c389b2	Remove parser flush() and reset() Remnants of the feed() interface.	2020-04-08 20:03:35 +01:00
Chris Mayo	3771dd9136	Use parser.feed_soup() instead of parser.feed() Markup is not being passed in pieces to the parser, so simplify the interface and reduce the state further.	2020-04-08 20:03:35 +01:00
Chris Mayo	40f43ae41c	Create one function to make soup objects	2020-04-08 20:03:35 +01:00
Chris Mayo	9d8d251d06	Replace Parser lineno() and column() methods Stop storing this data in Parser object state.	2020-04-08 20:03:35 +01:00
Chris Mayo	16e6fb2919	Fix incorrect character in FormFinder log message	2020-04-07 19:24:34 +01:00
Chris Mayo	00f940d979	Fix FormFinder callbacks for missing element_text element_text added in: `51a06d8a` ("Remove home-cooked htmlparser and use BeautifulSoup", 2019-07-22)	2020-04-07 19:24:34 +01:00
Chris Mayo	fe024fb0c8	Remove unused Parser.debug() method	2020-04-03 19:24:08 +01:00
Chris Mayo	0c5e3bb403	Remove old HtmlParser .gitignore htmlparse.output was a product of the built-in parser.	2020-04-03 19:24:08 +01:00
Chris Mayo	036b900ffc	Remove unused linkcheck.containers classes	2020-04-03 19:24:08 +01:00
Chris Mayo	3ff3d72492	Use BeautifulSoup element attrs directly	2020-04-03 19:24:08 +01:00
Chris Mayo	a7e1e20172	Remove last line and column from Parser Only used for debug log message and not very useful.	2020-04-03 19:24:08 +01:00
Chris Mayo	28701e291a	Remove use of Python 2 unicode() and related u prefixes Several instances for MS Windows left unchanged.	2020-04-01 19:39:50 +01:00
anarcat	cf4e6bb235	Merge pull request #351 from cjmayo/tagsonly Remove support for non-Tag elements from Parser	2020-04-01 12:17:18 -04:00
Chris Mayo	ffa6ac457f	Remove support for non-Tag elements from Parser This change is made because the linkchecker handlers only process Tags. The test HtmlPrettyPrinter handler is updated to output element text because its support for non-Tag elements has been removed. This results in a number of the existing tests still passing.	2020-03-31 20:10:35 +01:00
Chris Mayo	e7c5f353cd	Remove unused function linkcheck.fileutil.write_file() Doesn't appear to have ever been used. Causes flake8 error: linkcheck/fileutil.py:45:9: F821 undefined name 'file'	2020-03-31 19:46:31 +01:00
Chris Mayo	504004d4f0	Use ipaddress in network.iputil.is_valid_ip() ipaddress was introduced in Python 3.3.	2020-03-31 19:46:31 +01:00
Chris Mayo	2eb1424703	Replace deprecated plistlib.readPlistFromBytes() in bookmarks.safari Remove Python 2 code. plistlib.loads() was added in Python 3.4.	2020-03-31 19:46:31 +01:00
Chris Mayo	0ee4414a60	Replace memoized with functools.lru_cache	2020-03-31 19:46:31 +01:00
Chris Mayo	1255119ca8	Move HtmlPrinter and HtmlPrettyPrinter into tests	2020-03-30 19:32:30 +01:00
Chris Mayo	ce1d669329	Remove unused functions from linkcheck.httputil http_persistent() unused since: `4b818cb4` ("Detect more cases to close the connection, and close response objects", 2006-09-15) http_keepalive(), get_content_encoding() unused since: `7b34be59` ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)	2020-03-30 19:32:30 +01:00
Chris Mayo	5b66964afa	Remove unused .charset from checker classes Unused since: `4f8c2954` ("Don't set parser.encoding", 2019-10-05)	2020-03-30 19:32:30 +01:00
Chris Mayo	f743be57e8	Remove unused functions from linkcheck.HtmlParser resolve_entities() unused since: `2c000683` ("Remove unused linkcheck.htmlutil.linkname module", 2020-03-30) set_doctype(), set_encoding() unused since: `51a06d8a` ("Remove home-cooked htmlparser and use BeautifulSoup", 2019-07-22)	2020-03-30 19:32:18 +01:00
Chris Mayo	2c000683e1	Remove unused linkcheck.htmlutil.linkname module Unused since: `d6d48b48` ("html parser: use name instead of peeking", 2019-07-22)	2020-03-30 19:31:11 +01:00
Marius Gedminas	af0f50efa8	Restore support for older BeautifulSoup4 versions	2020-03-30 14:49:56 +03:00
Wes Haggard	dcdc64e878	Turn status code 429 into warning instead of failure	2020-03-25 16:36:08 -07:00
Marius Gedminas	a311ebb97e	Fix doctype tests I don't think linkchecker actually cares about the document type, so I'm not sure why we're even testing this...	2020-03-23 10:56:57 +02:00
Chris Mayo	5eaad24641	Use HTTP header encoding for decoding	2020-03-22 19:54:37 +00:00
Chris Mayo	f5ae90e824	Parser threading lock no longer required with Beautiful Soup	2020-03-22 19:54:37 +00:00
Chris Mayo	d3d6638973	Actually fix TypeError when checking https link The test was added but not the fix in: `ecd06776` ("Fix TypeError when checking https link and test", 2019-11-11) Which is caught by the new test when run on Python 3: ___________________ TestHttps.test_x509_to_dict__________________ [gw14] linux -- Python 3.6.9 /usr/bin/python3.6 tests/checker/test_https.py:72: in test_x509_to_dict self.assertEqual(httputil.x509_to_dict(cert)["notAfter"], linkcheck/httputil.py:47: in x509_to_dict parsedtime = asn1_generaltime_to_seconds(notAfter) linkcheck/httputil.py:68: in asn1_generaltime_to_seconds res = datetime.strptime(timestr, timeformat + 'Z') E TypeError: strptime() argument 1 must be str, not bytes	2019-11-19 20:06:10 +00:00
Chris Mayo	ec8b6e09f0	Fix XmlTagUrlParser and make Python 3 compatible URLs within a sitemap file were not being captured.	2019-10-28 19:20:05 +00:00
Marius Gedminas	8bdd402aed	Merge pull request #333 from linkchecker/fix-clamav-on-py3 Fix test_clamav.py on Python 3	2019-10-25 16:16:23 +03:00
Marius Gedminas	5b2b3613ec	Merge pull request #330 from linkchecker/fix-sitemap Fix sitemap parser	2019-10-25 16:15:55 +03:00
Marius Gedminas	f9766a2049	Python 3: fix bytes vs strings in viruscheck plugin Socket communication deals with bytes. There are probably remaining issues with the viruscheck plugin on Python 3, we just can't see them because the code is not fully covered with tests.	2019-10-25 14:24:07 +03:00
Chris Mayo	b2e63663f8	Make PdfParser Python 3 compatible basestring is not available in Python 3. Ensure all URLs are Unicode. url_data.get_raw_content() is returning bytes.	2019-10-24 19:57:27 +01:00
Marius Gedminas	a1af1e9717	Fix sitemap parser PyExpat wants bytes on Python 2. See #323.	2019-10-23 17:23:23 +03:00
Marius Gedminas	938467c3ae	Merge pull request #324 from cjmayo/pdfminer Add pdfminer to tox.ini and dev-requirements.txt to enable pdf test	2019-10-23 09:47:01 +03:00
Marius Gedminas	db3e25e934	Merge pull request #326 from linkchecker/fix-word-maybe Fix MS Word parser, hopefully	2019-10-22 18:08:46 +03:00
Marius Gedminas	c6de64978c	Merge pull request #325 from linkchecker/type-error-in-robot-parser Fix TypeError: string arg required in content_allows_robots()	2019-10-22 18:07:31 +03:00
Marius Gedminas	fa32a89d6b	Fix MS Word parser, hopefully MS Word files are binary data, and get_temp_filename() will write them to disk using open(..., 'wb'), so we want to pass bytes in there, not Unicode. See #323.	2019-10-22 16:39:57 +03:00
Marius Gedminas	58b0d5aaae	Fix TypeError: string arg required in content_allows_robots() See #323 and #317.	2019-10-22 14:13:45 +03:00
Chris Mayo	949f84d329	PdfParser requires bytes	2019-10-21 20:12:33 +01:00
Chris Mayo	7da64b16f0	Don't add linkcheck_dns directory to sys.path This code was added in: `efbbb656` ("Remove python-dns conflict by moving the dns module into a custom subdirectory.", 2012-12-07) Installation of linkcheck_dns stopped with: `0a13fae3` ("remove third party packages and use them as dependency", 2018-01-06)	2019-10-21 19:52:58 +01:00
Marius Gedminas	e274d74be2	Wait for threads to exit after stopping them This fixes a race condition where the main thread would check if any internal errors happened and get back a 0 while a worker thread was still busy printing the internal error message before incrementing the counter. Fixes #320. My experiments show that this adds no perceptible delay to the script runtime (on Linux). More specifically, there already is an annoying perceptible delay of about 1 second, but it's not caused by this change.	2019-10-21 18:23:58 +03:00
Marius Gedminas	84dbb5d603	Fix TypeError: string arg required in find_links() Fixes #317.	2019-10-21 17:47:46 +03:00
Chris Mayo	c7a32d67fe	Remove unused code from network subpackage	2019-10-19 10:27:34 +01:00
anarcat	f73ba54a2a	Merge pull request #308 from cjmayo/decode Decode content when retrieved	2019-10-10 09:46:32 -04:00
anarcat	7cfb1136e9	Merge pull request #313 from cjmayo/titlefinder Remove unused linkparse.TitleFinder	2019-10-07 11:30:10 -04:00
Chris Mayo	127c2272c4	Remove unused linkparse.TitleFinder Stopped being used with removal of UrlBase.set_title_from_content() in: `7b34be59` ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)	2019-10-05 19:43:33 +01:00
Chris Mayo	b7ec71d8cc	Always use utf-8 encoding when quoting	2019-10-05 19:38:57 +01:00
Chris Mayo	a9f147c347	Update fileutil.pathencode() because paths are now strings	2019-10-05 19:38:57 +01:00
Chris Mayo	5bb4524a63	Update strformat.ascii_safe() because paths are now strings	2019-10-05 19:38:57 +01:00
Chris Mayo	646e138166	Pass encoding when unquoting Else non-UTF-8 codes are misinterpreted: >>> from urllib import parse >>> parse.unquote("%FF") '�' >>> parse.unquote("%FF", "latin1") 'ÿ'	2019-10-05 19:38:57 +01:00
Chris Mayo	153e53ba03	Reuse soup object used for detecting encoding in the HTML parser	2019-10-05 19:38:57 +01:00
Chris Mayo	978042a54e	Hide Beautiful Soup soupsieve warning Shown every time linkchecker is run: /usr/lib/python3.7/site-packages/bs4/element.py:16: UserWarning: The soupsieve package is not installed. CSS selectors cannot be used. 'The soupsieve package is not installed. CSS selectors cannot be used.'	2019-10-05 19:38:57 +01:00
Chris Mayo	30df69c158	Improve pretty printed comments	2019-10-05 19:38:57 +01:00
Chris Mayo	607328d5c5	Support Beautiful Soup line numbers	2019-10-05 19:38:57 +01:00
Chris Mayo	4f8c2954cf	Don't set parser.encoding Read-only property with new Beautiful Soup parser.	2019-10-05 19:38:57 +01:00
Chris Mayo	5732606c58	Remove urlutil.decode_for_unquote() Not needed since all content is now being decoded on retrieval. Added by: `a6643034` ("Python3: decode parts before submitting them to urllib.quote()", 2018-01-05)	2019-10-04 19:37:09 +01:00
Chris Mayo	2776eb5f52	Revert "Python3: fix opening file URLs" This reverts commit `4c9ec511b5`.	2019-10-04 19:37:09 +01:00
Chris Mayo	c6a06d99ac	Remove unnecessary unicode() from StatusLogger.writeln()	2019-09-30 20:06:48 +01:00
Petr Dlouhý	6e8da10942	fixes for Python 3: fix markdowncheck The translate() method of string objects (and Python 2 Unicode objects) only accepts a single, table argument.	2019-09-30 19:46:24 +01:00
Chris Mayo	e01ea0d9f0	Safari bookmark parser requires bytes	2019-09-30 19:46:24 +01:00
Chris Mayo	ad33d359c1	Adapt Opera bookmark parser to work with decoded data	2019-09-30 19:46:24 +01:00
Chris Mayo	9460064084	Use requests to decode the content of login form	2019-09-30 19:46:24 +01:00
Chris Mayo	5fc01455b7	Decode content when retrieved, use bs4 to detect encoding if non-Unicode UrlBase has been modified as follows: - the "data" variable now holds bytes - decoded content is stored in a new variable "text" - functionality from get_content() has been split out into get_raw_content() which returns "data" and download_content() which calls read_content() and sets the download related variables. This allows for subclasses to do their own decoding and parsers to use bytes.	2019-09-30 19:46:24 +01:00
Chris Mayo	0c90c718bf	Revert "Python3: fix bytes mark in parser/__init__.py" This reverts commit `aec8243348`.	2019-09-30 19:46:24 +01:00
Chris Mayo	53cd9475b5	Replace deprecated cgi.escape html provided for Python 2 by future https://python-future.org/compatible_idioms.html#html-escaping-and-entities	2019-09-17 20:25:05 +01:00
anarcat	1590408a65	Merge pull request #306 from cjmayo/python3_49 {python3_49} enable and fix remaining bookmark tests	2019-09-16 15:18:26 -04:00
Petr Dlouhý	eaa7131523	enable and fix remaining bookmark tests biplist module preferred for reading Safari bookmarks in bookmarks/safari.py so install it for tox testing.	2019-09-16 20:08:01 +01:00
anarcat	4ccf0fb2d0	Merge pull request #305 from cjmayo/python3_48 {python3_48} Python3: fix displaying help	2019-09-16 10:10:36 -04:00
anarcat	2c7573b3b8	Merge pull request #300 from cjmayo/python3_43 {python3_43} Python3: fix for test_telnet in urlbase.py	2019-09-16 10:08:18 -04:00
anarcat	bec68f237b	Merge pull request #299 from cjmayo/python3_42 {python3_42} fixes for Python 3: fix telneturl	2019-09-16 10:07:55 -04:00
anarcat	27d672c78b	Merge pull request #297 from cjmayo/python3_40 {python3_40} Python3: fixes form checker/__init__.py	2019-09-16 10:06:05 -04:00
anarcat	5a0a02ae74	Merge pull request #294 from cjmayo/python3_39_alt {python3_39_alt} Python3: fix TypeError in HttpUrl.read_content()	2019-09-16 10:04:23 -04:00
Petr Dlouhý	14e19efe07	Python3: fix displaying help	2019-09-15 19:50:05 +01:00
Petr Dlouhý	c2af88ad2e	Python3: fix for test_telnet in urlbase.py	2019-09-15 19:49:26 +01:00
Petr Dlouhý	a2e67af7b4	fixes for Python 3: fix telneturl	2019-09-15 19:49:18 +01:00
Petr Dlouhý	bb542b00e9	Python3: fixes form checker/__init__.py	2019-09-15 19:49:00 +01:00
Chris Mayo	06fdd78f91	Python3: fix TypeError in HttpUrl.read_content() From test_http_redirect: File "linkchecker/linkcheck/checker/httpurl.py", line 323, in read_content line: buf.write(data) locals: buf = <local> <_io.StringIO object at 0x7f8fe2f45e10> buf.write = <local> <built-in method write of _io.StringIO object at 0x7f8fe2f45e10> data = <local> b'<a href="newurl.html">Recursive Redirect</a>\n' TypeError: string argument expected, got 'bytes'	2019-09-15 19:42:29 +01:00
anarcat	736d2a786d	Merge pull request #293 from cjmayo/python3_37_alt {python3_37_alt} Python3: fix TypeError when parsing cookie data	2019-09-14 11:51:26 -04:00
anarcat	fe39db4fbf	Merge pull request #287 from cjmayo/python3_36 {python3_36} fixes for Python 3 + Travis test: fix cgi	2019-09-14 11:50:53 -04:00
Chris Mayo	a7b7e31917	Python3: fix TypeError when parsing cookie data > fp = BytesIO(strheader) E TypeError: a bytes-like object is required, not 'str' linkcheck/cookies.py:61: TypeError The email package provides the message_from_string() convenience function which avoids the need to create a file-like object. Indeed http.client.HTTPMessage is implemented using email.message.Message.	2019-09-13 20:10:25 +01:00
Petr Dlouhý	36465112d0	fixes for Python 3 + Travis test: fix cgi	2019-09-13 19:46:13 +01:00
anarcat	aaa8cb675e	Merge pull request #291 from cjmayo/python3_33_alt {python3_33_alt} Python3: fix opening file URLs	2019-09-13 10:31:20 -04:00
anarcat	80b62a3e21	Merge pull request #292 from cjmayo/lc_cgi_error Fix errors caused by logging LCFormError exceptions	2019-09-13 09:12:05 -04:00
anarcat	b0b392f7cc	Merge pull request #282 from cjmayo/python3_31 {python3_31} Python3: fix strformat strline()	2019-09-13 09:11:33 -04:00
Chris Mayo	6dc25547d5	Fix errors caused by logging LCFormError exceptions	2019-09-12 20:13:08 +01:00
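
The KDE proxy lookup order described in commits 18f20d592f and 29b292c90f (check the KDE 5 file first, then fall back to KDE 4, without calling kde4-config) can be sketched as follows. This is a minimal illustration assuming only the two file locations named in those messages; `find_kioslaverc` is a hypothetical name, not linkchecker's actual function, and reading the proxy settings out of the file is left out.

```python
import os
import tempfile


# Hypothetical helper, not linkchecker's code. The two locations are the
# ones named in the commit messages: KDE 5 is preferred over KDE 4.
def find_kioslaverc(home=None):
    """Return the first existing kioslaverc, preferring KDE 5 over KDE 4."""
    home = home or os.path.expanduser("~")
    candidates = [
        os.path.join(home, ".config", "kioslaverc"),  # KDE 5
        os.path.join(home, ".kde4", "share", "config", "kioslaverc"),  # KDE 4
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None  # no KDE proxy configuration found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as home:
        # Only a KDE 4 file exists, so it is used.
        kde4_dir = os.path.join(home, ".kde4", "share", "config")
        os.makedirs(kde4_dir)
        open(os.path.join(kde4_dir, "kioslaverc"), "w").close()
        print(find_kioslaverc(home))
        # Once a KDE 5 file exists as well, it takes precedence.
        os.makedirs(os.path.join(home, ".config"))
        open(os.path.join(home, ".config", "kioslaverc"), "w").close()
        print(find_kioslaverc(home))
```

Checking for the files directly, rather than asking kde4-config where KDE 4 keeps its settings, means a leftover KDE 4 install cannot shadow a KDE 5 configuration.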

3273 commits
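
Commit a7b7e31917 replaces a `BytesIO`-based header parse in linkcheck/cookies.py with the standard library's `email.message_from_string()`. A minimal sketch of that approach, assuming the cookie data is a block of `Name: value` header lines already held in a `str`; `parse_header_block` and the sample headers are hypothetical and not taken from linkchecker.

```python
import email


# Hypothetical helper: parse RFC 822-style header lines from a str.
# No file-like object (BytesIO/StringIO) is needed.
def parse_header_block(text):
    """Return an email.message.Message holding the parsed headers."""
    return email.message_from_string(text)


if __name__ == "__main__":
    msg = parse_header_block(
        "Host: example.com\n"
        "Set-Cookie: id=42; Path=/\n"
        "Set-Cookie: lang=en\n"
        "\n"
    )
    print(msg["Host"])  # example.com
    # Repeated headers are all kept and can be read with get_all().
    print(msg.get_all("Set-Cookie"))  # ['id=42; Path=/', 'lang=en']
```

Since `http.client.HTTPMessage` is itself a subclass of `email.message.Message`, the parsed result supports the same header access that HTTP responses use.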