linkchecker

mirror of https://github.com/Hopiu/linkchecker.git synced 2026-05-13 17:13:11 +00:00

Author	SHA1	Message	Date
Chris Mayo	036b900ffc	Remove unused linkcheck.containers classes	2020-04-03 19:24:08 +01:00
Chris Mayo	3ff3d72492	Use BeautifulSoup element attrs directly	2020-04-03 19:24:08 +01:00
Chris Mayo	a7e1e20172	Remove last line and column from Parser Only used for debug log message and not very useful.	2020-04-03 19:24:08 +01:00
Chris Mayo	28701e291a	Remove use of Python 2 unicode() and related u prefixes Several instances for MS Windows left unchanged.	2020-04-01 19:39:50 +01:00
anarcat	cf4e6bb235	Merge pull request #351 from cjmayo/tagsonly Remove support for non-Tag elements from Parser	2020-04-01 12:17:18 -04:00
Chris Mayo	ffa6ac457f	Remove support for non-Tag elements from Parser This change is made because the linkchecker handlers only process Tags. The test HtmlPrettyPrinter handler is updated to output element text because its support for non-Tag elements has been removed. This results in a number of the existing tests still passing.	2020-03-31 20:10:35 +01:00
Chris Mayo	e7c5f353cd	Remove unused function linkcheck.fileutil.write_file() Doesn't appear to have ever been used. Causes flake8 error: linkcheck/fileutil.py:45:9: F821 undefined name 'file'	2020-03-31 19:46:31 +01:00
Chris Mayo	504004d4f0	Use ipaddress in network.iputil.is_valid_ip() ipaddress was introduced in Python 3.3.	2020-03-31 19:46:31 +01:00
Chris Mayo	2eb1424703	Replace deprecated plistlib.readPlistFromBytes() in bookmarks.safari Remove Python 2 code. plistlib.loads() was added in Python 3.4.	2020-03-31 19:46:31 +01:00
Chris Mayo	0ee4414a60	Replace memoized with functools.lru_cache	2020-03-31 19:46:31 +01:00
Chris Mayo	1255119ca8	Move HtmlPrinter and HtmlPrettyPrinter into tests	2020-03-30 19:32:30 +01:00
Chris Mayo	ce1d669329	Remove unused functions from linkcheck.httputil http_persistent() unused since: `4b818cb4` ("Detect more cases to close the connection, and close response objects", 2006-09-15) http_keepalive(), get_content_encoding() unused since: `7b34be59` ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)	2020-03-30 19:32:30 +01:00
Chris Mayo	5b66964afa	Remove unused .charset from checker classes Unused since: `4f8c2954` ("Don't set parser.encoding", 2019-10-05)	2020-03-30 19:32:30 +01:00
Chris Mayo	f743be57e8	Remove unused functions from linkcheck.HtmlParser resolve_entities() unused since: `2c000683` ("Remove unused linkcheck.htmlutil.linkname module", 2020-03-30) set_doctype(), set_encoding() unused since: `51a06d8a` ("Remove home-cooked htmlparser and use BeautifulSoup", 2019-07-22)	2020-03-30 19:32:18 +01:00
Chris Mayo	2c000683e1	Remove unused linkcheck.htmlutil.linkname module Unused since: `d6d48b48` ("html parser: use name instead of peeking", 2019-07-22)	2020-03-30 19:31:11 +01:00
Marius Gedminas	af0f50efa8	Restore support for older BeautifulSoup4 versions	2020-03-30 14:49:56 +03:00
Marius Gedminas	a311ebb97e	Fix doctype tests I don't think linkchecker actually cares about the document type, so I'm not sure why we're even testing this...	2020-03-23 10:56:57 +02:00
Chris Mayo	5eaad24641	Use HTTP header encoding for decoding	2020-03-22 19:54:37 +00:00
Chris Mayo	f5ae90e824	Parser threading lock no longer required with Beautiful Soup	2020-03-22 19:54:37 +00:00
Chris Mayo	b7ec71d8cc	Always use utf-8 encoding when quoting	2019-10-05 19:38:57 +01:00
Chris Mayo	a9f147c347	Update fileutil.pathencode() because paths are now strings	2019-10-05 19:38:57 +01:00
Chris Mayo	5bb4524a63	Update strformat.ascii_safe() because paths are now strings	2019-10-05 19:38:57 +01:00
Chris Mayo	646e138166	Pass encoding when unquoting Else non-UTF-8 codes are misinterpreted: >>> from urllib import parse >>> parse.unquote("%FF") '�' >>> parse.unquote("%FF", "latin1") 'ÿ'	2019-10-05 19:38:57 +01:00
Chris Mayo	153e53ba03	Reuse soup object used for detecting encoding in the HTML parser	2019-10-05 19:38:57 +01:00
Chris Mayo	978042a54e	Hide Beautiful Soup soupsieve warning Shown every time linkchecker is run: /usr/lib/python3.7/site-packages/bs4/element.py:16: UserWarning: The soupsieve package is not installed. CSS selectors cannot be used. 'The soupsieve package is not installed. CSS selectors cannot be used.'	2019-10-05 19:38:57 +01:00
Chris Mayo	30df69c158	Improve pretty printed comments	2019-10-05 19:38:57 +01:00
Chris Mayo	607328d5c5	Support Beautiful Soup line numbers	2019-10-05 19:38:57 +01:00
Chris Mayo	4f8c2954cf	Don't set parser.encoding Read-only property with new Beautiful Soup parser.	2019-10-05 19:38:57 +01:00
Petr Dlouhý	b5111453d8	change test_parse encoding to UTF-8	2019-07-22 19:59:37 +01:00
Petr Dlouhý	d6d48b4814	html parser: use name instead of peeking	2019-07-22 19:59:37 +01:00
Petr Dlouhý	51a06d8a1e	Remove home-cooked htmlparser and use BeautifulSoup	2019-07-22 19:59:37 +01:00
Petr Dlouhý	2daf685633	Python3: fix few htmllib problems	2018-01-05 22:48:46 +01:00
Chris Mayo	d3d6638973	Actually fix TypeError when checking https link The test was added but not the fix in: `ecd06776` ("Fix TypeError when checking https link and test", 2019-11-11) Which is caught by the new test when run on Python 3: ___________________ TestHttps.test_x509_to_dict__________________ [gw14] linux -- Python 3.6.9 /usr/bin/python3.6 tests/checker/test_https.py:72: in test_x509_to_dict self.assertEqual(httputil.x509_to_dict(cert)["notAfter"], linkcheck/httputil.py:47: in x509_to_dict parsedtime = asn1_generaltime_to_seconds(notAfter) linkcheck/httputil.py:68: in asn1_generaltime_to_seconds res = datetime.strptime(timestr, timeformat + 'Z') E TypeError: strptime() argument 1 must be str, not bytes	2019-11-19 20:06:10 +00:00
Chris Mayo	ec8b6e09f0	Fix XmlTagUrlParser and make Python 3 compatible URLs within a sitemap file were not being captured.	2019-10-28 19:20:05 +00:00
Marius Gedminas	8bdd402aed	Merge pull request #333 from linkchecker/fix-clamav-on-py3 Fix test_clamav.py on Python 3	2019-10-25 16:16:23 +03:00
Marius Gedminas	5b2b3613ec	Merge pull request #330 from linkchecker/fix-sitemap Fix sitemap parser	2019-10-25 16:15:55 +03:00
Marius Gedminas	f9766a2049	Python 3: fix bytes vs strings in viruscheck plugin Socket communication deals with bytes. There are probably remaining issues with the viruscheck plugin on Python 3, we just can't see them because the code is not fully covered with tests.	2019-10-25 14:24:07 +03:00
Chris Mayo	b2e63663f8	Make PdfParser Python 3 compatible basestring is not available in Python 3. Ensure all URLs are Unicode. url_data.get_raw_content() is returning bytes.	2019-10-24 19:57:27 +01:00
Marius Gedminas	a1af1e9717	Fix sitemap parser PyExpat wants bytes on Python 2. See #323.	2019-10-23 17:23:23 +03:00
Marius Gedminas	938467c3ae	Merge pull request #324 from cjmayo/pdfminer Add pdfminer to tox.ini and dev-requirements.txt to enable pdf test	2019-10-23 09:47:01 +03:00
Marius Gedminas	db3e25e934	Merge pull request #326 from linkchecker/fix-word-maybe Fix MS Word parser, hopefully	2019-10-22 18:08:46 +03:00
Marius Gedminas	c6de64978c	Merge pull request #325 from linkchecker/type-error-in-robot-parser Fix TypeError: string arg required in content_allows_robots()	2019-10-22 18:07:31 +03:00
Marius Gedminas	fa32a89d6b	Fix MS Word parser, hopefully MS Word files are binary data, and get_temp_filename() will write them to disk using open(..., 'wb'), so we want to pass bytes in there, not Unicode. See #323.	2019-10-22 16:39:57 +03:00
Marius Gedminas	58b0d5aaae	Fix TypeError: string arg required in content_allows_robots() See #323 and #317.	2019-10-22 14:13:45 +03:00
Chris Mayo	949f84d329	PdfParser requires bytes	2019-10-21 20:12:33 +01:00
Chris Mayo	7da64b16f0	Don't add linkcheck_dns directory to sys.path This code was added in: `efbbb656` ("Remove python-dns conflict by moving the dns module into a custom subdirectory.", 2012-12-07) Installation of linkcheck_dns stopped with: `0a13fae3` ("remove third party packages and use them as dependency", 2018-01-06)	2019-10-21 19:52:58 +01:00
Marius Gedminas	e274d74be2	Wait for threads to exit after stopping them This fixes a race condition where the main thread would check if any internal errors happened and get back a 0 while a worker thread was still busy printing the internal error message before incrementing the counter. Fixes #320. My experiments show that this adds no perceptible delay to the script runtime (on Linux). More specifically, there already is an annoying perceptible delay of about 1 second, but it's not caused by this change.	2019-10-21 18:23:58 +03:00
Marius Gedminas	84dbb5d603	Fix TypeError: string arg required in find_links() Fixes #317.	2019-10-21 17:47:46 +03:00
Chris Mayo	c7a32d67fe	Remove unused code from network subpackage	2019-10-19 10:27:34 +01:00
anarcat	f73ba54a2a	Merge pull request #308 from cjmayo/decode Decode content when retrieved	2019-10-10 09:46:32 -04:00
anarcat	7cfb1136e9	Merge pull request #313 from cjmayo/titlefinder Remove unused linkparse.TitleFinder	2019-10-07 11:30:10 -04:00
Chris Mayo	127c2272c4	Remove unused linkparse.TitleFinder Stopped being used with removal of UrlBase.set_title_from_content() in: `7b34be59` ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)	2019-10-05 19:43:33 +01:00
Chris Mayo	5732606c58	Remove urlutil.decode_for_unquote() Not needed since all content is now being decoded on retrieval. Added by: `a6643034` ("Python3: decode parts before submitting them to urllib.quote()", 2018-01-05)	2019-10-04 19:37:09 +01:00
Chris Mayo	2776eb5f52	Revert "Python3: fix opening file URLs" This reverts commit `4c9ec511b5`.	2019-10-04 19:37:09 +01:00
Chris Mayo	c6a06d99ac	Remove unnecessary unicode() from StatusLogger.writeln()	2019-09-30 20:06:48 +01:00
Petr Dlouhý	6e8da10942	fixes for Python 3: fix markdowncheck The translate() method of string objects (and Python 2 Unicode objects) only accepts a single, table argument.	2019-09-30 19:46:24 +01:00
Chris Mayo	e01ea0d9f0	Safari bookmark parser requires bytes	2019-09-30 19:46:24 +01:00
Chris Mayo	ad33d359c1	Adapt Opera bookmark parser to work with decoded data	2019-09-30 19:46:24 +01:00
Chris Mayo	9460064084	Use requests to decode the content of login form	2019-09-30 19:46:24 +01:00
Chris Mayo	5fc01455b7	Decode content when retrieved, use bs4 to detect encoding if non-Unicode UrlBase has been modified as follows: - the "data" variable now holds bytes - decoded content is stored in a new variable "text" - functionality from get_content() has been split out into get_raw_content() which returns "data" and download_content() which calls read_content() and sets the download related variables. This allows for subclasses to do their own decoding and parsers to use bytes.	2019-09-30 19:46:24 +01:00
Chris Mayo	0c90c718bf	Revert "Python3: fix bytes mark in parser/__init__.py" This reverts commit `aec8243348`.	2019-09-30 19:46:24 +01:00
Chris Mayo	53cd9475b5	Replace deprecated cgi.escape html provided for Python 2 by future https://python-future.org/compatible_idioms.html#html-escaping-and-entities	2019-09-17 20:25:05 +01:00
anarcat	1590408a65	Merge pull request #306 from cjmayo/python3_49 {python3_49} enable and fix remaining bookmark tests	2019-09-16 15:18:26 -04:00
Petr Dlouhý	eaa7131523	enable and fix remaining bookmark tests biplist module preferred for reading Safari bookmarks in bookmarks/safari.py so install it for tox testing.	2019-09-16 20:08:01 +01:00
anarcat	4ccf0fb2d0	Merge pull request #305 from cjmayo/python3_48 {python3_48} Python3: fix displaying help	2019-09-16 10:10:36 -04:00
anarcat	2c7573b3b8	Merge pull request #300 from cjmayo/python3_43 {python3_43} Python3: fix for test_telnet in urlbase.py	2019-09-16 10:08:18 -04:00
anarcat	bec68f237b	Merge pull request #299 from cjmayo/python3_42 {python3_42} fixes for Python 3: fix telneturl	2019-09-16 10:07:55 -04:00
anarcat	27d672c78b	Merge pull request #297 from cjmayo/python3_40 {python3_40} Python3: fixes form checker/__init__.py	2019-09-16 10:06:05 -04:00
anarcat	5a0a02ae74	Merge pull request #294 from cjmayo/python3_39_alt {python3_39_alt} Python3: fix TypeError in HttpUrl.read_content()	2019-09-16 10:04:23 -04:00
Petr Dlouhý	14e19efe07	Python3: fix displaying help	2019-09-15 19:50:05 +01:00
Petr Dlouhý	c2af88ad2e	Python3: fix for test_telnet in urlbase.py	2019-09-15 19:49:26 +01:00
Petr Dlouhý	a2e67af7b4	fixes for Python 3: fix telneturl	2019-09-15 19:49:18 +01:00
Petr Dlouhý	bb542b00e9	Python3: fixes form checker/__init__.py	2019-09-15 19:49:00 +01:00
Chris Mayo	06fdd78f91	Python3: fix TypeError in HttpUrl.read_content() From test_http_redirect: File "linkchecker/linkcheck/checker/httpurl.py", line 323, in read_content line: buf.write(data) locals: buf = <local> <_io.StringIO object at 0x7f8fe2f45e10> buf.write = <local> <built-in method write of _io.StringIO object at 0x7f8fe2f45e10> data = <local> b'<a href="newurl.html">Recursive Redirect</a>\n' TypeError: string argument expected, got 'bytes'	2019-09-15 19:42:29 +01:00
anarcat	736d2a786d	Merge pull request #293 from cjmayo/python3_37_alt {python3_37_alt} Python3: fix TypeError when parsing cookie data	2019-09-14 11:51:26 -04:00
anarcat	fe39db4fbf	Merge pull request #287 from cjmayo/python3_36 {python3_36} fixes for Python 3 + Travis test: fix cgi	2019-09-14 11:50:53 -04:00
Chris Mayo	a7b7e31917	Python3: fix TypeError when parsing cookie data > fp = BytesIO(strheader) E TypeError: a bytes-like object is required, not 'str' linkcheck/cookies.py:61: TypeError The email package provides the message_from_string() convenience function which avoids the need to create a file-like object. Indeed http.client.HTTPMessage is implemented using email.message.Message.	2019-09-13 20:10:25 +01:00
Petr Dlouhý	36465112d0	fixes for Python 3 + Travis test: fix cgi	2019-09-13 19:46:13 +01:00
anarcat	aaa8cb675e	Merge pull request #291 from cjmayo/python3_33_alt {python3_33_alt} Python3: fix opening file URLs	2019-09-13 10:31:20 -04:00
anarcat	80b62a3e21	Merge pull request #292 from cjmayo/lc_cgi_error Fix errors caused by logging LCFormError exceptions	2019-09-13 09:12:05 -04:00
anarcat	b0b392f7cc	Merge pull request #282 from cjmayo/python3_31 {python3_31} Python3: fix strformat strline()	2019-09-13 09:11:33 -04:00
Chris Mayo	6dc25547d5	Fix errors caused by logging LCFormError exceptions	2019-09-12 20:13:08 +01:00
Chris Mayo	4c9ec511b5	Python3: fix opening file URLs urllib.request.urlopen() expects a string or Request object.	2019-09-12 19:58:27 +01:00
anarcat	eb2e3271a2	Merge pull request #279 from cjmayo/python3_28 {python3_28} Python3: fix robotparser	2019-09-12 08:40:18 -04:00
anarcat	8c072fa757	Merge pull request #289 from cjmayo/python3_38 {python3_38} Python3: fix linkname.py	2019-09-12 08:39:29 -04:00
Petr Dlouhý	538c4cfeb9	Python3: fix linkname.py	2019-09-11 20:32:33 +01:00
Petr Dlouhý	8a294be95f	Python3: fix robotparser	2019-09-11 20:04:26 +01:00
anarcat	44944754d5	Merge pull request #286 from cjmayo/python3_35 {python3_35} Python3: fix unichr() in htmlparser	2019-09-11 09:48:35 -04:00
anarcat	2239458966	Merge pull request #285 from cjmayo/python3_34 {python3_34} fixes for Python 3: fix test_misc	2019-09-11 09:48:14 -04:00
anarcat	dbbb64cd90	Merge pull request #283 from cjmayo/python3_32 {python3_32} fixes for Python 3 + Travis test: fix threads	2019-09-11 09:47:44 -04:00
anarcat	492058a360	Merge pull request #281 from cjmayo/python3_30 {python3_30} Python3: fix decoding strings	2019-09-11 09:47:10 -04:00
anarcat	8eadc5f8a1	Merge pull request #280 from cjmayo/python3_29 {python3_29} fixes for Python 3: fix running problems in Python 3	2019-09-11 09:46:48 -04:00
Petr Dlouhý	f272206110	Python3: fix decoding strings	2019-09-10 19:52:23 +01:00
Petr Dlouhý	55a7973b93	Python3: fix csvlog	2019-09-10 19:42:26 +01:00
Petr Dlouhý	e10f25b968	fixes for Python 3: fix running problems in Python 3	2019-09-10 19:30:09 +01:00
Petr Dlouhý	d20ac0e108	Python3: fix strformat strline()	2019-09-09 19:51:30 +01:00
Petr Dlouhý	8b9f29ae52	Python3: fix unichr() in htmlparser	2019-09-09 19:51:30 +01:00
Petr Dlouhý	129a68da38	fixes for Python 3: fix test_misc	2019-09-09 19:51:30 +01:00
Petr Dlouhý	57f7ba0979	fixes for Python 3 + Travis test: fix threads	2019-09-09 19:51:30 +01:00
Marius Gedminas	60f9f80b9f	Fix test_console.py on Python 3 This is an alternative fix I suggested in the comments on PR #273.	2019-09-09 18:52:29 +03:00
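
Several of the commit messages above name Python 3 standard-library calls that replaced older code: `urllib.parse.unquote()` with an explicit encoding, `email.message_from_string()`, `plistlib.loads()`, `functools.lru_cache`, `ipaddress`, and `html.escape()`. The following is a minimal sketch of those calls in isolation, not code taken from linkchecker. The helper names `is_valid_ip` and `slow_square` and all sample values are assumptions chosen for illustration.

```python
import functools
import html
import ipaddress
import plistlib
from email import message_from_string
from urllib import parse


def is_valid_ip(text):
    """Hypothetical helper: True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


@functools.lru_cache(maxsize=None)
def slow_square(n):
    """Hypothetical memoized function; repeated calls are served from the cache."""
    return n * n


# Percent-decoding: pass the encoding explicitly when the bytes are not UTF-8.
latin1_char = parse.unquote("%FF", "latin1")

# Parse raw header text directly, without wrapping it in a file-like object.
msg = message_from_string("Set-Cookie: a=b\nContent-Type: text/html\n\n")
cookie = msg["Set-Cookie"]

# plistlib.loads() reads a property list from bytes (sample data is made up).
plist = plistlib.loads(plistlib.dumps({"URLString": "http://example.com/"}))

# html.escape() is the replacement for the deprecated cgi.escape().
escaped = html.escape('<a href="x">')
```

Each call stands alone, so any one of them can be tried in an interpreter without the others.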

3051 commits