linkchecker

mirror of https://github.com/Hopiu/linkchecker.git synced 2026-04-15 11:51:02 +00:00

Author	SHA1	Message	Date
Chris Mayo	4ffdbf2406	Replace MetaRobotsFinder using BeautifulSoup.find()	2020-04-29 20:07:00 +01:00
anarcat	183d483074	Merge pull request #365 from cjmayo/tidyten1 Remove use of the future package	2020-04-26 12:02:30 -04:00
Chris Mayo	56b8c9f7ab	Add tests for <meta name="robots" content="nofollow"> norobots.html was used for testing <meta name="robots" content="nofollow"> in local files until [1]. This commit reinstates local file testing and adds an http test. Checking is reported by checker.httpurl.HttpUrl.content_allows_robots(). [1] `ce733ae7` ("Don't check for robots.txt directives in local html files.", 2014-03-19)	2020-04-18 20:30:46 +01:00
Chris Mayo	d189445a8e	LinkFinder does not raise StopParse	2020-04-18 20:30:46 +01:00
Chris Mayo	ee6628a831	Move HtmlParser/htmlsax.py to htmlutil/htmlsoup.py Remove one subpackage and some import lines where htmlutil.linkparse is also being used.	2020-04-18 20:30:45 +01:00
Chris Mayo	a83fbb56c0	Remove from __future__ imports	2020-04-15 19:49:16 +01:00
Chris Mayo	f5e7f3a382	Remove use of the future package It was providing Python 2 compatibility.	2020-04-15 19:49:16 +01:00
Chris Mayo	0795e3c1b4	Replace Parser class using BeautifulSoup.find_all()	2020-04-10 13:51:09 +01:00
Chris Mayo	eb3cf28baa	Remove support for start_end_element() callback The LinkFinder handler start_end_element() callback does nothing apart from call start_element().	2020-04-10 13:51:09 +01:00
Chris Mayo	c9f17e92b9	Remove support for end_element() callback	2020-04-10 13:51:09 +01:00
Chris Mayo	48b590cf8b	Replace FormFinder using BeautifulSoup.find_all() FormFinder was the only handler that used an end_element() callback and was therefore a blocker to moving the Parser class to use BeautifulSoup.find_all(). FormFinder was a specialised handler used to parse a login form at the start of a session if the user had configured authentication credentials.	2020-04-10 13:51:05 +01:00
Chris Mayo	974915cc4f	Remove encoding from Parser Only used by the test and an attribute of the soup object.	2020-04-08 20:03:35 +01:00
Chris Mayo	02e1c389b2	Remove parser flush() and reset() Remnants of the feed() interface.	2020-04-08 20:03:35 +01:00
Chris Mayo	3771dd9136	Use parser.feed_soup() instead of parser.feed() Markup is not being passed in pieces to the parser, so simplify the interface and reduce the state further.	2020-04-08 20:03:35 +01:00
Chris Mayo	9d8d251d06	Replace Parser lineno() and column() methods Stop storing this data in Parser object state.	2020-04-08 20:03:35 +01:00
Chris Mayo	514210199d	Add tests for search_form	2020-04-07 19:24:34 +01:00
Chris Mayo	036b900ffc	Remove unused linkcheck.containers classes	2020-04-03 19:24:08 +01:00
Chris Mayo	3ff3d72492	Use BeautifulSoup element attrs directly	2020-04-03 19:24:08 +01:00
Chris Mayo	28701e291a	Remove use of Python 2 unicode() and related u prefixes Several instances for MS Windows left unchanged.	2020-04-01 19:39:50 +01:00
anarcat	cf4e6bb235	Merge pull request #351 from cjmayo/tagsonly Remove support for non-Tag elements from Parser	2020-04-01 12:17:18 -04:00
Chris Mayo	9fc651e82b	Remove Python 2 compatibility from parser tests	2020-03-31 20:10:35 +01:00
Chris Mayo	ffa6ac457f	Remove support for non-Tag elements from Parser This change is made because the linkchecker handlers only process Tags. The test HtmlPrettyPrinter handler is updated to output element text because its support for non-Tag elements has been removed. This results in a number of the existing tests still passing.	2020-03-31 20:10:35 +01:00
Chris Mayo	0ee4414a60	Replace memoized with functools.lru_cache	2020-03-31 19:46:31 +01:00
Chris Mayo	1255119ca8	Move HtmlPrinter and HtmlPrettyPrinter into tests	2020-03-30 19:32:30 +01:00
Chris Mayo	f743be57e8	Remove unused functions from linkcheck.HtmlParser resolve_entities() unused since: `2c000683` ("Remove unused linkcheck.htmlutil.linkname module", 2020-03-30) set_doctype(), set_encoding() unused since: `51a06d8a` ("Remove home-cooked htmlparser and use BeautifulSoup", 2019-07-22)	2020-03-30 19:32:18 +01:00
Chris Mayo	2c000683e1	Remove unused linkcheck.htmlutil.linkname module Unused since: `d6d48b48` ("html parser: use name instead of peeking", 2019-07-22)	2020-03-30 19:31:11 +01:00
Chris Mayo	74d5c68094	Add new tests for URL quoting	2019-10-05 19:38:57 +01:00
Chris Mayo	b7ec71d8cc	Always use utf-8 encoding when quoting	2019-10-05 19:38:57 +01:00
Chris Mayo	5bb4524a63	Update strformat.ascii_safe() because paths are now strings	2019-10-05 19:38:57 +01:00
Chris Mayo	646e138166	Pass encoding when unquoting Else non-UTF-8 codes are misinterpreted: >>> from urllib import parse >>> parse.unquote("%FF") '�' >>> parse.unquote("%FF", "latin1") 'ÿ'	2019-10-05 19:38:57 +01:00
Chris Mayo	30df69c158	Improve pretty printed comments	2019-10-05 19:38:57 +01:00
Chris Mayo	607328d5c5	Support Beautiful Soup line numbers	2019-10-05 19:38:57 +01:00
Petr Dlouhý	69d426b36f	fix parser encoding tests after change of parser UnicodeDammit input has to be non-unicode to trigger character set detection.	2019-07-22 19:59:37 +01:00
Petr Dlouhý	b5111453d8	change test_parse encoding to UTF-8	2019-07-22 19:59:37 +01:00
Petr Dlouhý	2c3c794e52	fix http test after parser change	2019-07-22 19:59:37 +01:00
Petr Dlouhý	0089349760	fix parser tests after parser change	2019-07-22 19:59:37 +01:00
Petr Dlouhý	d6d48b4814	html parser: use name instead of peeking	2019-07-22 19:59:37 +01:00
Petr Dlouhý	d1844a526e	add charset tests	2019-07-22 19:59:37 +01:00
Chris Mayo	ecd06776ab	Fix TypeError when checking https link and test File "/usr/lib/python3.7/site-packages/linkcheck/httputil.py", line 68, in asn1_generaltime_to_seconds line: res = datetime.strptime(timestr, timeformat + 'Z') locals: res = <local> None datetime = <global> <class 'datetime.datetime'> datetime.strptime = <global> <built-in method strptime of type object at 0x7fa39064dda0> timestr = <local> b'20191106202117Z' timeformat = <local> '%Y%m%d%H%M%S' TypeError: strptime() argument 1 must be str, not bytes pyOpenSSL OpenSSL.crypto.X509.get_notAfter() returns bytes: https://www.pyopenssl.org/en/stable/api/crypto.html#OpenSSL.crypto.X509.get_notAfter	2019-11-11 20:12:25 +00:00
Chris Mayo	dee4be4b1d	Enable https checking using a test server Verification has to be turned off because we are using a self-signed certificate.	2019-11-11 20:12:25 +00:00
Chris Mayo	2f16152dc8	Improve test failure diff Some url lines were missing a url prefix while others had a double url prefix. diff was reporting more url lines as changed than actually had. Improve formatting by removing newlines from control lines and adding headings. Before: E AssertionError: http://localhost:46031/tests/checker/data/sitemap.xml E --- E E +++ E E @@ -1,4 +1,8 @@ E E -url http://localhost:46031/tests/checker/data/sitemap.xml E +http://www.example.com/ E +cache key http://www.example.com/ E +real url http://www.example.com/ E +valid E +url url http://localhost:46031/tests/checker/data/sitemap.xml E cache key http://localhost:46031/tests/checker/data/sitemap.xml E real url http://localhost:46031/tests/checker/data/sitemap.xml E valid After: E AssertionError: http://localhost:44021/tests/checker/data/sitemap.xml E --- expected E +++ result E @@ -2,3 +2,7 @@ E cache key http://localhost:44021/tests/checker/data/sitemap.xml E real url http://localhost:44021/tests/checker/data/sitemap.xml E valid E +url http://www.example.com/ E +cache key http://www.example.com/ E +real url http://www.example.com/ E +valid	2019-10-29 20:03:08 +00:00
Chris Mayo	ec8b6e09f0	Fix XmlTagUrlParser and make Python 3 compatible URLs within a sitemap file were not being captured.	2019-10-28 19:20:05 +00:00
Marius Gedminas	8bdd402aed	Merge pull request #333 from linkchecker/fix-clamav-on-py3 Fix test_clamav.py on Python 3	2019-10-25 16:16:23 +03:00
Marius Gedminas	5b2b3613ec	Merge pull request #330 from linkchecker/fix-sitemap Fix sitemap parser	2019-10-25 16:15:55 +03:00
Marius Gedminas	f9766a2049	Python 3: fix bytes vs strings in viruscheck plugin Socket communication deals with bytes. There are probably remaining issues with the viruscheck plugin on Python 3, we just can't see them because the code is not fully covered with tests.	2019-10-25 14:24:07 +03:00
Marius Gedminas	606ece0308	Explain why these tests are being skipped pytest output before this change: SKIPPED [3] tests/__init__.py:217: condition: True SKIPPED [1] tests/checker/test_news.py:63: condition: True SKIPPED [1] tests/checker/test_news.py:41: condition: True SKIPPED [1] tests/checker/test_news.py:116: condition: True SKIPPED [1] tests/checker/test_news.py:75: condition: True After: SKIPPED [3] tests/__init__.py: disabled for now until some stable news server comes up SKIPPED [4] tests/checker/test_news.py: disabled for now until some stable news server comes up	2019-10-23 17:35:31 +03:00
Marius Gedminas	87b504785c	Add a regression test for the sitemap parser	2019-10-23 17:30:10 +03:00
Marius Gedminas	c6de64978c	Merge pull request #325 from linkchecker/type-error-in-robot-parser Fix TypeError: string arg required in content_allows_robots()	2019-10-22 18:07:31 +03:00
Marius Gedminas	7e94e542b3	Enable clamav integration tests on Travis CI	2019-10-22 17:04:09 +03:00
Marius Gedminas	58b0d5aaae	Fix TypeError: string arg required in content_allows_robots() See #323 and #317.	2019-10-22 14:13:45 +03:00

478 commits