linkchecker

mirror of https://github.com/Hopiu/linkchecker.git synced 2026-05-08 06:34:50 +00:00

Author	SHA1	Message	Date
Chris Mayo	926932411d	Only attempt to get rel attribute from link elements	2023-01-17 19:23:29 +00:00
Chris Mayo	2294160a6a	Fix minimum version of Beautiful Soup increased to 4.11.0 Since: `6d9061b0` ("Ignore bs4 markup and XML parser warnings", 2022-09-02)	2022-11-30 19:21:06 +00:00
Chris Mayo	b6bc366af0	Run pyupgrade --py37-plus x 2	2022-11-08 19:21:29 +00:00
Stefan Fisk	d2b9723612	Fix srcset parsing Resolves #631	2022-09-07 21:24:23 +02:00
Chris Mayo	6d9061b00a	Ignore bs4 markup and XML parser warnings XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. MarkupResemblesLocatorWarning: The input looks more like a filename than markup. MarkupResemblesLocatorWarning: The input looks more like a URL than markup.	2022-09-02 19:29:11 +01:00
Koen Van den Wijngaert	900586dc01	Better handling for link rel dns-prefetch and add preconnect support (#536) preconnect is only DNS checked. This is allowed even in the Resource Hints Editor's Draft https://w3c.github.io/resource-hints/#preconnect	2021-12-09 19:38:30 +00:00
Chris Mayo	27f22ae17a	Fix treating data: URIs in srcset values as links	2020-08-07 20:04:23 +01:00
Chris Mayo	7ba4053710	Fix critical exception if srcset value ends with a comma Log a debug message as this is a minor syntax problem, won't stop LinkChecker parsing strings up to the comma.	2020-08-07 20:04:23 +01:00
Chris Mayo	2f51a9dca0	Improve documentation of authentication	2020-06-23 17:28:31 +01:00
Chris Mayo	a92a684ac4	Run black on linkcheck/	2020-05-30 17:01:36 +01:00
Chris Mayo	a127902607	Replace str_text in asserts	2020-05-19 19:56:42 +01:00
Chris Mayo	a15a2833ca	Remove spaces after names in class method definitions And also nested functions. This is a PEP 8 convention, E211.	2020-05-16 20:19:42 +01:00
Chris Mayo	1663e10fe7	Remove spaces after names in function definitions This is a PEP 8 convention, E211.	2020-05-16 20:19:42 +01:00
Chris Mayo	08ddf658bc	Merge pull request #366 from cjmayo/userorpwd Support login forms with user and/or password	2020-05-13 19:37:44 +01:00
Chris Mayo	736c893707	Merge pull request #377 from cjmayo/tidyten3 Remove u string prefixes	2020-05-13 19:36:54 +01:00
Chris Mayo	3ace021264	Support login forms with user and/or password	2020-05-13 19:32:25 +01:00
Chris Mayo	44e81d27dd	Remove inheriting object All Python 3 classes are new-style.	2020-05-08 10:45:31 +01:00
Chris Mayo	b0ea72e8c1	Remove # -*- coding: lines Except for tests that include non-unicode characters: tests/test_po.py tests/test_strformat.py tests/test_url.py tests/checker/test_error.py tests/checker/test_news.py	2020-05-08 10:45:31 +01:00
Chris Mayo	4d3e5abcfa	Remove u string prefixes	2020-04-30 20:11:59 +01:00
anarcat	ab476fa4bf	Merge pull request #364 from cjmayo/parser5 Stop using HTML handlers and improve login form error handling	2020-04-30 09:28:48 -04:00
Chris Mayo	12a948894b	Fix space style in linkcheck/htmlutil/linkparse.py	2020-04-29 20:07:00 +01:00
Chris Mayo	9eed070a73	Stop using HTML handlers LinkFinder is the only remaining HTML handler therefore no need for htmlsoup.process_soup() as an independent function or TagFinder as a base class.	2020-04-29 20:07:00 +01:00
Chris Mayo	4ffdbf2406	Replace MetaRobotsFinder using BeautifulSoup.find()	2020-04-29 20:07:00 +01:00
Chris Mayo	a51f02cf66	Improve error handling and debugging for login form	2020-04-27 18:06:29 +01:00
Chris Mayo	8fc0dcc055	Make matching login form credentials case-sensitive The keys of the form.data dictionary are case-sensitive and therefore a KeyError was possible if the configured values are not identical to the input element name attributes.	2020-04-27 18:06:29 +01:00
Chris Mayo	7a6ef938cc	Rename htmlutil.formsearch to htmlutil.loginformsearch Make it clear that this module has only one specific use.	2020-04-27 18:06:29 +01:00
Marius Gedminas	680783b1ff	SWF files are binary data Should fix #372.	2020-04-27 11:25:37 +03:00
Chris Mayo	ee6628a831	Move HtmlParser/htmlsax.py to htmlutil/htmlsoup.py Remove one subpackage and some import lines where htmlutil.linkparse is also being used.	2020-04-18 20:30:45 +01:00
Chris Mayo	eb3cf28baa	Remove support for start_end_element() callback The LinkFinder handler start_end_element() callback does nothing apart from call start_element().	2020-04-10 13:51:09 +01:00
Chris Mayo	48b590cf8b	Replace FormFinder using BeautifulSoup.find_all() FormFinder was the only handler that used an end_element() callback and was therefore a blocker to moving the Parser class to use BeautifulSoup.find_all(). FormFinder was a specialised handler used to parse a login form at the start of a session if the user had configured authentication credentials.	2020-04-10 13:51:05 +01:00
Chris Mayo	02e1c389b2	Remove parser flush() and reset() Remnants of the feed() interface.	2020-04-08 20:03:35 +01:00
Chris Mayo	3771dd9136	Use parser.feed_soup() instead of parser.feed() Markup is not being passed in pieces to the parser, so simplify the interface and reduce the state further.	2020-04-08 20:03:35 +01:00
Chris Mayo	9d8d251d06	Replace Parser lineno() and column() methods Stop storing this data in Parser object state.	2020-04-08 20:03:35 +01:00
Chris Mayo	16e6fb2919	Fix incorrect character in FormFinder log message	2020-04-07 19:24:34 +01:00
Chris Mayo	00f940d979	Fix FormFinder callbacks for missing element_text element_text added in: `51a06d8a` ("Remove home-cooked htmlparser and use BeautifulSoup", 2019-07-22)	2020-04-07 19:24:34 +01:00
Chris Mayo	3ff3d72492	Use BeautifulSoup element attrs directly	2020-04-03 19:24:08 +01:00
Chris Mayo	a7e1e20172	Remove last line and column from Parser Only used for debug log message and not very useful.	2020-04-03 19:24:08 +01:00
Chris Mayo	28701e291a	Remove use of Python 2 unicode() and related u prefixes Several instances for MS Windows left unchanged.	2020-04-01 19:39:50 +01:00
Chris Mayo	2c000683e1	Remove unused linkcheck.htmlutil.linkname module Unused since: `d6d48b48` ("html parser: use name instead of peeking", 2019-07-22)	2020-03-30 19:31:11 +01:00
Chris Mayo	607328d5c5	Support Beautiful Soup line numbers	2019-10-05 19:38:57 +01:00
Chris Mayo	4f8c2954cf	Don't set parser.encoding Read-only property with new Beautiful Soup parser.	2019-10-05 19:38:57 +01:00
Petr Dlouhý	d6d48b4814	html parser: use name instead of peeking	2019-07-22 19:59:37 +01:00
Petr Dlouhý	51a06d8a1e	Remove home-cooked htmlparser and use BeautifulSoup	2019-07-22 19:59:37 +01:00
anarcat	7cfb1136e9	Merge pull request #313 from cjmayo/titlefinder Remove unused linkparse.TitleFinder	2019-10-07 11:30:10 -04:00
Chris Mayo	127c2272c4	Remove unused linkparse.TitleFinder Stopped being used with removal of UrlBase.set_title_from_content() in: `7b34be59` ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)	2019-10-05 19:43:33 +01:00
Chris Mayo	5732606c58	Remove urlutil.decode_for_unquote() Not needed since all content is now being decoded on retrieval. Added by: `a6643034` ("Python3: decode parts before submitting them to urllib.quote()", 2018-01-05)	2019-10-04 19:37:09 +01:00
anarcat	8c072fa757	Merge pull request #289 from cjmayo/python3_38 {python3_38} Python3: fix linkname.py	2019-09-12 08:39:29 -04:00
Petr Dlouhý	538c4cfeb9	Python3: fix linkname.py	2019-09-11 20:32:33 +01:00
Petr Dlouhý	e10f25b968	fixes for Python 3: fix running problems in Python 3	2019-09-10 19:30:09 +01:00
Petr Dlouhý	2c6411d68e	Python3: fix regexp format	2019-04-17 19:50:06 +01:00


85 commits