linkchecker

mirror of https://github.com/Hopiu/linkchecker.git synced 2026-05-04 04:44:42 +00:00

Author	SHA1	Message	Date
Chris Mayo	a51f02cf66	Improve error handling and debugging for login form	2020-04-27 18:06:29 +01:00
Chris Mayo	9a33c2a659	Make requesting login form password work on Python 3	2020-04-27 18:06:29 +01:00
Chris Mayo	8fc0dcc055	Make matching login form credentials case-sensitive The keys of the form.data dictionary are case-sensitive and therefore a KeyError was possible if the configured values are not identical to the input element name attributes.	2020-04-27 18:06:29 +01:00
Chris Mayo	7a6ef938cc	Rename htmlutil.formsearch to htmlutil.loginformsearch Make it clear that this module has only one specific use.	2020-04-27 18:06:29 +01:00
anarcat	350f8bfef9	Merge pull request #373 from linkchecker/fix-swf-parsing SWF files are binary data	2020-04-27 09:39:52 -04:00
Marius Gedminas	680783b1ff	SWF files are binary data Should fix #372.	2020-04-27 11:25:37 +03:00
anarcat	183d483074	Merge pull request #365 from cjmayo/tidyten1 Remove use of the future package	2020-04-26 12:02:30 -04:00
Chris Mayo	d189445a8e	LinkFinder does not raise StopParse	2020-04-18 20:30:46 +01:00
Chris Mayo	ee6628a831	Move HtmlParser/htmlsax.py to htmlutil/htmlsoup.py Remove one subpackage and some import lines where htmlutil.linkparse is also being used.	2020-04-18 20:30:45 +01:00
Chris Mayo	384e1e196d	Remove Python 2 gettext builtin installation	2020-04-15 19:49:16 +01:00
Chris Mayo	a83fbb56c0	Remove from __future__ imports	2020-04-15 19:49:16 +01:00
Chris Mayo	f5e7f3a382	Remove use of the future package It was providing Python 2 compatibility.	2020-04-15 19:49:16 +01:00
Chris Mayo	0795e3c1b4	Replace Parser class using BeautifulSoup.find_all()	2020-04-10 13:51:09 +01:00
Chris Mayo	eb3cf28baa	Remove support for start_end_element() callback The LinkFinder handler start_end_element() callback does nothing apart from call start_element().	2020-04-10 13:51:09 +01:00
Chris Mayo	c9f17e92b9	Remove support for end_element() callback	2020-04-10 13:51:09 +01:00
Chris Mayo	48b590cf8b	Replace FormFinder using BeautifulSoup.find_all() FormFinder was the only handler that used an end_element() callback and was therefore a blocker to moving the Parser class to use BeautifulSoup.find_all() FormFinder was a specialised handler used to parse a login form at the start of a session if the user had configured authentication credentials.	2020-04-10 13:51:05 +01:00
Chris Mayo	974915cc4f	Remove encoding from Parser Only used by the test and an attribute of the soup object.	2020-04-08 20:03:35 +01:00
Chris Mayo	02e1c389b2	Remove parser flush() and reset() Remnants of the feed() interface.	2020-04-08 20:03:35 +01:00
Chris Mayo	3771dd9136	Use parser.feed_soup() instead of parser.feed() Markup is not being passed in pieces to the parser, so simplify the interface and reduce the state further.	2020-04-08 20:03:35 +01:00
Chris Mayo	40f43ae41c	Create one function to make soup objects	2020-04-08 20:03:35 +01:00
Chris Mayo	9d8d251d06	Replace Parser lineno() and column() methods Stop storing this data in Parser object state.	2020-04-08 20:03:35 +01:00
Chris Mayo	16e6fb2919	Fix incorrect character in FormFinder log message	2020-04-07 19:24:34 +01:00
Chris Mayo	00f940d979	Fix FormFinder callbacks for missing element_text element_text added in: `51a06d8a` ("Remove home-cooked htmlparser and use BeautifulSoup", 2019-07-22)	2020-04-07 19:24:34 +01:00
Chris Mayo	fe024fb0c8	Remove unused Parser.debug() method	2020-04-03 19:24:08 +01:00
Chris Mayo	0c5e3bb403	Remove old HtmlParser .gitignore htmlparse.output was a product of the built-in parser.	2020-04-03 19:24:08 +01:00
Chris Mayo	036b900ffc	Remove unused linkcheck.containers classes	2020-04-03 19:24:08 +01:00
Chris Mayo	3ff3d72492	Use BeautifulSoup element attrs directly	2020-04-03 19:24:08 +01:00
Chris Mayo	a7e1e20172	Remove last line and column from Parser Only used for debug log message and not very useful.	2020-04-03 19:24:08 +01:00
Chris Mayo	28701e291a	Remove use of Python 2 unicode() and related u prefixes Several instances for MS Windows left unchanged.	2020-04-01 19:39:50 +01:00
anarcat	cf4e6bb235	Merge pull request #351 from cjmayo/tagsonly Remove support for non-Tag elements from Parser	2020-04-01 12:17:18 -04:00
Chris Mayo	ffa6ac457f	Remove support for non-Tag elements from Parser This change is made because the linkchecker handlers only process Tags. The test HtmlPrettyPrinter handler is updated to output element text because its support for non-Tag elements has been removed. This results in a number of the existing tests still passing.	2020-03-31 20:10:35 +01:00
Chris Mayo	e7c5f353cd	Remove unused function linkcheck.fileutil.write_file() Doesn't appear to have ever been used. Causes flake8 error: linkcheck/fileutil.py:45:9: F821 undefined name 'file'	2020-03-31 19:46:31 +01:00
Chris Mayo	504004d4f0	Use ipaddress in network.iputil.is_valid_ip() ipaddress was introduced in Python 3.3.	2020-03-31 19:46:31 +01:00
Chris Mayo	2eb1424703	Replace deprecated plistlib.readPlistFromBytes() in bookmarks.safari Remove Python 2 code. plistlib.loads() was added in Python 3.4.	2020-03-31 19:46:31 +01:00
Chris Mayo	0ee4414a60	Replace memoized with functools.lru_cache	2020-03-31 19:46:31 +01:00
Chris Mayo	1255119ca8	Move HtmlPrinter and HtmlPrettyPrinter into tests	2020-03-30 19:32:30 +01:00
Chris Mayo	ce1d669329	Remove unused functions from linkcheck.httputil http_persistent() unused since: `4b818cb4` ("Detect more cases to close the connection, and close response objects", 2006-09-15) http_keepalive(), get_content_encoding() unused since: `7b34be59` ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)	2020-03-30 19:32:30 +01:00
Chris Mayo	5b66964afa	Remove unused .charset from checker classes Unused since: `4f8c2954` ("Don't set parser.encoding", 2019-10-05)	2020-03-30 19:32:30 +01:00
Chris Mayo	f743be57e8	Remove unused functions from linkcheck.HtmlParser resolve_entities() unused since: `2c000683` ("Remove unused linkcheck.htmlutil.linkname module", 2020-03-30) set_doctype(), set_encoding() unused since: `51a06d8a` ("Remove home-cooked htmlparser and use BeautifulSoup", 2019-07-22)	2020-03-30 19:32:18 +01:00
Chris Mayo	2c000683e1	Remove unused linkcheck.htmlutil.linkname module Unused since: `d6d48b48` ("html parser: use name instead of peeking", 2019-07-22)	2020-03-30 19:31:11 +01:00
Marius Gedminas	af0f50efa8	Restore support for older BeautifulSoup4 versions	2020-03-30 14:49:56 +03:00
Marius Gedminas	a311ebb97e	Fix doctype tests I don't think linkchecker actually cares about the document type, so I'm not sure why we're even testing this...	2020-03-23 10:56:57 +02:00
Chris Mayo	5eaad24641	Use HTTP header encoding for decoding	2020-03-22 19:54:37 +00:00
Chris Mayo	f5ae90e824	Parser threading lock no longer required with Beautiful Soup	2020-03-22 19:54:37 +00:00
Chris Mayo	b7ec71d8cc	Always use utf-8 encoding when quoting	2019-10-05 19:38:57 +01:00
Chris Mayo	a9f147c347	Update fileutil.pathencode() because paths are now strings	2019-10-05 19:38:57 +01:00
Chris Mayo	5bb4524a63	Update strformat.ascii_safe() because paths are now strings	2019-10-05 19:38:57 +01:00
Chris Mayo	646e138166	Pass encoding when unquoting Else non-UTF-8 codes are misinterpreted: >>> from urllib import parse >>> parse.unquote("%FF") '�' >>> parse.unquote("%FF", "latin1") 'ÿ'	2019-10-05 19:38:57 +01:00
Chris Mayo	153e53ba03	Reuse soup object used for detecting encoding in the HTML parser	2019-10-05 19:38:57 +01:00
Chris Mayo	978042a54e	Hide Beautiful Soup soupsieve warning Shown every time linkchecker is run: /usr/lib/python3.7/site-packages/bs4/element.py:16: UserWarning: The soupsieve package is not installed. CSS selectors cannot be used. 'The soupsieve package is not installed. CSS selectors cannot be used.'	2019-10-05 19:38:57 +01:00

1 2 3 4 5 ...

3026 commits