linkchecker

mirror of https://github.com/Hopiu/linkchecker.git synced 2026-04-06 15:50:58 +00:00

Author	SHA1	Message	Date
Chris Mayo	4ffdbf2406	Replace MetaRobotsFinder using BeautifulSoup.find()	2020-04-29 20:07:00 +01:00
Marius Gedminas	680783b1ff	SWF files are binary data. Should fix #372.	2020-04-27 11:25:37 +03:00
Chris Mayo	ee6628a831	Move HtmlParser/htmlsax.py to htmlutil/htmlsoup.py. Remove one subpackage and some import lines where htmlutil.linkparse is also being used.	2020-04-18 20:30:45 +01:00
Chris Mayo	eb3cf28baa	Remove support for start_end_element() callback. The LinkFinder handler start_end_element() callback does nothing apart from call start_element().	2020-04-10 13:51:09 +01:00
Chris Mayo	48b590cf8b	Replace FormFinder using BeautifulSoup.find_all(). FormFinder was the only handler that used an end_element() callback and was therefore a blocker to moving the Parser class to use BeautifulSoup.find_all(). FormFinder was a specialised handler used to parse a login form at the start of a session if the user had configured authentication credentials.	2020-04-10 13:51:05 +01:00
Chris Mayo	02e1c389b2	Remove parser flush() and reset(). Remnants of the feed() interface.	2020-04-08 20:03:35 +01:00
Chris Mayo	3771dd9136	Use parser.feed_soup() instead of parser.feed(). Markup is not being passed in pieces to the parser, so simplify the interface and reduce the state further.	2020-04-08 20:03:35 +01:00
Chris Mayo	9d8d251d06	Replace Parser lineno() and column() methods. Stop storing this data in Parser object state.	2020-04-08 20:03:35 +01:00
Chris Mayo	16e6fb2919	Fix incorrect character in FormFinder log message	2020-04-07 19:24:34 +01:00
Chris Mayo	00f940d979	Fix FormFinder callbacks for missing element_text. element_text added in: `51a06d8a` ("Remove home-cooked htmlparser and use BeautifulSoup", 2019-07-22)	2020-04-07 19:24:34 +01:00
Chris Mayo	3ff3d72492	Use BeautifulSoup element attrs directly	2020-04-03 19:24:08 +01:00
Chris Mayo	a7e1e20172	Remove last line and column from Parser. Only used for debug log message and not very useful.	2020-04-03 19:24:08 +01:00
Chris Mayo	28701e291a	Remove use of Python 2 unicode() and related u prefixes. Several instances for MS Windows left unchanged.	2020-04-01 19:39:50 +01:00
Chris Mayo	2c000683e1	Remove unused linkcheck.htmlutil.linkname module. Unused since: `d6d48b48` ("html parser: use name instead of peeking", 2019-07-22)	2020-03-30 19:31:11 +01:00
Chris Mayo	607328d5c5	Support Beautiful Soup line numbers	2019-10-05 19:38:57 +01:00
Chris Mayo	4f8c2954cf	Don't set parser.encoding. Read-only property with new Beautiful Soup parser.	2019-10-05 19:38:57 +01:00
Petr Dlouhý	d6d48b4814	html parser: use name instead of peeking	2019-07-22 19:59:37 +01:00
Petr Dlouhý	51a06d8a1e	Remove home-cooked htmlparser and use BeautifulSoup	2019-07-22 19:59:37 +01:00
anarcat	7cfb1136e9	Merge pull request #313 from cjmayo/titlefinder: Remove unused linkparse.TitleFinder	2019-10-07 11:30:10 -04:00
Chris Mayo	127c2272c4	Remove unused linkparse.TitleFinder. Stopped being used with removal of UrlBase.set_title_from_content() in: `7b34be59` ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)	2019-10-05 19:43:33 +01:00
Chris Mayo	5732606c58	Remove urlutil.decode_for_unquote(). Not needed since all content is now being decoded on retrieval. Added by: `a6643034` ("Python3: decode parts before submitting them to urllib.quote()", 2018-01-05)	2019-10-04 19:37:09 +01:00
anarcat	8c072fa757	Merge pull request #289 from cjmayo/python3_38 {python3_38} Python3: fix linkname.py	2019-09-12 08:39:29 -04:00
Petr Dlouhý	538c4cfeb9	Python3: fix linkname.py	2019-09-11 20:32:33 +01:00
Petr Dlouhý	e10f25b968	fixes for Python 3: fix running problems in Python 3	2019-09-10 19:30:09 +01:00
Petr Dlouhý	2c6411d68e	Python3: fix regexp format	2019-04-17 19:50:06 +01:00
Antoine Beaupré	71be9b941b	fix incorrect call to the logging module (Closes: #847208)	2017-11-03 09:47:01 -04:00
Marius Gedminas	743a5f31cb	Crawl HTML attributes in deterministic order. Fixes #17.	2017-02-01 19:19:53 +02:00
Bastian Kleineidam	35eb30432e	Added some Python3 fixes.	2014-09-12 19:36:30 +02:00
Bastian Kleineidam	85dadc1f1a	Add documentation	2014-07-16 07:37:19 +02:00
Bastian Kleineidam	90257a1b5e	Replace twill with custom code.	2014-07-15 18:37:05 +02:00
Bastian Kleineidam	176b95a30e	Do not strip quotes from resolved URLs.	2014-07-11 00:43:46 +02:00
Bastian Kleineidam	82dd76b0d7	Add PDF link parsing.	2014-04-28 18:13:45 +02:00
Bastian Kleineidam	981079c041	Support itemtype attribute parsing.	2014-04-23 22:03:20 +02:00
Bastian Kleineidam	4232b69633	Support <img> srcset attribute parsing.	2014-04-10 17:51:59 +02:00
Bastian Kleineidam	9c5693ad41	Add doc and copyright.	2014-03-30 19:23:42 +02:00
Bastian Kleineidam	b6b5c7a12e	Simpler link parsing routine.	2014-03-27 19:49:17 +01:00
Bastian Kleineidam	81da2eb48f	Code cleanup	2014-03-27 17:19:52 +01:00
Bastian Kleineidam	7b34be590b	Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.	2014-03-01 00:12:34 +01:00
Bastian Kleineidam	c806be5c15	Updated copyright	2014-01-08 22:33:04 +01:00
Bastian Kleineidam	78ed1e9e52	Do not GET on POST forms.	2013-12-10 23:42:43 +01:00
Bastian Kleineidam	9b8cb67d78	Updated copyright.	2013-01-17 20:41:47 +01:00
Bastian Kleineidam	4dad2aa33c	Support dns-prefetch URLs.	2013-01-17 20:41:09 +01:00
Bastian Kleineidam	ecef16b2c9	Support WML sites.	2012-08-22 22:43:14 +02:00
Bastian Kleineidam	b550a9dcb5	Updated copyright.	2012-06-23 14:31:11 +02:00
Bastian Kleineidam	363ccc0121	Check <object codebase=...> as normal URL.	2012-06-23 14:28:32 +02:00
Bastian Kleineidam	cdf6b91b39	Don't use <object codebase=...> attribute as parent url.	2012-06-23 13:32:08 +02:00
Bastian Kleineidam	fb979b4f3c	Add test for archive attribute support.	2011-12-30 12:36:22 +01:00
Bastian Kleineidam	d06c43d470	Split comma-separated archive attribute values.	2011-12-30 08:58:45 +01:00
Bastian Kleineidam	4a4985a960	Add HTML5 link elements and attributes.	2011-12-30 08:55:38 +01:00
Bastian Kleineidam	a1f0867c74	Updated copyright	2011-05-06 20:27:36 +02:00


60 commits