linkchecker

mirror of https://github.com/Hopiu/linkchecker.git synced 2026-05-24 22:23:43 +00:00

Author	SHA1	Message	Date
Chris Mayo	4ffdbf2406	Replace MetaRobotsFinder using BeautifulSoup.find()	2020-04-29 20:07:00 +01:00
anarcat	350f8bfef9	Merge pull request #373 from linkchecker/fix-swf-parsing SWF files are binary data	2020-04-27 09:39:52 -04:00
Marius Gedminas	680783b1ff	SWF files are binary data Should fix #372.	2020-04-27 11:25:37 +03:00
anarcat	183d483074	Merge pull request #365 from cjmayo/tidyten1 Remove use of the future package	2020-04-26 12:02:30 -04:00
anarcat	125146fb2c	Merge pull request #361 from cjmayo/parser4 Rename htmlsax.py to htmlsoup.py and add test_content_allows_robots	2020-04-25 17:56:29 -04:00
anarcat	87079312db	Merge pull request #371 from cjmayo/manhtml Switch to mandoc for generating html man pages	2020-04-24 18:59:10 -04:00
Chris Mayo	b7c8ad9be7	Fix typo for -Dplugin in man page	2020-04-24 19:46:30 +01:00
Chris Mayo	5dd448cf05	Add link to unknownurl.py in man page	2020-04-24 19:46:30 +01:00
Chris Mayo	a506800c07	Replace `` in man page with bold formatting	2020-04-24 19:46:30 +01:00
Chris Mayo	e3b77f810e	Update external links in man pages to https	2020-04-24 19:46:30 +01:00
Chris Mayo	a205a3722b	Update man pages to optimise for both html and man - Use "LinkChecker User Manual" as the source for both pages. - .UR/.UE for external links to allow mandoc to create links in html. - Use Linux man-pages format for cross references e.g. .BR linkcheckerrc (5) which are replaced in the html by the Makefile.	2020-04-24 19:46:30 +01:00
Chris Mayo	441cda5e15	Switch to mandoc for generating html man pages Removes the need for diff files and is a currently maintained project. Cross references are only supported for mdoc macros but because we only have two pages this can be achieved with sed. A clean target is added to the Makefile to make development easier.	2020-04-24 19:46:30 +01:00
Chris Mayo	56b8c9f7ab	Add tests for <meta name="robots" content="nofollow"> norobots.html was used for testing <meta name="robots" content="nofollow"> in local files until [1]. This commit reinstates local file testing and adds an http test. Checking is reported by checker.httpurl.HttpUrl.content_allows_robots(). [1] `ce733ae7` ("Don't check for robots.txt directives in local html files.", 2014-03-19)	2020-04-18 20:30:46 +01:00
Chris Mayo	d189445a8e	LinkFinder does not raise StopParse	2020-04-18 20:30:46 +01:00
Chris Mayo	ee6628a831	Move HtmlParser/htmlsax.py to htmlutil/htmlsoup.py Remove one subpackage and some import lines where htmlutil.linkparse is also being used.	2020-04-18 20:30:45 +01:00
anarcat	0f18c9b8f0	Merge pull request #360 from cjmayo/parser3 Replace Parser class using BeautifulSoup.find_all()	2020-04-18 14:37:03 -04:00
Chris Mayo	384e1e196d	Remove Python 2 gettext builtin installation	2020-04-15 19:49:16 +01:00
Chris Mayo	a83fbb56c0	Remove from __future__ imports	2020-04-15 19:49:16 +01:00
Chris Mayo	f5e7f3a382	Remove use of the future package It was providing Python 2 compatibility.	2020-04-15 19:49:16 +01:00
Chris Mayo	0795e3c1b4	Replace Parser class using BeautifulSoup.find_all()	2020-04-10 13:51:09 +01:00
Chris Mayo	eb3cf28baa	Remove support for start_end_element() callback The LinkFinder handler start_end_element() callback does nothing apart from call start_element().	2020-04-10 13:51:09 +01:00
Chris Mayo	c9f17e92b9	Remove support for end_element() callback	2020-04-10 13:51:09 +01:00
Chris Mayo	48b590cf8b	Replace FormFinder using BeautifulSoup.find_all() FormFinder was the only handler that used an end_element() callback and was therefore a blocker to moving the Parser class to use BeautifulSoup.find_all() FormFinder was a specialised handler used to parse a login form at the start of a session if the user had configured authentication credentials.	2020-04-10 13:51:05 +01:00
anarcat	d80a075372	Merge pull request #357 from cjmayo/parser2 Simplify the Parser class	2020-04-09 15:22:14 -04:00
Chris Mayo	974915cc4f	Remove encoding from Parser Only used by the test and an attribute of the soup object.	2020-04-08 20:03:35 +01:00
Chris Mayo	02e1c389b2	Remove parser flush() and reset() Remnants of the feed() interface.	2020-04-08 20:03:35 +01:00
Chris Mayo	3771dd9136	Use parser.feed_soup() instead of parser.feed() Markup is not being passed in pieces to the parser, so simplify the interface and reduce the state further.	2020-04-08 20:03:35 +01:00
Chris Mayo	40f43ae41c	Create one function to make soup objects	2020-04-08 20:03:35 +01:00
Chris Mayo	9d8d251d06	Replace Parser lineno() and column() methods Stop storing this data in Parser object state.	2020-04-08 20:03:35 +01:00
anarcat	e6374fa73a	Merge pull request #358 from cjmayo/testform Add a test for search_form	2020-04-07 17:37:15 -04:00
Chris Mayo	16e6fb2919	Fix incorrect character in FormFinder log message	2020-04-07 19:24:34 +01:00
Chris Mayo	00f940d979	Fix FormFinder callbacks for missing element_text element_text added in: `51a06d8a` ("Remove home-cooked htmlparser and use BeautifulSoup", 2019-07-22)	2020-04-07 19:24:34 +01:00
Chris Mayo	514210199d	Add tests for search_form	2020-04-07 19:24:34 +01:00
anarcat	7d55855ffb	Merge pull request #356 from cjmayo/parser1 Remove unnecessary parser related code	2020-04-04 09:26:51 -04:00
Chris Mayo	fe024fb0c8	Remove unused Parser.debug() method	2020-04-03 19:24:08 +01:00
Chris Mayo	0c5e3bb403	Remove old HtmlParser .gitignore htmlparse.output was a product of the built-in parser.	2020-04-03 19:24:08 +01:00
Chris Mayo	036b900ffc	Remove unused linkcheck.containers classes	2020-04-03 19:24:08 +01:00
Chris Mayo	3ff3d72492	Use BeautifulSoup element attrs directly	2020-04-03 19:24:08 +01:00
Chris Mayo	a7e1e20172	Remove last line and column from Parser Only used for debug log message and not very useful.	2020-04-03 19:24:08 +01:00
anarcat	25d517521c	Merge pull request #353 from cjmayo/setup Tidy setup.py for C extensions and Python 2	2020-04-02 10:10:38 -04:00
anarcat	39aa438d06	Merge pull request #354 from cjmayo/unicode Remove use of Python 2 unicode() and related u prefixes	2020-04-02 10:10:31 -04:00
Chris Mayo	28701e291a	Remove use of Python 2 unicode() and related u prefixes Several instances for MS Windows left unchanged.	2020-04-01 19:39:50 +01:00
Chris Mayo	e0bf5fc24f	Remove unused imports and variables from setup.py	2020-04-01 19:21:47 +01:00
Chris Mayo	f6b273d05e	Remove code for compiling C extensions from setup.py C extensions for parser and network utilities have been replaced in Python.	2020-04-01 19:21:47 +01:00
Chris Mayo	9f899605a9	Remove Python 2 compatibility from setup.py sys.version_info was introduced in Python 2.0.	2020-04-01 19:21:47 +01:00
anarcat	cf4e6bb235	Merge pull request #351 from cjmayo/tagsonly Remove support for non-Tag elements from Parser	2020-04-01 12:17:18 -04:00
Marius Gedminas	7c14bf1ad6	Declare supported Python versions in setup.py The python_requires is the important one; it means once we publish a new release on PyPI, pip install will know not to try to install it if you run it on Python 2 and will fall back to an older version.	2020-04-01 17:49:51 +03:00
anarcat	b5c8a5d1ce	Merge pull request #314 from cjmayo/postbs4 Replace memoized with functools.lru_cache and deprecations	2020-04-01 10:28:18 -04:00
Chris Mayo	9fc651e82b	Remove Python 2 compatibility from parser tests	2020-03-31 20:10:35 +01:00
Chris Mayo	ffa6ac457f	Remove support for non-Tag elements from Parser This change is made because the linkchecker handlers only process Tags. The test HtmlPrettyPrinter handler is updated to output element text because its support for non-Tag elements has been removed. This results in a number of the existing tests still passing.	2020-03-31 20:10:35 +01:00

6246 commits
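
Commit b5c8a5d1ce in the log above mentions replacing a memoized helper with functools.lru_cache. The diff itself is not shown on this page, so the following is only a minimal sketch of how the standard-library decorator caches results. The function normalize_host and the counter call_count are hypothetical names chosen for illustration and are not taken from the linkchecker source.

```python
from functools import lru_cache

# Hypothetical counter, present only to make the caching visible.
call_count = 0


@lru_cache(maxsize=None)  # unbounded cache, similar in spirit to a plain memoize decorator
def normalize_host(host):
    """Hypothetical helper: lower-case a host name and strip a trailing dot."""
    global call_count
    call_count += 1
    return host.lower().rstrip(".")


first = normalize_host("Example.COM.")
second = normalize_host("Example.COM.")  # same argument, so this is served from the cache
```

After the two calls, normalize_host.cache_info() reports one miss and one hit, and the function body has run only once. Note that lru_cache requires hashable arguments, which is one reason a project might need small adjustments when switching from a home-grown decorator.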