Commit graph

6220 commits

Author SHA1 Message Date
Chris Mayo
3771dd9136 Use parser.feed_soup() instead of parser.feed()
Markup is not being passed in pieces to the parser, so simplify the
interface and reduce the state further.
2020-04-08 20:03:35 +01:00
Chris Mayo
40f43ae41c Create one function to make soup objects 2020-04-08 20:03:35 +01:00
Chris Mayo
9d8d251d06 Replace Parser lineno() and column() methods
Stop storing this data in Parser object state.
2020-04-08 20:03:35 +01:00
anarcat
e6374fa73a
Merge pull request #358 from cjmayo/testform
Add a test for search_form
2020-04-07 17:37:15 -04:00
Chris Mayo
16e6fb2919 Fix incorrect character in FormFinder log message 2020-04-07 19:24:34 +01:00
Chris Mayo
00f940d979 Fix FormFinder callbacks for missing element_text
element_text added in:
51a06d8a ("Remove home-cooked htmlparser and use BeautifulSoup",
2019-07-22)
2020-04-07 19:24:34 +01:00
Chris Mayo
514210199d Add tests for search_form 2020-04-07 19:24:34 +01:00
anarcat
7d55855ffb
Merge pull request #356 from cjmayo/parser1
Remove unnecessary parser related code
2020-04-04 09:26:51 -04:00
Chris Mayo
fe024fb0c8 Remove unused Parser.debug() method 2020-04-03 19:24:08 +01:00
Chris Mayo
0c5e3bb403 Remove old HtmlParser .gitignore
htmlparse.output was a product of the built-in parser.
2020-04-03 19:24:08 +01:00
Chris Mayo
036b900ffc Remove unused linkcheck.containers classes 2020-04-03 19:24:08 +01:00
Chris Mayo
3ff3d72492 Use BeautifulSoup element attrs directly 2020-04-03 19:24:08 +01:00
Chris Mayo
a7e1e20172 Remove last line and column from Parser
Only used for debug log message and not very useful.
2020-04-03 19:24:08 +01:00
anarcat
25d517521c
Merge pull request #353 from cjmayo/setup
Tidy setup.py for C extensions and Python 2
2020-04-02 10:10:38 -04:00
anarcat
39aa438d06
Merge pull request #354 from cjmayo/unicode
Remove use of Python 2 unicode() and related u prefixes
2020-04-02 10:10:31 -04:00
Chris Mayo
28701e291a Remove use of Python 2 unicode() and related u prefixes
Several instances for MS Windows left unchanged.
2020-04-01 19:39:50 +01:00
Chris Mayo
e0bf5fc24f Remove unused imports and variables from setup.py 2020-04-01 19:21:47 +01:00
Chris Mayo
f6b273d05e Remove code for compiling C extensions from setup.py
C extensions for parser and network utilities have been replaced in
Python.
2020-04-01 19:21:47 +01:00
Chris Mayo
9f899605a9 Remove Python 2 compatibility from setup.py
sys.version_info was introduced in Python 2.0.
2020-04-01 19:21:47 +01:00
anarcat
cf4e6bb235
Merge pull request #351 from cjmayo/tagsonly
Remove support for non-Tag elements from Parser
2020-04-01 12:17:18 -04:00
Marius Gedminas
7c14bf1ad6 Declare supported Python versions in setup.py
The python_requires is the important one; it means once we publish a
new release on PyPI, pip install will know not to try to install it if
you run it on Python 2 and will fall back to an older version.
2020-04-01 17:49:51 +03:00
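The effect described in this commit message can be sketched as the keyword arguments that would be passed to `setup()`. This is only an illustration, not the project's actual setup.py: the `setup_kwargs` name, the `">= 3.5"` bound (inferred from the "Raise minimum Python version to 3.5" commit) and the classifier list are assumptions.

```python
# Hypothetical excerpt: keyword arguments for setuptools.setup().
# The exact values used by the project are not shown in this log.
setup_kwargs = {
    # Written into the package metadata; pip skips releases whose
    # constraint excludes the running interpreter.
    "python_requires": ">= 3.5",
    # Informational only; classifiers do not affect what pip installs.
    "classifiers": [
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.5",
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
    ],
}

# In a real setup.py this would be used roughly as:
#   from setuptools import setup
#   setup(name="...", **setup_kwargs)
```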
anarcat
b5c8a5d1ce
Merge pull request #314 from cjmayo/postbs4
Replace memoized with functools.lru_cache and deprecations
2020-04-01 10:28:18 -04:00
Chris Mayo
9fc651e82b Remove Python 2 compatibility from parser tests 2020-03-31 20:10:35 +01:00
Chris Mayo
ffa6ac457f Remove support for non-Tag elements from Parser
This change is made because the linkchecker handlers only process
Tags.

The test HtmlPrettyPrinter handler is updated to output element text
because its support for non-Tag elements has been removed. This results
in a number of the existing tests still passing.
2020-03-31 20:10:35 +01:00
Chris Mayo
d2cb1b9dd6 Raise minimum Python version to 3.5 in setup.py 2020-03-31 19:46:31 +01:00
Chris Mayo
e7c5f353cd Remove unused function linkcheck.fileutil.write_file()
Doesn't appear to have ever been used.

Causes flake8 error:
linkcheck/fileutil.py:45:9: F821 undefined name 'file'
2020-03-31 19:46:31 +01:00
Chris Mayo
c3860e2218 Remove third_party directory from MANIFEST.in
Unused since:
0a13fae3 ("remove third party packages and use them as dependency",
2018-01-06)
2020-03-31 19:46:31 +01:00
Chris Mayo
504004d4f0 Use ipaddress in network.iputil.is_valid_ip()
ipaddress was introduced in Python 3.3.
2020-03-31 19:46:31 +01:00
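A minimal sketch of what an `ipaddress`-based validity check might look like. The function name comes from the commit title, but the body below is an assumption and not the code from the commit.

```python
import ipaddress


def is_valid_ip(ip):
    """Return True if ip parses as an IPv4 or IPv6 address.

    Assumed behavior: delegates entirely to ipaddress.ip_address(),
    which raises ValueError for anything it cannot parse.
    """
    try:
        ipaddress.ip_address(ip)
    except ValueError:
        return False
    return True
```

With this sketch, `is_valid_ip("127.0.0.1")` and `is_valid_ip("::1")` return `True`, while a host name such as `"example.com"` returns `False`. Note that `ipaddress.ip_address()` also accepts integers, so a stricter check may want to reject non-string input first.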
Chris Mayo
2eb1424703 Replace deprecated plistlib.readPlistFromBytes() in bookmarks.safari
Remove Python 2 code.

plistlib.loads() was added in Python 3.4.
2020-03-31 19:46:31 +01:00
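The replacement named in this commit can be illustrated with a short round trip through `plistlib`. The `parse_plist` helper and the sample data are hypothetical; only `plistlib.loads()` and `plistlib.dumps()` are real standard-library calls.

```python
import plistlib


def parse_plist(data):
    """Parse plist bytes (XML or binary) into Python objects.

    plistlib.loads() takes over from the deprecated
    plistlib.readPlistFromBytes().
    """
    return plistlib.loads(data)


# Build a small plist in memory rather than reading a real bookmarks file.
sample = plistlib.dumps({"Title": "BookmarksBar", "Children": []})
parsed = parse_plist(sample)
```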
Chris Mayo
0ee4414a60 Replace memoized with functools.lru_cache 2020-03-31 19:46:31 +01:00
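As a rough illustration of the swap named in this commit, a function previously wrapped in a custom `memoized` decorator could be cached with `functools.lru_cache` instead. The `normalize_host` function and the `calls` list are invented for this sketch and do not come from the project.

```python
import functools

calls = []  # records each real (uncached) invocation, for demonstration only


@functools.lru_cache(maxsize=None)  # unbounded cache, like a plain memoizer
def normalize_host(host):
    """Hypothetical helper: lower-case and strip a host name."""
    calls.append(host)
    return host.strip().lower()


normalize_host("Example.COM")
normalize_host("Example.COM")  # served from the cache; the body does not run again
```

One difference from a hand-written memoizer is that `lru_cache` requires all arguments to be hashable, and it exposes `cache_info()` and `cache_clear()` on the decorated function.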
Marius Gedminas
61b30a95dd Switch to travis-ci.com
Migrating from legacy GitHub services/webhooks to the new Travis CI
GitHub app means we also have to use travis-ci.com instead of
travis-ci.org to see build status or history.
2020-03-31 18:35:37 +03:00
anarcat
67f91fee54
Merge pull request #349 from cjmayo/unused
Remove unused code
2020-03-31 11:20:31 -04:00
Chris Mayo
1255119ca8 Move HtmlPrinter and HtmlPrettyPrinter into tests 2020-03-30 19:32:30 +01:00
Chris Mayo
ce1d669329 Remove unused functions from linkcheck.httputil
http_persistent() unused since:
4b818cb4 ("Detect more cases to close the connection, and close response
objects", 2006-09-15)

http_keepalive(), get_content_encoding() unused since:
7b34be59 ("Introduce check plugins, use Python requests for http/s
connections, and some code cleanups and improvements.", 2014-03-01)
2020-03-30 19:32:30 +01:00
Chris Mayo
5b66964afa Remove unused .charset from checker classes
Unused since:
4f8c2954 ("Don't set parser.encoding", 2019-10-05)
2020-03-30 19:32:30 +01:00
Chris Mayo
f743be57e8 Remove unused functions from linkcheck.HtmlParser
resolve_entities() unused since:
2c000683 ("Remove unused linkcheck.htmlutil.linkname module",
2020-03-30)

set_doctype(), set_encoding() unused since:
51a06d8a ("Remove home-cooked htmlparser and use BeautifulSoup",
2019-07-22)
2020-03-30 19:32:18 +01:00
Chris Mayo
2c000683e1 Remove unused linkcheck.htmlutil.linkname module
Unused since:
d6d48b48 ("html parser: use name instead of peeking", 2019-07-22)
2020-03-30 19:31:11 +01:00
Marius Gedminas
78530956a1
Merge pull request #337 from linkchecker/htmlparser-beautifulsoup
Change HtmlParser to use Beautiful Soup
2020-03-30 20:45:14 +03:00
Chris Mayo
9030050599 Remove Python 3 status document 2020-03-30 17:39:23 +01:00
Marius Gedminas
af0f50efa8 Restore support for older BeautifulSoup4 versions 2020-03-30 14:49:56 +03:00
Marius Gedminas
ccc0ee0464 Clean up travis and tox.ini
I want the Python 3.5 travis job to run just tox -e py35, without the
oldbs4 job, and without an explicit TOXENV setting that is awkward to
insert in the .travis.yml (also, it reorders the jobs putting 3.5 below
3.8 which annoys me).

I think I found a way of doing that by renaming py35-oldbs4 to oldbs4.
2020-03-30 14:46:44 +03:00
Marius Gedminas
ed08e7fa7e Split the oldbs4 into a separate Travis job (take 3)
I did an oopsie whoopsie with the YAML syntax in my previous commit.
2020-03-23 16:50:27 +02:00
Marius Gedminas
894f0b0922 Split the oldbs4 into a separate Travis job (take 2)
The previous attempt did not work: the 3.5 build ran both toxenvs.
2020-03-23 16:45:46 +02:00
Marius Gedminas
ba5888f06a Split the oldbs4 into a separate Travis job 2020-03-23 16:40:22 +02:00
Marius Gedminas
0417f677c2 Ignore files created during test runs 2020-03-23 11:05:13 +02:00
Marius Gedminas
6a50fe9d86 Add Python 3.8 to the build matrix 2020-03-23 11:00:25 +02:00
Marius Gedminas
a311ebb97e Fix doctype tests
I don't think linkchecker actually cares about the document type, so I'm
not sure why we're even testing this...
2020-03-23 10:56:57 +02:00
Chris Mayo
5eaad24641 Use HTTP header encoding for decoding 2020-03-22 19:54:37 +00:00
Chris Mayo
f5ae90e824 Parser threading lock no longer required with Beautiful Soup 2020-03-22 19:54:37 +00:00
Marius Gedminas
205ceb6805
Merge pull request #344 from hroncok/beautifulsoup4-requirement
Require beautifulsoup4 instead of bs4
2020-02-06 12:52:20 +02:00