Chris Mayo
7c2036b68c
Drop support for Beautiful Soup < 4.8.1
...
The minimum version supported was already 4.8.0 because of the use
of multi_valued_attributes [1].
Test support for < 4.8.1 is the only code that needs removing [2].
[1] 3ff3d724 ("Use BeautifulSoup element attrs directly", 2020-04-03)
[2] 607328d5 ("Support Beautiful Soup line numbers", 2019-10-05)
2021-01-28 19:20:24 +00:00
Chris Mayo
e922dd0224
Stop using biplist
...
plistlib has supported binary files since Python 3.4.
2020-10-12 19:55:46 +01:00
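Illustration for the commit above: a minimal sketch of reading and writing a binary plist with the standard library alone. The dictionary contents are hypothetical and are not taken from the Safari bookmarks parser.

```python
import plistlib

# Hypothetical bookmark-like data, for illustration only.
data = {"Title": "Example", "URLString": "https://example.com/"}

# Serialize to the binary plist format using only the standard library.
blob = plistlib.dumps(data, fmt=plistlib.FMT_BINARY)

# Binary plists start with the magic bytes "bplist00".
is_binary = blob.startswith(b"bplist00")

# loads() detects the format automatically when fmt is not given.
restored = plistlib.loads(blob)
```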
Chris Mayo
e75c4b3d36
Reuse linkcheck.bookmarks.safari.has_biplist in tests
2020-09-23 19:38:17 +01:00
Chris Mayo
9891fc3f70
Python 3.9 adds support for HTTP status code 103 EARLY_HINTS
2020-09-14 19:55:05 +01:00
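Illustration for the commit above: a sketch of looking up a status code name in a way that works whether or not the running Python knows the code. The helper name status_name() is hypothetical and is not part of linkchecker.

```python
from http import HTTPStatus


def status_name(code):
    """Return the symbolic name for an HTTP status code, or None if unknown."""
    try:
        return HTTPStatus(code).name
    except ValueError:
        # Raised when this Python version does not define the code,
        # for example 103 before Python 3.9.
        return None
```

On Python 3.9 and later, status_name(103) returns "EARLY_HINTS"; on earlier versions it returns None.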
Chris Mayo
f268b95cf8
biplist is not compatible with Python 3.9
...
File ".tox/py39/lib/python3.9/site-packages/biplist/__init__.py", line 143, in readPlist
line: raise InvalidPlistException(e)
locals:
InvalidPlistException = <global> <class 'biplist.InvalidPlistException'>
e = <not found>
InvalidPlistException: module 'plistlib' has no attribute 'Data'
2020-09-14 19:55:05 +01:00
Chris Mayo
b1faef93c3
Merge pull request #495 from cjmayo/mswindows
...
MS Windows Python 3.7 and MS Store compatibility
2020-09-01 19:46:44 +01:00
Chris Mayo
314ec085a3
Merge pull request #462 from cjmayo/anchor
...
Fix anchor checking
2020-09-01 19:39:29 +01:00
Chris Mayo
89613d56f2
Replace the use of Python internal test.support
...
Its use is discouraged and it is not present in the MS Store version of
Python.
2020-08-29 16:57:57 +01:00
Chris Mayo
1390c9cd7e
Merge pull request #489 from cjmayo/urlsplit
...
Replace deprecated urllib.parse.split functions
2020-08-29 16:44:56 +01:00
Chris Mayo
47604e7d34
Merge pull request #481 from cjmayo/failures
...
Rename blacklist to failures
2020-08-29 16:39:24 +01:00
Chris Mayo
7dfba766a9
Merge pull request #486 from cjmayo/url
...
Remove unused code from url.py
2020-08-26 19:28:50 +01:00
Chris Mayo
2de25d54fd
Rename blacklist to failures
...
Continue to support blacklist for the time being, with deprecation
warnings.
2020-08-23 17:19:26 +01:00
Chris Mayo
737c61cd67
Merge pull request #484 from cjmayo/issuetests
...
Tests of img srcset and invalid host name
2020-08-22 16:32:03 +01:00
Chris Mayo
f99f15c349
Add a test for UrlBase.build_url()
2020-08-22 16:28:53 +01:00
Chris Mayo
d58b3ab285
Remove unused url.url_fix_common_typos()
2020-08-18 19:57:46 +01:00
Chris Mayo
71ea78382b
Remove unused url.safe_host_pattern()
2020-08-18 19:57:46 +01:00
Chris Mayo
794efd6d44
Remove unused url.is_duplicate_content_url()
2020-08-18 19:57:46 +01:00
Chris Mayo
e372657fb8
Remove unused url.get_content()
2020-08-18 19:57:46 +01:00
Chris Mayo
e4ba9c84ce
Remove unused url.match_{host,url}()
...
Removes deprecation warnings for urllib.parse.split{host,type}() in
url_split()
2020-08-18 19:57:46 +01:00
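Illustration for the commit above: a sketch of getting the scheme and host parts from urllib.parse.urlsplit(), which avoids the split{host,type}() helpers deprecated since Python 3.8. The URL is a made-up example and this is not the project's url_split() implementation.

```python
from urllib.parse import urlsplit

# A made-up URL, for illustration only.
parts = urlsplit("http://user@www.example.com:8080/path?q=1#frag")

scheme = parts.scheme    # what splittype() was used for
netloc = parts.netloc    # what splithost() was used for
host = parts.hostname    # host without user info or port
port = parts.port        # port as an int, or None if absent
```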
Chris Mayo
4ad20d7f03
Merge pull request #477 from cjmayo/sitemap
...
Detect sitemaps that do not start with an XML declaration
2020-08-18 19:51:32 +01:00
Chris Mayo
24c2f4ac39
Add test for invalid host name in content
...
Tests code added in:
d5690203 ("Fix critical exception when parsing a URL with a ]", 2020-08-08)
2020-08-15 17:04:41 +01:00
Chris Mayo
88c84364b3
Add additional tests for <img srcset>
...
Tests code added in:
7ba40537 ("Fix critical exception if srcset value ends with a comma", 2020-08-07)
27f22ae1 ("Fix treating data: URIs in srcset values as links", 2020-08-07)
2020-08-15 17:04:41 +01:00
Chris Mayo
8c804c35a5
Detect sitemaps that do not start with an XML declaration
2020-08-11 19:35:56 +01:00
Chris Mayo
40b2ebff8f
Remove defaults from lc_cgi.checklink()
...
Only called from application() with arguments. Causes local environment
to be embedded in documentation when using Sphinx autodoc.
2020-08-05 19:54:56 +01:00
Chris Mayo
a7eacd6200
Add a test for a page with links to anchors
...
Query and fragment URL parts for filesystem URLs are ignored, therefore
test over http.
2020-07-27 19:22:32 +01:00
Chris Mayo
10170b2966
Add a test for the LocationInfo plugin
...
Because the GeoIP database now requires registration to download, the
result of the lookup using geoip-database is not going to change.
2020-07-07 17:25:28 +01:00
Chris Mayo
d91a328224
Remove strformat.unicode_safe() and strformat.url_unicode_split()
...
All strings support Unicode in Python 3.
2020-07-07 17:25:28 +01:00
Chris Mayo
d66e64460c
Remove unused code from strformat.py
2020-06-18 19:31:00 +01:00
Chris Mayo
18d6eeae76
Ensure PO files are opened as UTF-8 in test_gtranslator()
2020-06-09 19:47:24 +01:00
Chris Mayo
74d449f8ac
Test po files as strings and check po files have been found
2020-06-05 16:59:46 +01:00
Chris Mayo
4330b8a59e
Replace codecs.open() with open()
2020-06-05 16:59:46 +01:00
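Illustration for the commit above: in Python 3 the built-in open() accepts an encoding argument, so codecs.open() is no longer needed. A minimal sketch, using a temporary file with a hypothetical name:

```python
import os
import tempfile

text = "Grüße\n"
path = os.path.join(tempfile.mkdtemp(), "example.po")  # hypothetical file name

# Previously: codecs.open(path, "w", "utf-8")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Passing encoding explicitly avoids depending on the locale default.
with open(path, encoding="utf-8") as f:
    read_back = f.read()
```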
Chris Mayo
d591fedb60
Remove unused updater code that supports linkchecker-gui
...
pip provides update support for linkchecker.
2020-06-05 16:05:25 +01:00
Chris Mayo
a6b1eb45b1
Convert to Python 3 super()
2020-06-03 20:06:36 +01:00
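Illustration for the commit above: the zero-argument form of super() available in Python 3. The class names are hypothetical.

```python
class Base:
    def __init__(self, name):
        self.name = name


class Child(Base):
    def __init__(self, name, extra):
        # Python 2 compatible form: super(Child, self).__init__(name)
        super().__init__(name)
        self.extra = extra
```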
Chris Mayo
5df8aa085c
Convert space-separated strings in tests/
2020-05-29 19:40:46 +01:00
Chris Mayo
c71cfcbea4
Tidy TestClamav.testInfected() acceptable_responses
2020-05-29 19:40:46 +01:00
Chris Mayo
5ee8d8e1ea
Add trailing comma to single dict list in TestLoginUrl.visit_loginurl()
2020-05-29 19:40:46 +01:00
Chris Mayo
a534be0b50
Remove unnecessary character match in regexp in TestLogger.normalize()
2020-05-29 19:40:46 +01:00
Chris Mayo
be53c4a659
Remove unnecessary commas before closing brackets in tests/
2020-05-29 19:40:46 +01:00
Chris Mayo
87039913b2
Fix remaining flake8 violations in tests/
...
tests/test_clamav.py:58:89: E501 line too long (90 > 88 characters)
tests/test_containers.py:38:9: F841 local variable 'dummy' is assigned to but never used
tests/test_dummy.py:35:9: F841 local variable 'dummy' is assigned to but never used
tests/test_ftpparse.py:94:89: E501 line too long (96 > 88 characters)
tests/test_url.py:128:89: E501 line too long (130 > 88 characters)
tests/test_strformat.py:62:9: E741 ambiguous variable name 'l'
tests/test_strformat.py:136:9: E731 do not assign a lambda expression, use a def
tests/checker/ftpserver.py:94:9: E722 do not use bare 'except'
tests/checker/httpserver.py:55:39: E231 missing whitespace after ','
tests/checker/httpserver.py:224:9: E722 do not use bare 'except'
tests/checker/telnetserver.py:84:9: E722 do not use bare 'except'
tests/checker/__init__.py:71:89: E501 line too long (119 > 88 characters)
tests/checker/__init__.py:292:13: E741 ambiguous variable name 'l'
tests/checker/test_http_misc.py:30:1: W293 blank line contains whitespace
tests/checker/test_https.py:21:1: F401 'tests.need_network' imported but unused
tests/checker/test_news.py:35:1: E302 expected 2 blank lines, found 1
2020-05-28 20:29:13 +01:00
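Illustration for the commit above: typical fixes for three of the listed violation types (E731, E741, E722). These are generic sketches with hypothetical names, not the changes actually made in tests/.

```python
# E731: define a function instead of assigning a lambda to a name.
def double(x):
    return x * 2


# E741: use a descriptive name instead of the ambiguous "l".
lines = ["a", "b"]


# E722: catch a specific exception instead of using a bare "except:".
def to_int(value):
    try:
        return int(value)
    except ValueError:
        return None
```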
Chris Mayo
165c51aeea
Run black on tests/
2020-05-28 20:29:13 +01:00
Chris Mayo
6f126a54d2
Add coverage for parser.sitemap.parse_sitemapindex()
2020-05-27 20:02:03 +01:00
Chris Mayo
f6e182f0e4
Mark TestFile.test_html_url_quote as need_network
...
Otherwise, without internet access, the test eventually fails with:
warning No MX mail host for users.sourceforge.net found
2020-05-25 19:55:28 +01:00
Chris Mayo
d3c9618b1b
TestHttps.test_https doesn't need the internet now
...
A result of changes introduced in:
dee4be4b ("Enable https checking using a test server", 2019-11-11)
2020-05-25 19:55:28 +01:00
Chris Mayo
32689ea230
Enable as many TestHttp html tests as possible without the internet
2020-05-25 19:55:28 +01:00
Chris Mayo
313a14ff0d
Remove instances of Python 2 unicode
2020-05-24 19:14:47 +01:00
Marius Gedminas
d0169c46d4
Merge pull request #348 from weshaggard/HandleRateLimiting
...
Turn status code 429 into warning instead of failure
2020-05-24 16:16:56 +03:00
Chris Mayo
d611564cb0
Add a test for an empty html file accessed over http
2020-05-23 20:01:24 +01:00
Marius Gedminas
f268a90cfb
Merge branch 'master' into HandleRateLimiting
2020-05-23 14:15:52 +03:00
Marius Gedminas
5bd1fb4e36
Fix internal error on empty HTML files
...
When BeautifulSoup finds an empty file on disk, it sets
original_encoding to None. It doesn't matter what encoding we pick for
empty files, so let's just pick one.
I don't know if there are any circumstances where BeautifulSoup might
set the encoding to None for a non-empty file.
Closes #392.
2020-05-21 19:01:33 +03:00
Chris Mayo
96e1c00ff7
TestLogger diff output is all Unicode in Python 3
2020-05-20 19:58:44 +01:00
Chris Mayo
71eaf9a982
Remove str_text from tests/
2020-05-19 19:56:42 +01:00
Chris Mayo
a127902607
Replace str_text in asserts
2020-05-19 19:56:42 +01:00
Chris Mayo
12fd59057e
Remove duplicate tests from test_strformat.py
2020-05-17 20:10:28 +01:00
Chris Mayo
339d293326
Convert tests/test_po.py to UTF-8
2020-05-17 20:10:28 +01:00
Chris Mayo
04465530c4
Use HttpServerTest.get_url()
2020-05-17 20:10:28 +01:00
Chris Mayo
58dbe1f282
Remove unused import pytest from tests/checker/test_http.py
...
pytest.mark.xfail() removed in:
743a5f31 ("Crawl HTML attributes in deterministic order", 2017-02-01)
2020-05-17 20:10:28 +01:00
Chris Mayo
79eafee826
Add a test for VirusCheck
2020-05-17 19:04:49 +01:00
Chris Mayo
a15a2833ca
Remove spaces after names in class method definitions
...
And also nested functions.
This is a PEP 8 convention, E211.
2020-05-16 20:19:42 +01:00
Chris Mayo
1663e10fe7
Remove spaces after names in function definitions
...
This is a PEP 8 convention, E211.
2020-05-16 20:19:42 +01:00
Chris Mayo
fc11d08968
Remove spaces after names in class definitions
2020-05-16 20:19:42 +01:00
Chris Mayo
1416a08119
On Python 3 no need to convert os.linesep to a string
2020-05-16 17:02:01 +01:00
Chris Mayo
10552a79c7
Remove LinkCheckTest.fail_unicode()
...
No need to encode Python 3 strings before output.
2020-05-16 17:02:00 +01:00
Chris Mayo
9f95d06a39
Remove Python 2 test.test_support import
2020-05-16 16:26:38 +01:00
Chris Mayo
f8c9faec1b
Remove Python 2 cStringIO imports
2020-05-15 19:37:04 +01:00
Chris Mayo
bda9612273
Make html.escape Python 3 only
2020-05-14 20:15:28 +01:00
Chris Mayo
42de609f8e
Make urllib imports Python 3 only
2020-05-14 20:15:28 +01:00
Chris Mayo
08ddf658bc
Merge pull request #366 from cjmayo/userorpwd
...
Support login forms with user and/or password
2020-05-13 19:37:44 +01:00
Chris Mayo
736c893707
Merge pull request #377 from cjmayo/tidyten3
...
Remove u string prefixes
2020-05-13 19:36:54 +01:00
Chris Mayo
00c4a30386
Add user and password only loginurl tests
2020-05-13 19:32:29 +01:00
Chris Mayo
31a9f68c46
Merge pull request #367 from cjmayo/loginurl
...
Add test for loginurl
2020-05-12 20:08:57 +01:00
Chris Mayo
44e81d27dd
Remove inheriting object
...
All Python 3 classes are new-style.
2020-05-08 10:45:31 +01:00
Chris Mayo
b0ea72e8c1
Remove # -*- coding: lines
...
Except for tests that include non-unicode characters:
tests/test_po.py
tests/test_strformat.py
tests/test_url.py
tests/checker/test_error.py
tests/checker/test_news.py
2020-05-08 10:45:31 +01:00
Chris Mayo
4d3e5abcfa
Remove u string prefixes
2020-04-30 20:11:59 +01:00
anarcat
ab476fa4bf
Merge pull request #364 from cjmayo/parser5
...
Stop using HTML handlers and improve login form error handling
2020-04-30 09:28:48 -04:00
Chris Mayo
1d1d9c3bde
Add testing for variants of the robots meta directive
2020-04-29 20:14:10 +01:00
Chris Mayo
9eed070a73
Stop using HTML handlers
...
LinkFinder is the only remaining HTML handler therefore no need for
htmlsoup.process_soup() as an independent function or TagFinder as a
base class.
2020-04-29 20:07:00 +01:00
Chris Mayo
a1433767e5
Replace HtmlPrettyPrinter with pretty_print_html()
2020-04-29 20:07:00 +01:00
Chris Mayo
0361d9e0e8
Remove encoding and default fd from HtmlPrettyPrinter
...
Neither is used.
2020-04-29 20:07:00 +01:00
Chris Mayo
4ffdbf2406
Replace MetaRobotsFinder using BeautifulSoup.find()
2020-04-29 20:07:00 +01:00
Chris Mayo
8fc0dcc055
Make matching login form credentials case-sensitive
...
The keys of the form.data dictionary are case-sensitive and therefore a
KeyError was possible if the configured values are not identical to
the input element name attributes.
2020-04-27 18:06:29 +01:00
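Illustration for the commit above: a sketch of why a case-insensitive match followed by a dictionary lookup can fail. The field names and variables are hypothetical and do not reproduce the project's form handling.

```python
# Hypothetical form data keyed by the input elements' name attributes.
form_data = {"Login": "", "Password": ""}
configured_user_field = "login"  # differs from the key only in case

# A case-insensitive comparison would accept this form...
loose_match = configured_user_field.lower() in {k.lower() for k in form_data}

# ...but dictionary keys are case-sensitive, so a direct lookup with the
# configured name would raise KeyError.
strict_match = configured_user_field in form_data
```

Matching case-sensitively from the start means a form is only accepted when the later lookup is guaranteed to succeed.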
Chris Mayo
7a6ef938cc
Rename htmlutil.formsearch to htmlutil.loginformsearch
...
Make it clear that this module has only one specific use.
2020-04-27 18:06:29 +01:00
anarcat
183d483074
Merge pull request #365 from cjmayo/tidyten1
...
Remove use of the future package
2020-04-26 12:02:30 -04:00
Chris Mayo
3b8af403be
Add test for loginurl
...
A new cgi-bin directory is created to identify the scripts to be run by
http.server.CGIHTTPRequestHandler.
2020-04-19 19:05:55 +01:00
Chris Mayo
56b8c9f7ab
Add tests for <meta name="robots" content="nofollow">
...
norobots.html was used for testing <meta name="robots"
content="nofollow"> in local files until [1]. This commit reinstates
local file testing and adds an http test.
Checking is reported by checker.httpurl.HttpUrl.content_allows_robots().
[1] ce733ae7 ("Don't check for robots.txt directives in local html
files.", 2014-03-19)
2020-04-18 20:30:46 +01:00
Chris Mayo
d189445a8e
LinkFinder does not raise StopParse
2020-04-18 20:30:46 +01:00
Chris Mayo
ee6628a831
Move HtmlParser/htmlsax.py to htmlutil/htmlsoup.py
...
Remove one subpackage and some import lines where htmlutil.linkparse is
also being used.
2020-04-18 20:30:45 +01:00
Chris Mayo
a83fbb56c0
Remove from __future__ imports
2020-04-15 19:49:16 +01:00
Chris Mayo
f5e7f3a382
Remove use of the future package
...
It was providing Python 2 compatibility.
2020-04-15 19:49:16 +01:00
Chris Mayo
0795e3c1b4
Replace Parser class using BeautifulSoup.find_all()
2020-04-10 13:51:09 +01:00
Chris Mayo
eb3cf28baa
Remove support for start_end_element() callback
...
The LinkFinder handler start_end_element() callback does nothing apart
from call start_element().
2020-04-10 13:51:09 +01:00
Chris Mayo
c9f17e92b9
Remove support for end_element() callback
2020-04-10 13:51:09 +01:00
Chris Mayo
48b590cf8b
Replace FormFinder using BeautifulSoup.find_all()
...
FormFinder was the only handler that used an end_element() callback and
was therefore a blocker to moving the Parser class to use
BeautifulSoup.find_all().
FormFinder was a specialised handler used to parse a login form at
the start of a session if the user had configured authentication
credentials.
2020-04-10 13:51:05 +01:00
Chris Mayo
974915cc4f
Remove encoding from Parser
...
Only used by the test and an attribute of the soup object.
2020-04-08 20:03:35 +01:00
Chris Mayo
02e1c389b2
Remove parser flush() and reset()
...
Remnants of the feed() interface.
2020-04-08 20:03:35 +01:00
Chris Mayo
3771dd9136
Use parser.feed_soup() instead of parser.feed()
...
Markup is not being passed in pieces to the parser, so simplify the
interface and reduce the state further.
2020-04-08 20:03:35 +01:00
Chris Mayo
9d8d251d06
Replace Parser lineno() and column() methods
...
Stop storing this data in Parser object state.
2020-04-08 20:03:35 +01:00
Chris Mayo
514210199d
Add tests for search_form
2020-04-07 19:24:34 +01:00
Chris Mayo
036b900ffc
Remove unused linkcheck.containers classes
2020-04-03 19:24:08 +01:00
Chris Mayo
3ff3d72492
Use BeautifulSoup element attrs directly
2020-04-03 19:24:08 +01:00
Wes Haggard
5c3978ac58
Update http test to handle new 429 behavior
2020-04-02 14:37:42 -07:00