Commit graph

634 commits

Author SHA1 Message Date
Nathan Arthur
47a83cbb27 Make timing test more tolerant
On my M1 Mac, this was taking 1.01 seconds rather than the expected 1.00
seconds. This is OK, because sleep() is not guaranteed to be precise.
2022-09-08 09:15:29 -04:00
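A minimal sketch of a tolerance-based timing check, assuming the test only needs the measured duration to land near the requested delay. The helper name `assert_elapsed_about` and the default tolerance are hypothetical, not taken from the LinkChecker test suite.

```python
import time


def assert_elapsed_about(func, expected, tolerance=0.1):
    """Call func() and check that its duration is within tolerance of expected.

    time.sleep() can overshoot slightly (1.01 s for a 1.00 s sleep), so the
    check accepts a window instead of an exact value.
    """
    start = time.monotonic()
    func()
    elapsed = time.monotonic() - start
    assert abs(elapsed - expected) <= tolerance, f"took {elapsed:.3f}s"
    return elapsed
```

For example, `assert_elapsed_about(lambda: time.sleep(1), 1.0)` passes for a 1.01 s sleep.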
Stefan Fisk
d2b9723612 Fix srcset parsing
Resolves #631
2022-09-07 21:24:23 +02:00
Chris Mayo
3c7fb5b571 Fix checking directory containing Unicode filenames
Non-Unicode filenames are not supported.

sys.platform has not returned "linux2" since Python 3.3.
2022-09-05 19:28:40 +01:00
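The Python documentation recommends the `startswith` idiom for this check, because Python versions before 3.3 appended a major version number ("linux2", "linux3"). A minimal sketch; the `is_linux` helper is hypothetical, not LinkChecker code.

```python
import sys


def is_linux(platform=None):
    """Return True on Linux; sys.platform is plain "linux" since Python 3.3."""
    platform = sys.platform if platform is None else platform
    # A prefix check also matches the old Python 2 value "linux2".
    return platform.startswith("linux")
```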
Chris Mayo
1abd9ea10e Skip tests in TestFile rather than silently returning 2022-09-05 19:28:40 +01:00
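A sketch of the difference, using a hypothetical test case (not the real `TestFile`): calling `skipTest()` makes the runner report the test as skipped, whereas a bare `return` would count it as passed.

```python
import os
import unittest


class TestFileSketch(unittest.TestCase):
    """Hypothetical test that needs a directory which may be missing."""

    def test_needs_directory(self):
        path = "/hypothetical/data/dir"
        if not os.path.isdir(path):
            # Reported as "skipped" instead of silently counting as a pass.
            self.skipTest(f"{path} not available")
        self.assertIsInstance(os.listdir(path), list)
```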
Chris Mayo
d6936ceb91 Add warning url-content-type-unparseable 2022-09-02 19:29:11 +01:00
Kian-Meng Ang
a70ea9ea14 Fix typos
Found via `codespell ./linkcheck/ ./tests ./doc/man/en -L bu,noone,fo,pres,shttp`
2022-09-02 17:20:02 +08:00
Chris Mayo
53bbf28584 Re-enable ldap: test
Disabled in:
1733c6a6 ("Fix Travis CI build.", 2014-03-11)
2022-08-23 19:25:32 +01:00
Chris Mayo
34ba737710 Remove cchardet from Docker image, tox env and recommendation
Beautiful Soup 4.11 will use charset-normalizer.
2022-04-11 19:26:18 +01:00
Chris Mayo
b4c2599aef Output msgfmt error on test_pos failure 2022-01-19 19:31:01 +00:00
Chris Mayo
ae577357ed Update doc translation path in test_po.py 2022-01-19 19:31:01 +00:00
Chris Mayo
8007940770 Test more command arguments 2021-12-20 19:44:40 +00:00
Chris Mayo
5fef9a3b60 Generate linkchecker command using an entry point
drop_privileges() is only used by the linkchecker command.
Move installing SIGUSR1 handler to the linkchecker command only - fixes
intermittent test failures.
2021-12-20 19:34:58 +00:00
Chris Mayo
b0878dd7e8 Require TestFile.test_directory_listing to succeed 2021-12-15 19:36:53 +00:00
Chris Mayo
76815bcf47 Don't guess the URL for files that end in .html
Fixes:
linkchecker ftp.html
failing looking for ftp://ftp.html
2021-12-13 19:31:13 +00:00
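A hypothetical sketch of command-line URL guessing that checks for a `.html` suffix before applying host-name heuristics, so `ftp.html` is treated as a file name rather than `ftp://ftp.html`. The function name and the `ftp.`/`www.` rules are assumptions for illustration, not LinkChecker's actual code.

```python
import os.path


def guess_url(arg):
    """Hypothetical: turn a bare command-line argument into a URL or path."""
    if "://" in arg or arg.startswith(("mailto:", "file:")):
        return arg  # already has a scheme
    if arg.lower().endswith(".html") or os.path.exists(arg):
        return arg  # treat as a local file name; don't guess a scheme
    if arg.lower().startswith("ftp."):
        return "ftp://" + arg
    if arg.lower().startswith("www."):
        return "http://" + arg
    return arg
```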
Chris Mayo
fe5a34c68f Remove linkcheck.checker.proxysupport
Set up the requests.Session() with the complete proxy configuration
to fix a problem with using an HTTP server as an HTTPS proxy and
potential redirection issues.

Requests handles no_proxy.
2021-12-13 19:25:23 +00:00
Koen Van den Wijngaert
900586dc01
Better handling for link rel dns-prefetch and add preconnect support (#536)
preconnect is only DNS checked.

This is allowed even in the Resource Hints Editor's Draft
https://w3c.github.io/resource-hints/#preconnect
2021-12-09 19:38:30 +00:00
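A DNS-only check can be sketched with the standard library: resolve the host name of the link target and make no connection. The `dns_resolves` helper is hypothetical; it also accepts protocol-relative targets such as `//fonts.example.com`, which are common in `dns-prefetch` links.

```python
import socket
from urllib.parse import urlsplit


def dns_resolves(url):
    """Return True if the host part of url resolves; no connection is made."""
    host = urlsplit(url).hostname
    if not host:
        return False
    try:
        socket.getaddrinfo(host, None)
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError; UnicodeError covers
        # host names that cannot be IDNA-encoded.
        return False
    return True
```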
Chris Mayo
3b19680e97 Add guidance on character set detecting including cchardet 2021-12-06 19:24:26 +00:00
Chris Mayo
0356524369 Disable AnchorCheck plugin
Can't be relied on. Multiple reports of expected results not returned.

https://github.com/linkchecker/linkchecker/issues/542
https://github.com/linkchecker/linkchecker/issues/555
https://github.com/linkchecker/linkchecker/issues/568

Previously a fix was needed just to get the tests working:
0912e8a2c ("Don't strip the URL fragment from cache key if using AnchorCheck", 2020-07-27)

After:
eaa538c81 ("don't check one url multiple times", 2016-11-09)
2021-11-29 19:35:34 +00:00
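The cache-key interaction mentioned above can be illustrated with a hypothetical key function: if the fragment is stripped, `page.html#a` and `page.html#b` share one cache entry, so only the first anchor would ever be checked. This is a sketch of the idea, not LinkChecker's cache code.

```python
from urllib.parse import urldefrag


def cache_key(url, anchor_check=False):
    """Hypothetical cache key: keep the fragment only when anchors are checked."""
    if anchor_check:
        return url
    return urldefrag(url).url
```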
Chris Mayo
ef60e9dcd6 Enable certificate verification during https test 2021-11-22 19:27:18 +00:00
Chris Mayo
bb4102da5a Replace deprecated ssl.wrap_socket() in tests 2021-11-22 19:27:18 +00:00
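`ssl.wrap_socket()` has been deprecated since Python 3.7 in favour of `SSLContext.wrap_socket()`. A sketch of both sides, assuming a test server with its own certificate and key files; the helper names and `cafile`/`certfile` arguments are hypothetical. `ssl.create_default_context()` enables certificate and host name verification.

```python
import socket
import ssl


def make_server_context(certfile, keyfile=None):
    """Server-side context for a hypothetical HTTPS test server."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile, keyfile)
    return context


def make_client_context(cafile=None):
    """Client context with CERT_REQUIRED and check_hostname enabled.

    cafile can point at the test server's self-signed certificate.
    """
    return ssl.create_default_context(cafile=cafile)


def open_tls(host, port, context):
    """Replacement for ssl.wrap_socket(): wrap via the context instead."""
    sock = socket.create_connection((host, port))
    return context.wrap_socket(sock, server_hostname=host)
```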
Chris Mayo
deed6ce231 Ensure chardet is installed when testing using tox
Beautiful Soup uses chardet, if installed, to detect character
encodings. This can lead to different test results based on whether
chardet is installed or not.

Requests < 2.26.0 requires chardet, but since 2.26.0 Requests requires
charset_normalizer.

Explicitly installing chardet maintains consistent test results.
2021-07-27 19:48:27 +01:00
Paul Haerle
f395c74aac
Make ResultCache max_size configurable (#544)
* Make ResultCache max_size configurable

fixes #463

* Add tests and docs.

* fix documentation...

...adapt the source, not the auto-generated man pages themselves as
requested in #544.

* fix typo.
2021-06-21 19:45:19 +01:00
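A hypothetical sketch of a result cache whose size limit is passed in from configuration rather than hard-coded. The class name and the oldest-first eviction policy are assumptions for illustration and are not necessarily how LinkChecker's `ResultCache` behaves when full.

```python
from collections import OrderedDict


class BoundedResultCache:
    """Hypothetical cache of check results keyed by URL, capped at max_size."""

    def __init__(self, max_size):
        if max_size < 1:
            raise ValueError("max_size must be positive")
        self.max_size = max_size
        self._cache = OrderedDict()

    def add_result(self, key, result):
        if key in self._cache:
            self._cache.move_to_end(key)
        self._cache[key] = result
        # Evict the oldest entries once the configured limit is exceeded.
        while len(self._cache) > self.max_size:
            self._cache.popitem(last=False)

    def get_result(self, key):
        return self._cache.get(key)

    def __len__(self):
        return len(self._cache)
```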
Marius Gedminas
163ff725f8 Fix tests? 2021-05-19 16:36:16 +03:00
Chris Mayo
09b4da393e Initialise Configuration.status_logger
Fixes failure of the LinkChecker WSGI application which does
not call Configuration.set_status_logger().
2021-01-28 19:20:24 +00:00
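The pattern behind this fix, sketched with a hypothetical class: giving the attribute a default in `__init__` means code paths that never call the setter (such as a WSGI application) see `None` instead of raising `AttributeError`.

```python
class ConfigurationSketch:
    """Hypothetical stand-in for a configuration object."""

    def __init__(self):
        # Default so callers that skip set_status_logger() still work.
        self.status_logger = None

    def set_status_logger(self, status_logger):
        self.status_logger = status_logger
```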
Chris Mayo
7c2036b68c Drop support for Beautiful Soup < 4.8.1
The minimum version supported was already 4.8.0 because of the use
of multi_valued_attributes [1].

Test support for < 4.8.1 is the only code that needs removing [2].

[1] 3ff3d724 ("Use BeautifulSoup element attrs directly", 2020-04-03)
[2] 607328d5 ("Support Beautiful Soup line numbers", 2019-10-05)
2021-01-28 19:20:24 +00:00
Chris Mayo
e922dd0224 Stop using biplist
plistlib has supported binary files since Python 3.4.
2020-10-12 19:55:46 +01:00
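A sketch of reading and writing binary property lists with only the standard library. `plistlib.loads()` detects the binary format automatically, and binary values are plain `bytes` (`plistlib.Data`, which biplist relied on, is gone in Python 3.9). The bookmark keys in the usage below are illustrative.

```python
import plistlib


def roundtrip_binary_plist(obj):
    """Serialise obj as a binary plist and parse it back with plistlib."""
    raw = plistlib.dumps(obj, fmt=plistlib.FMT_BINARY)
    return raw, plistlib.loads(raw)
```

For example, a Safari-style structure such as `{"Children": [{"URLString": "https://example.com/"}]}` survives the round trip unchanged.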
Chris Mayo
e75c4b3d36 Reuse linkcheck.bookmarks.safari.has_biplist in tests 2020-09-23 19:38:17 +01:00
Chris Mayo
9891fc3f70 Python 3.9 adds support for HTTP status code 103 EARLY_HINTS 2020-09-14 19:55:05 +01:00
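A version-independent way to look up the reason phrase is to catch the `ValueError` that `http.HTTPStatus` raises for codes it does not know; on Python 3.8 and earlier that includes 103. The `status_phrase` helper is hypothetical.

```python
from http import HTTPStatus


def status_phrase(code):
    """Return the standard reason phrase for code, or None if unknown."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        # e.g. 103 before Python 3.9, or a non-standard code
        return None
```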
Chris Mayo
f268b95cf8 biplist is not compatible with Python 3.9
File ".tox/py39/lib/python3.9/site-packages/biplist/__init__.py", line 143, in readPlist
    line: raise InvalidPlistException(e)
    locals:
      InvalidPlistException = <global> <class 'biplist.InvalidPlistException'>
      e = <not found>

InvalidPlistException: module 'plistlib' has no attribute 'Data'
2020-09-14 19:55:05 +01:00
Chris Mayo
b1faef93c3
Merge pull request #495 from cjmayo/mswindows
MS Windows Python 3.7 and MS Store compatibility
2020-09-01 19:46:44 +01:00
Chris Mayo
314ec085a3
Merge pull request #462 from cjmayo/anchor
Fix anchor checking
2020-09-01 19:39:29 +01:00
Chris Mayo
89613d56f2 Replace the use of Python internal test.support
Its use is discouraged and it is not present in the MS Store version of
Python.
2020-08-29 16:57:57 +01:00
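The commit does not say which `test.support` helpers were replaced. As one example of a public alternative, `unittest.mock.patch.dict()` temporarily sets environment variables and restores `os.environ` afterwards, which covers what `test.support.EnvironmentVarGuard` is often used for. The `read_with_env` helper is hypothetical.

```python
import os
from unittest import mock


def read_with_env(name, value):
    """Set an environment variable only for the duration of the block."""
    with mock.patch.dict(os.environ, {name: value}):
        return os.environ.get(name)
    # os.environ is restored to its previous state on exit
```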
Chris Mayo
1390c9cd7e
Merge pull request #489 from cjmayo/urlsplit
Replace deprecated urllib.parse.split functions
2020-08-29 16:44:56 +01:00
Chris Mayo
47604e7d34
Merge pull request #481 from cjmayo/failures
Rename blacklist to failures
2020-08-29 16:39:24 +01:00
Chris Mayo
7dfba766a9
Merge pull request #486 from cjmayo/url
Remove unused code from url.py
2020-08-26 19:28:50 +01:00
Chris Mayo
2de25d54fd Rename blacklist to failures
Continue to support blacklist for the time being, with deprecation
warnings.
2020-08-23 17:19:26 +01:00
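A minimal sketch of keeping an old name working with a deprecation warning. The mapping and `resolve_output_name` are hypothetical; they only illustrate accepting "blacklist" while steering users to "failures".

```python
import warnings

# Hypothetical mapping of deprecated output names to their replacements.
DEPRECATED_NAMES = {"blacklist": "failures"}


def resolve_output_name(name):
    """Return the current name for an output type, warning on old names."""
    new_name = DEPRECATED_NAMES.get(name)
    if new_name is None:
        return name
    warnings.warn(
        f"{name!r} is deprecated, use {new_name!r} instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_name
```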
Chris Mayo
737c61cd67
Merge pull request #484 from cjmayo/issuetests
Tests of img srcset and invalid host name
2020-08-22 16:32:03 +01:00
Chris Mayo
f99f15c349 Add a test for UrlBase.build_url() 2020-08-22 16:28:53 +01:00
Chris Mayo
d58b3ab285 Remove unused url.url_fix_common_typos() 2020-08-18 19:57:46 +01:00
Chris Mayo
71ea78382b Remove unused url.safe_host_pattern() 2020-08-18 19:57:46 +01:00
Chris Mayo
794efd6d44 Remove unused url.is_duplicate_content_url() 2020-08-18 19:57:46 +01:00
Chris Mayo
e372657fb8 Remove unused url.get_content() 2020-08-18 19:57:46 +01:00
Chris Mayo
e4ba9c84ce Remove unused url.match_{host,url}()
Removes deprecation warnings for urllib.parse.split{host,type}() in
url_split()
2020-08-18 19:57:46 +01:00
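The undocumented helpers `urllib.parse.splittype()` and `splithost()` are deprecated since Python 3.8; `urlsplit()` gives the same pieces without warnings. A hypothetical helper showing the equivalent fields:

```python
from urllib.parse import urlsplit


def split_scheme_host(url):
    """Return (scheme, host, port, path) using urlsplit()."""
    parts = urlsplit(url)
    # hostname is lower-cased and port is an int (or None)
    return parts.scheme, parts.hostname, parts.port, parts.path
```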
Chris Mayo
4ad20d7f03
Merge pull request #477 from cjmayo/sitemap
Detect sitemaps that do not start with an XML declaration
2020-08-18 19:51:32 +01:00
Chris Mayo
24c2f4ac39 Add test for invalid host name in content
Tests code added in:
d5690203 ("Fix critical exception when parsing a URL with a ]", 2020-08-08)
2020-08-15 17:04:41 +01:00
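`urlsplit()` raises `ValueError` ("Invalid IPv6 URL") when the host part contains an unmatched `[` or `]`. A sketch of guarding against it so a single malformed link produces an error result rather than a critical exception; `safe_urlsplit` is a hypothetical name.

```python
from urllib.parse import urlsplit


def safe_urlsplit(url):
    """Return urlsplit(url), or None if the host part is malformed."""
    try:
        return urlsplit(url)
    except ValueError:
        return None
```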
Chris Mayo
88c84364b3 Add additional tests for <img srcset>
Tests code added in:
7ba40537 ("Fix critical exception if srcset value ends with a comma", 2020-08-07)
27f22ae1 ("Fix treating data: URIs in srcset values as links", 2020-08-07)
2020-08-15 17:04:41 +01:00
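Both cases follow from how the HTML standard tokenises `srcset`: a URL is a run of non-whitespace characters (so commas inside a `data:` URI stay in the URL), trailing commas end a candidate, and descriptors run to the next comma. A simplified sketch of that tokenisation; it skips descriptors without validating them and ignores the parenthesis handling of the full algorithm.

```python
# Whitespace characters as defined by the HTML standard.
_HTML_WHITESPACE = " \t\n\r\f"


def parse_srcset(value):
    """Return the candidate URLs from a srcset attribute value."""
    urls = []
    pos, end = 0, len(value)
    while pos < end:
        # Skip whitespace and the commas that separate candidates.
        while pos < end and (value[pos] in _HTML_WHITESPACE or value[pos] == ","):
            pos += 1
        if pos >= end:
            break  # a trailing comma yields no extra candidate
        # The URL is the next run of non-whitespace characters.
        start = pos
        while pos < end and value[pos] not in _HTML_WHITESPACE:
            pos += 1
        url = value[start:pos]
        if url.endswith(","):
            # Trailing commas end the candidate; it has no descriptors.
            url = url.rstrip(",")
        else:
            # Skip descriptors such as "2x" or "480w" up to the next comma.
            while pos < end and value[pos] != ",":
                pos += 1
        urls.append(url)
    return urls
```

A checker can then drop candidates starting with `data:` before queueing the rest as links.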
Chris Mayo
8c804c35a5 Detect sitemaps that do not start with an XML declaration 2020-08-11 19:35:56 +01:00
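The XML declaration is optional, so a sitemap can be recognised by its root element instead. A sketch using ElementTree; `sitemap_kind` is hypothetical, and parsing the whole document is heavier than the content sniffing a checker might actually do.

```python
import xml.etree.ElementTree as ET

SITEMAP_ROOTS = ("urlset", "sitemapindex")


def sitemap_kind(content):
    """Return "urlset", "sitemapindex" or None for XML bytes or text."""
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        return None
    # Strip a "{namespace}" prefix to get the local element name.
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name if local_name in SITEMAP_ROOTS else None
```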
Chris Mayo
40b2ebff8f Remove defaults from lc_cgi.checklink()
Only called from application() with arguments. Causes local environment
to be embedded in documentation when using Sphinx autodoc.
2020-08-05 19:54:56 +01:00
Chris Mayo
a7eacd6200 Add a test for a page with links to anchors
Query and fragment URL parts are ignored for filesystem URLs, so the test
runs over HTTP.
2020-07-27 19:22:32 +01:00
Chris Mayo
10170b2966 Add a test for the LocationInfo plugin
Because the GeoIP database now requires registration to download, the
result of the lookup using geoip-database is not going to change.
2020-07-07 17:25:28 +01:00
Chris Mayo
d91a328224 Remove strformat.unicode_safe() and strformat.url_unicode_split()
All strings support Unicode in Python 3.
2020-07-07 17:25:28 +01:00
Chris Mayo
d66e64460c Remove unused code from strformat.py 2020-06-18 19:31:00 +01:00
Chris Mayo
18d6eeae76 Ensure PO files are opened as UTF-8 in test_gtranslator() 2020-06-09 19:47:24 +01:00
Chris Mayo
74d449f8ac Test po files as strings and check po files have been found 2020-06-05 16:59:46 +01:00
Chris Mayo
4330b8a59e Replace codecs.open() with open() 2020-06-05 16:59:46 +01:00
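In Python 3 the built-in `open()` accepts an `encoding` argument, so `codecs.open(path, "r", "utf-8")` is unnecessary. Passing the encoding explicitly also keeps PO files from being decoded with the locale's default. A hypothetical helper:

```python
def read_text_utf8(path):
    """Read a text file (e.g. a .po file) as UTF-8 with the built-in open()."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```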
Chris Mayo
d591fedb60 Remove unused updater code that supports linkchecker-gui
pip provides update support for linkchecker.
2020-06-05 16:05:25 +01:00
Chris Mayo
a6b1eb45b1 Convert to Python 3 super() 2020-06-03 20:06:36 +01:00
Chris Mayo
5df8aa085c Convert space-separated strings in tests/ 2020-05-29 19:40:46 +01:00
Chris Mayo
c71cfcbea4 Tidy TestClamav.testInfected() acceptable_responses 2020-05-29 19:40:46 +01:00
Chris Mayo
5ee8d8e1ea Add trailing comma to single dict list in TestLoginUrl.visit_loginurl() 2020-05-29 19:40:46 +01:00
Chris Mayo
a534be0b50 Remove unnecessary character match in regexp in TestLogger.normalize() 2020-05-29 19:40:46 +01:00
Chris Mayo
be53c4a659 Remove unnecessary commas before closing brackets in tests/ 2020-05-29 19:40:46 +01:00
Chris Mayo
87039913b2 Fix remaining flake8 violations in tests/
tests/test_clamav.py:58:89: E501 line too long (90 > 88 characters)
tests/test_containers.py:38:9: F841 local variable 'dummy' is assigned to but never used
tests/test_dummy.py:35:9: F841 local variable 'dummy' is assigned to but never used
tests/test_ftpparse.py:94:89: E501 line too long (96 > 88 characters)
tests/test_url.py:128:89: E501 line too long (130 > 88 characters)
tests/test_strformat.py:62:9: E741 ambiguous variable name 'l'
tests/test_strformat.py:136:9: E731 do not assign a lambda expression, use a def
tests/checker/ftpserver.py:94:9: E722 do not use bare 'except'
tests/checker/httpserver.py:55:39: E231 missing whitespace after ','
tests/checker/httpserver.py:224:9: E722 do not use bare 'except'
tests/checker/telnetserver.py:84:9: E722 do not use bare 'except'
tests/checker/__init__.py:71:89: E501 line too long (119 > 88 characters)
tests/checker/__init__.py:292:13: E741 ambiguous variable name 'l'
tests/checker/test_http_misc.py:30:1: W293 blank line contains whitespace
tests/checker/test_https.py:21:1: F401 'tests.need_network' imported but unused
tests/checker/test_news.py:35:1: E302 expected 2 blank lines, found 1
2020-05-28 20:29:13 +01:00
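Typical fixes for three of the codes listed above, sketched on hypothetical helpers: E722 (name the exception instead of a bare `except`), E731 (a `def` instead of assigning a lambda) and E741 (avoid the variable name `l`).

```python
def close_quietly(stream):
    """E722: catch a specific exception instead of using a bare except."""
    try:
        stream.close()
    except OSError:
        pass


def stripped_lines(text):
    """E731 and E741: a def instead of a lambda, "line" instead of "l"."""
    return [line.strip() for line in text.splitlines()]
```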
Chris Mayo
165c51aeea Run black on tests/ 2020-05-28 20:29:13 +01:00
Chris Mayo
6f126a54d2 Add coverage for parser.sitemap.parse_sitemapindex() 2020-05-27 20:02:03 +01:00
Chris Mayo
f6e182f0e4 Mark TestFile.test_html_url_quote as need_network
Otherwise, without internet access, the test eventually fails with:

warning No MX mail host for users.sourceforge.net found
2020-05-25 19:55:28 +01:00
Chris Mayo
d3c9618b1b TestHttps.test_https doesn't need the internet now
A result of changes introduced in:

dee4be4b ("Enable https checking using a test server", 2019-11-11)
2020-05-25 19:55:28 +01:00
Chris Mayo
32689ea230 Enable as many TestHttp html tests as possible without the internet 2020-05-25 19:55:28 +01:00
Chris Mayo
313a14ff0d Remove instances of Python 2 unicode 2020-05-24 19:14:47 +01:00
Marius Gedminas
d0169c46d4
Merge pull request #348 from weshaggard/HandleRateLimiting
Turn status code 429 into warning instead of failure
2020-05-24 16:16:56 +03:00
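A hypothetical sketch of the classification this change describes: 429 Too Many Requests means the server is rate limiting the checker, not that the link is broken, so it maps to a warning while other 4xx/5xx codes stay errors.

```python
from http import HTTPStatus


def classify_status(code):
    """Hypothetical mapping of an HTTP status code to a check outcome."""
    if code == HTTPStatus.TOO_MANY_REQUESTS:
        return "warning"  # rate limited; the link may well be fine
    if code >= 400:
        return "error"
    return "valid"
```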
Chris Mayo
d611564cb0 Add a test for an empty html file accessed over http 2020-05-23 20:01:24 +01:00
Marius Gedminas
f268a90cfb
Merge branch 'master' into HandleRateLimiting 2020-05-23 14:15:52 +03:00
Marius Gedminas
5bd1fb4e36 Fix internal error on empty HTML files
When BeautifulSoup finds an empty file on disk, it sets
original_encoding to None.  It doesn't matter what encoding we pick for
empty files, so let's just pick one.

I don't know if there are any circumstances where BeautifulSoup might
set the encoding to None for a non-empty file.

Closes #392.
2020-05-21 19:01:33 +03:00
Chris Mayo
96e1c00ff7 TestLogger diff output is all Unicode in Python 3 2020-05-20 19:58:44 +01:00
Chris Mayo
71eaf9a982 Remove str_text from tests/ 2020-05-19 19:56:42 +01:00
Chris Mayo
a127902607 Replace str_text in asserts 2020-05-19 19:56:42 +01:00
Chris Mayo
12fd59057e Remove duplicate tests from test_strformat.py 2020-05-17 20:10:28 +01:00
Chris Mayo
339d293326 Convert tests/test_po.py to UTF-8 2020-05-17 20:10:28 +01:00
Chris Mayo
04465530c4 Use HttpServerTest.get_url() 2020-05-17 20:10:28 +01:00
Chris Mayo
58dbe1f282 Remove unused import pytest from tests/checker/test_http.py
pytest.mark.xfail() removed in:
743a5f31 ("Crawl HTML attributes in deterministic order", 2017-02-01)
2020-05-17 20:10:28 +01:00
Chris Mayo
79eafee826 Add a test for VirusCheck 2020-05-17 19:04:49 +01:00
Chris Mayo
a15a2833ca Remove spaces after names in class method definitions
And also nested functions.

This is a PEP 8 convention, E211.
2020-05-16 20:19:42 +01:00
Chris Mayo
1663e10fe7 Remove spaces after names in function definitions
This is a PEP 8 convention, E211.
2020-05-16 20:19:42 +01:00
Chris Mayo
fc11d08968 Remove spaces after names in class definitions 2020-05-16 20:19:42 +01:00
Chris Mayo
1416a08119 On Python 3 no need to convert os.linesep to a string 2020-05-16 17:02:01 +01:00
Chris Mayo
10552a79c7 Remove LinkCheckTest.fail_unicode()
No need to encode Python 3 strings before output.
2020-05-16 17:02:00 +01:00
Chris Mayo
9f95d06a39 Remove Python 2 test.test_support import 2020-05-16 16:26:38 +01:00
Chris Mayo
f8c9faec1b Remove Python 2 cStringIO imports 2020-05-15 19:37:04 +01:00
Chris Mayo
bda9612273 Make html.escape Python 3 only 2020-05-14 20:15:28 +01:00
Chris Mayo
42de609f8e Make urllib imports Python 3 only 2020-05-14 20:15:28 +01:00
Chris Mayo
08ddf658bc
Merge pull request #366 from cjmayo/userorpwd
Support login forms with user and/or password
2020-05-13 19:37:44 +01:00
Chris Mayo
736c893707
Merge pull request #377 from cjmayo/tidyten3
Remove u string prefixes
2020-05-13 19:36:54 +01:00
Chris Mayo
00c4a30386 Add user and password only loginurl tests 2020-05-13 19:32:29 +01:00
Chris Mayo
31a9f68c46
Merge pull request #367 from cjmayo/loginurl
Add test for loginurl
2020-05-12 20:08:57 +01:00
Chris Mayo
44e81d27dd Remove inheriting object
All Python 3 classes are new-style.
2020-05-08 10:45:31 +01:00
Chris Mayo
b0ea72e8c1 Remove # -*- coding: lines
Except for tests that include non-ASCII characters:

tests/test_po.py
tests/test_strformat.py
tests/test_url.py
tests/checker/test_error.py
tests/checker/test_news.py
2020-05-08 10:45:31 +01:00
Chris Mayo
4d3e5abcfa Remove u string prefixes 2020-04-30 20:11:59 +01:00
anarcat
ab476fa4bf
Merge pull request #364 from cjmayo/parser5
Stop using HTML handlers and improve login form error handling
2020-04-30 09:28:48 -04:00
Chris Mayo
1d1d9c3bde Add testing for variants of the robots meta directive 2020-04-29 20:14:10 +01:00
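The commit does not list which variants are tested. As a sketch of why variants matter: robots meta directives are comma-separated and case-insensitive, and `none` is shorthand for `noindex, nofollow`. The helpers below are hypothetical.

```python
def robots_directives(content):
    """Parse a <meta name="robots" content="..."> value into a set."""
    directives = {d.strip().lower() for d in content.split(",") if d.strip()}
    if "none" in directives:
        directives |= {"noindex", "nofollow"}
    return directives


def may_follow(content):
    """Return True unless the robots meta value forbids following links."""
    return "nofollow" not in robots_directives(content)
```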
Chris Mayo
9eed070a73 Stop using HTML handlers
LinkFinder is the only remaining HTML handler therefore no need for
htmlsoup.process_soup() as an independent function or TagFinder as a
base class.
2020-04-29 20:07:00 +01:00