Commit graph

266 commits

Author SHA1 Message Date
Chris Mayo
79eafee826 Add a test for VirusCheck 2020-05-17 19:04:49 +01:00
Chris Mayo
a15a2833ca Remove spaces after names in class method definitions
And also nested functions.

This is a PEP 8 convention, E211.
2020-05-16 20:19:42 +01:00
Chris Mayo
1663e10fe7 Remove spaces after names in function definitions
This is a PEP 8 convention, E211.
2020-05-16 20:19:42 +01:00
Chris Mayo
fc11d08968 Remove spaces after names in class definitions 2020-05-16 20:19:42 +01:00
Chris Mayo
1416a08119 On Python 3 no need to convert os.linesep to a string 2020-05-16 17:02:01 +01:00
Chris Mayo
10552a79c7 Remove LinkCheckTest.fail_unicode()
No need to encode Python 3 strings before output.
2020-05-16 17:02:00 +01:00
Chris Mayo
9f95d06a39 Remove Python 2 test.test_support import 2020-05-16 16:26:38 +01:00
Chris Mayo
bda9612273 Make html.escape Python 3 only 2020-05-14 20:15:28 +01:00
Chris Mayo
42de609f8e Make urllib imports Python 3 only 2020-05-14 20:15:28 +01:00
Chris Mayo
08ddf658bc
Merge pull request #366 from cjmayo/userorpwd
Support login forms with user and/or password
2020-05-13 19:37:44 +01:00
Chris Mayo
736c893707
Merge pull request #377 from cjmayo/tidyten3
Remove u string prefixes
2020-05-13 19:36:54 +01:00
Chris Mayo
00c4a30386 Add user and password only loginurl tests 2020-05-13 19:32:29 +01:00
Chris Mayo
31a9f68c46
Merge pull request #367 from cjmayo/loginurl
Add test for loginurl
2020-05-12 20:08:57 +01:00
Chris Mayo
44e81d27dd Remove inheriting object
All Python 3 classes are new-style.
2020-05-08 10:45:31 +01:00
Chris Mayo
b0ea72e8c1 Remove # -*- coding: lines
Except for tests that include non-unicode characters:

tests/test_po.py
tests/test_strformat.py
tests/test_url.py
tests/checker/test_error.py
tests/checker/test_news.py
2020-05-08 10:45:31 +01:00
Chris Mayo
4d3e5abcfa Remove u string prefixes 2020-04-30 20:11:59 +01:00
Chris Mayo
1d1d9c3bde Add testing for variants of the robots meta directive 2020-04-29 20:14:10 +01:00
Chris Mayo
4ffdbf2406 Replace MetaRobotsFinder using BeautifulSoup.find() 2020-04-29 20:07:00 +01:00
Chris Mayo
3b8af403be Add test for loginurl
A new cgi-bin directory is created to identify the scripts to be run by
http.server.CGIHTTPRequestHandler.
2020-04-19 19:05:55 +01:00
Chris Mayo
56b8c9f7ab Add tests for <meta name="robots" content="nofollow">
norobots.html was used for testing <meta name="robots"
content="nofollow"> in local files until [1]. This commit reinstates
local file testing and adds an http test.

Checking is reported by checker.httpurl.HttpUrl.content_allows_robots().

[1] ce733ae7 ("Don't check for robots.txt directives in local html
files.", 2014-03-19)
2020-04-18 20:30:46 +01:00
Chris Mayo
74d5c68094 Add new tests for URL quoting 2019-10-05 19:38:57 +01:00
Chris Mayo
646e138166 Pass encoding when unquoting
Else non-UTF-8 codes are misinterpreted:

>>> from urllib import parse
>>> parse.unquote("%FF")
'�'
>>> parse.unquote("%FF", "latin1")
'ÿ'
2019-10-05 19:38:57 +01:00
Chris Mayo
607328d5c5 Support Beautiful Soup line numbers 2019-10-05 19:38:57 +01:00
Petr Dlouhý
2c3c794e52 fix http test after parser change 2019-07-22 19:59:37 +01:00
Petr Dlouhý
d1844a526e add charset tests 2019-07-22 19:59:37 +01:00
Chris Mayo
ecd06776ab Fix TypeError when checking https link and test
File "/usr/lib/python3.7/site-packages/linkcheck/httputil.py", line 68, in asn1_generaltime_to_seconds
    line: res = datetime.strptime(timestr, timeformat + 'Z')
    locals:
      res = <local> None
      datetime = <global> <class 'datetime.datetime'>
      datetime.strptime = <global> <built-in method strptime of type object at 0x7fa39064dda0>
      timestr = <local> b'20191106202117Z'
      timeformat = <local> '%Y%m%d%H%M%S'
TypeError: strptime() argument 1 must be str, not bytes

pyOpenSSL OpenSSL.crypto.X509.get_notAfter() returns bytes:
https://www.pyopenssl.org/en/stable/api/crypto.html#OpenSSL.crypto.X509.get_notAfter
2019-11-11 20:12:25 +00:00
Chris Mayo
dee4be4b1d Enable https checking using a test server
Verification has to be turned off because we are using a
self-signed certificate.
2019-11-11 20:12:25 +00:00
Chris Mayo
2f16152dc8 Improve test failure diff
Some url lines were missing a url prefix while others had a double url
prefix. diff was reporting more url lines as changed than actually had.
Improve formatting by removing newlines from control lines and adding
headings.

Before:

E   AssertionError: http://localhost:46031/tests/checker/data/sitemap.xml
E   ---
E
E   +++
E
E   @@ -1,4 +1,8 @@
E
E   -url http://localhost:46031/tests/checker/data/sitemap.xml
E   +http://www.example.com/
E   +cache key http://www.example.com/
E   +real url http://www.example.com/
E   +valid
E   +url url http://localhost:46031/tests/checker/data/sitemap.xml
E    cache key http://localhost:46031/tests/checker/data/sitemap.xml
E    real url http://localhost:46031/tests/checker/data/sitemap.xml
E    valid

After:

E   AssertionError: http://localhost:44021/tests/checker/data/sitemap.xml
E   --- expected
E   +++ result
E   @@ -2,3 +2,7 @@
E    cache key http://localhost:44021/tests/checker/data/sitemap.xml
E    real url http://localhost:44021/tests/checker/data/sitemap.xml
E    valid
E   +url http://www.example.com/
E   +cache key http://www.example.com/
E   +real url http://www.example.com/
E   +valid
2019-10-29 20:03:08 +00:00
Chris Mayo
ec8b6e09f0 Fix XmlTagUrlParser and make Python 3 compatible
URLs within a sitemap file were not being captured.
2019-10-28 19:20:05 +00:00
Marius Gedminas
5b2b3613ec
Merge pull request #330 from linkchecker/fix-sitemap
Fix sitemap parser
2019-10-25 16:15:55 +03:00
Marius Gedminas
606ece0308 Explain why these tests are being skipped
pytest output before this change:

    SKIPPED [3] tests/__init__.py:217: condition: True
    SKIPPED [1] tests/checker/test_news.py:63: condition: True
    SKIPPED [1] tests/checker/test_news.py:41: condition: True
    SKIPPED [1] tests/checker/test_news.py:116: condition: True
    SKIPPED [1] tests/checker/test_news.py:75: condition: True

After:

    SKIPPED [3] tests/__init__.py: disabled for now until some stable news server comes up
    SKIPPED [4] tests/checker/test_news.py: disabled for now until some stable news server comes up
2019-10-23 17:35:31 +03:00
Marius Gedminas
87b504785c Add a regression test for the sitemap parser 2019-10-23 17:30:10 +03:00
Marius Gedminas
58b0d5aaae Fix TypeError: string arg required in content_allows_robots()
See #323 an #317.
2019-10-22 14:13:45 +03:00
Marius Gedminas
6a9ab5ae44 Add a failing test 2019-10-22 14:13:45 +03:00
Marius Gedminas
84dbb5d603 Fix TypeError: string arg required in find_links()
Fixes #317.
2019-10-21 17:47:46 +03:00
Marius Gedminas
a4967fe92c Add a regression test for issue #317
The important bit was making the `file_test` helper not ignore internal
errors.
2019-10-21 17:45:18 +03:00
Chris Mayo
c7a32d67fe Remove unused code from network subpackage 2019-10-19 10:27:34 +01:00
anarcat
bae4282c92
Merge pull request #307 from cjmayo/cgi_escape
Replace deprecated cgi.escape
2019-09-18 10:16:58 -04:00
Chris Mayo
53cd9475b5 Replace deprecated cgi.escape
html provided for Python 2 by future
https://python-future.org/compatible_idioms.html#html-escaping-and-entities
2019-09-17 20:25:05 +01:00
Petr Dlouhý
1b41df4af3 Python3: fix test error message 2019-09-17 20:20:46 +01:00
anarcat
1590408a65
Merge pull request #306 from cjmayo/python3_49
{python3_49} enable and fix remaining bookmark tests
2019-09-16 15:18:26 -04:00
anarcat
2b18ff0a5f
Merge pull request #301 from cjmayo/python3_44
{python3_44} Python3: fixes for httpserver
2019-09-16 15:16:21 -04:00
Petr Dlouhý
eaa7131523 enable and fix remaining bookmark tests
biplist module preferred for reading Safari bookmarks in
bookmarks/safari.py so install it for tox testing.
2019-09-16 20:08:01 +01:00
Petr Dlouhý
030cf8321a Python3: fixes for httpserver 2019-09-15 19:49:33 +01:00
Petr Dlouhý
a2e67af7b4 fixes for Python 3: fix telneturl 2019-09-15 19:49:18 +01:00
Petr Dlouhý
8a294be95f Python3: fix robotparser 2019-09-11 20:04:26 +01:00
Marius Gedminas
0d58a39376 Fix failing test
http://www.heise.de/ now does a redirect to HTTPS instead of denying our
crawl via robots.txt.

Fixes #269.
2019-09-04 14:04:07 +03:00
Marius Gedminas
947b108f9e Make test_telnet.py fast
Linkchecker's telnet://username:password@host:port URL verification logic is

- connect to host:port
- wait for 'login: ' to appear (with a 10 second timeout), send username
- wait for 'Password: ' to appear (with a 10 second timeout), send password

The test spawns a fake telnet server on localhost that never presented
the login/password prompts, forcing the 10 second timeout three times.

This commit makes the fake telnet server emit the expected prompts,
making the test pass in .2 seconds.
2019-04-27 21:52:33 +03:00
Marius Gedminas
3a7c2a9823
Merge pull request #255 from linkchecker/stop-threads-more-reliably
Stop threads more reliably
2019-04-27 21:51:34 +03:00
Marius Gedminas
068e9bae8d Stop the telnet server threads more reliably
Instead of speaking text-based protocols over TCP we can use
threading.Event() objects to indicate the desire for the server thread
to quit.
2019-04-26 01:10:36 +03:00