Commit graph

312 commits

Author SHA1 Message Date
Chris Mayo
10552a79c7 Remove LinkCheckTest.fail_unicode()
No need to encode Python 3 strings before output.
2020-05-16 17:02:00 +01:00
Chris Mayo
9f95d06a39 Remove Python 2 test.test_support import 2020-05-16 16:26:38 +01:00
Chris Mayo
bda9612273 Make html.escape Python 3 only 2020-05-14 20:15:28 +01:00
Chris Mayo
42de609f8e Make urllib imports Python 3 only 2020-05-14 20:15:28 +01:00
Chris Mayo
08ddf658bc
Merge pull request #366 from cjmayo/userorpwd
Support login forms with user and/or password
2020-05-13 19:37:44 +01:00
Chris Mayo
736c893707
Merge pull request #377 from cjmayo/tidyten3
Remove u string prefixes
2020-05-13 19:36:54 +01:00
Chris Mayo
00c4a30386 Add user and password only loginurl tests 2020-05-13 19:32:29 +01:00
Chris Mayo
31a9f68c46
Merge pull request #367 from cjmayo/loginurl
Add test for loginurl
2020-05-12 20:08:57 +01:00
Chris Mayo
44e81d27dd Remove inheriting object
All Python 3 classes are new-style.
2020-05-08 10:45:31 +01:00
Chris Mayo
b0ea72e8c1 Remove # -*- coding: lines
Except for tests that include non-unicode characters:

tests/test_po.py
tests/test_strformat.py
tests/test_url.py
tests/checker/test_error.py
tests/checker/test_news.py
2020-05-08 10:45:31 +01:00
Chris Mayo
4d3e5abcfa Remove u string prefixes 2020-04-30 20:11:59 +01:00
Chris Mayo
1d1d9c3bde Add testing for variants of the robots meta directive 2020-04-29 20:14:10 +01:00
Chris Mayo
4ffdbf2406 Replace MetaRobotsFinder using BeautifulSoup.find() 2020-04-29 20:07:00 +01:00
Chris Mayo
3b8af403be Add test for loginurl
A new cgi-bin directory is created to identify the scripts to be run by
http.server.CGIHTTPRequestHandler.
2020-04-19 19:05:55 +01:00
Chris Mayo
56b8c9f7ab Add tests for <meta name="robots" content="nofollow">
norobots.html was used for testing <meta name="robots"
content="nofollow"> in local files until [1]. This commit reinstates
local file testing and adds an http test.

Checking is reported by checker.httpurl.HttpUrl.content_allows_robots().

[1] ce733ae7 ("Don't check for robots.txt directives in local html
files.", 2014-03-19)
2020-04-18 20:30:46 +01:00
Wes Haggard
5c3978ac58 Update http test to handle new 429 behavior 2020-04-02 14:37:42 -07:00
Chris Mayo
ecd06776ab Fix TypeError when checking https link and test
File "/usr/lib/python3.7/site-packages/linkcheck/httputil.py", line 68, in asn1_generaltime_to_seconds
    line: res = datetime.strptime(timestr, timeformat + 'Z')
    locals:
      res = <local> None
      datetime = <global> <class 'datetime.datetime'>
      datetime.strptime = <global> <built-in method strptime of type object at 0x7fa39064dda0>
      timestr = <local> b'20191106202117Z'
      timeformat = <local> '%Y%m%d%H%M%S'
TypeError: strptime() argument 1 must be str, not bytes

pyOpenSSL OpenSSL.crypto.X509.get_notAfter() returns bytes:
https://www.pyopenssl.org/en/stable/api/crypto.html#OpenSSL.crypto.X509.get_notAfter
2019-11-11 20:12:25 +00:00
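The TypeError in ecd06776ab above comes from passing bytes to `datetime.strptime()`. A minimal sketch of one way to handle it, assuming the timestamp is plain ASCII; the helper name `parse_generaltime` is hypothetical and this is not necessarily how the commit itself fixed `asn1_generaltime_to_seconds`:

```python
from datetime import datetime


def parse_generaltime(timestr):
    """Parse a 'YYYYMMDDHHMMSSZ' timestamp given as str or bytes.

    Hypothetical helper for illustration only.
    """
    if isinstance(timestr, bytes):
        # strptime() only accepts str, so decode the bytes first.
        timestr = timestr.decode("ascii")
    return datetime.strptime(timestr, "%Y%m%d%H%M%S" + "Z")


# The value from the traceback above.
parsed = parse_generaltime(b"20191106202117Z")
```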
Chris Mayo
dee4be4b1d Enable https checking using a test server
Verification has to be turned off because we are using a
self-signed certificate.
2019-11-11 20:12:25 +00:00
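For dee4be4b1d above, turning verification off for a self-signed test certificate could look roughly like this with the standard library `ssl` module. This is only a sketch: the test suite may disable verification through a different mechanism, and the function name is hypothetical.

```python
import ssl


def unverified_client_context():
    """Build a client context that accepts a self-signed certificate.

    Hypothetical helper; suitable only for talking to a local test server.
    """
    context = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


context = unverified_client_context()
```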
Chris Mayo
2f16152dc8 Improve test failure diff
Some url lines were missing a url prefix while others had a double url
prefix. diff was reporting more url lines as changed than actually had.
Improve formatting by removing newlines from control lines and adding
headings.

Before:

E   AssertionError: http://localhost:46031/tests/checker/data/sitemap.xml
E   ---
E
E   +++
E
E   @@ -1,4 +1,8 @@
E
E   -url http://localhost:46031/tests/checker/data/sitemap.xml
E   +http://www.example.com/
E   +cache key http://www.example.com/
E   +real url http://www.example.com/
E   +valid
E   +url url http://localhost:46031/tests/checker/data/sitemap.xml
E    cache key http://localhost:46031/tests/checker/data/sitemap.xml
E    real url http://localhost:46031/tests/checker/data/sitemap.xml
E    valid

After:

E   AssertionError: http://localhost:44021/tests/checker/data/sitemap.xml
E   --- expected
E   +++ result
E   @@ -2,3 +2,7 @@
E    cache key http://localhost:44021/tests/checker/data/sitemap.xml
E    real url http://localhost:44021/tests/checker/data/sitemap.xml
E    valid
E   +url http://www.example.com/
E   +cache key http://www.example.com/
E   +real url http://www.example.com/
E   +valid
2019-10-29 20:03:08 +00:00
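The "After" output of 2f16152dc8 above (headings on the control lines, no stray blank lines) can be reproduced with `difflib.unified_diff`. A minimal sketch, assuming the expected and actual results are lists of lines without trailing newlines; the sample data is made up:

```python
import difflib

# Made-up sample data standing in for the expected and actual result lines.
expected = ["url http://localhost/a", "valid"]
result = ["url http://localhost/a", "valid",
          "url http://www.example.com/", "valid"]

diff_lines = list(difflib.unified_diff(
    expected, result,
    fromfile="expected",  # heading shown after '---'
    tofile="result",      # heading shown after '+++'
    lineterm="",          # no newline appended to the control lines
))
print("\n".join(diff_lines))
```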
Chris Mayo
ec8b6e09f0 Fix XmlTagUrlParser and make Python 3 compatible
URLs within a sitemap file were not being captured.
2019-10-28 19:20:05 +00:00
Marius Gedminas
5b2b3613ec
Merge pull request #330 from linkchecker/fix-sitemap
Fix sitemap parser
2019-10-25 16:15:55 +03:00
Marius Gedminas
606ece0308 Explain why these tests are being skipped
pytest output before this change:

    SKIPPED [3] tests/__init__.py:217: condition: True
    SKIPPED [1] tests/checker/test_news.py:63: condition: True
    SKIPPED [1] tests/checker/test_news.py:41: condition: True
    SKIPPED [1] tests/checker/test_news.py:116: condition: True
    SKIPPED [1] tests/checker/test_news.py:75: condition: True

After:

    SKIPPED [3] tests/__init__.py: disabled for now until some stable news server comes up
    SKIPPED [4] tests/checker/test_news.py: disabled for now until some stable news server comes up
2019-10-23 17:35:31 +03:00
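The change in 606ece0308 above amounts to giving each skip a human-readable reason. A sketch using plain `unittest` rather than the project's own decorators (the class and test names are hypothetical):

```python
import unittest

REASON = "disabled for now until some stable news server comes up"


class NewsServerTestSketch(unittest.TestCase):
    """Hypothetical test case; only the skip reason mirrors the log above."""

    @unittest.skip(REASON)
    def test_news_server(self):
        self.fail("not reached while the test is skipped")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(NewsServerTestSketch)
result = unittest.TestResult()
suite.run(result)
# Each entry in result.skipped is a (test, reason) pair.
skip_reasons = [reason for _test, reason in result.skipped]
```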
Marius Gedminas
87b504785c Add a regression test for the sitemap parser 2019-10-23 17:30:10 +03:00
Marius Gedminas
58b0d5aaae Fix TypeError: string arg required in content_allows_robots()
See #323 and #317.
2019-10-22 14:13:45 +03:00
Marius Gedminas
6a9ab5ae44 Add a failing test 2019-10-22 14:13:45 +03:00
Marius Gedminas
84dbb5d603 Fix TypeError: string arg required in find_links()
Fixes #317.
2019-10-21 17:47:46 +03:00
Marius Gedminas
a4967fe92c Add a regression test for issue #317
The important bit was making the `file_test` helper not ignore internal
errors.
2019-10-21 17:45:18 +03:00
Chris Mayo
c7a32d67fe Remove unused code from network subpackage 2019-10-19 10:27:34 +01:00
Chris Mayo
74d5c68094 Add new tests for URL quoting 2019-10-05 19:38:57 +01:00
Chris Mayo
646e138166 Pass encoding when unquoting
Else non-UTF-8 codes are misinterpreted:

>>> from urllib import parse
>>> parse.unquote("%FF")
'�'
>>> parse.unquote("%FF", "latin1")
'ÿ'
2019-10-05 19:38:57 +01:00
Chris Mayo
607328d5c5 Support Beautiful Soup line numbers 2019-10-05 19:38:57 +01:00
anarcat
bae4282c92
Merge pull request #307 from cjmayo/cgi_escape
Replace deprecated cgi.escape
2019-09-18 10:16:58 -04:00
Chris Mayo
53cd9475b5 Replace deprecated cgi.escape
html provided for Python 2 by future
https://python-future.org/compatible_idioms.html#html-escaping-and-entities
2019-09-17 20:25:05 +01:00
Petr Dlouhý
1b41df4af3 Python3: fix test error message 2019-09-17 20:20:46 +01:00
anarcat
1590408a65
Merge pull request #306 from cjmayo/python3_49
{python3_49} enable and fix remaining bookmark tests
2019-09-16 15:18:26 -04:00
anarcat
2b18ff0a5f
Merge pull request #301 from cjmayo/python3_44
{python3_44} Python3: fixes for httpserver
2019-09-16 15:16:21 -04:00
Petr Dlouhý
eaa7131523 enable and fix remaining bookmark tests
biplist module preferred for reading Safari bookmarks in
bookmarks/safari.py so install it for tox testing.
2019-09-16 20:08:01 +01:00
Petr Dlouhý
030cf8321a Python3: fixes for httpserver 2019-09-15 19:49:33 +01:00
Petr Dlouhý
a2e67af7b4 fixes for Python 3: fix telneturl 2019-09-15 19:49:18 +01:00
Petr Dlouhý
8a294be95f Python3: fix robotparser 2019-09-11 20:04:26 +01:00
Marius Gedminas
0d58a39376 Fix failing test
http://www.heise.de/ now does a redirect to HTTPS instead of denying our
crawl via robots.txt.

Fixes #269.
2019-09-04 14:04:07 +03:00
Petr Dlouhý
2c3c794e52 fix http test after parser change 2019-07-22 19:59:37 +01:00
Petr Dlouhý
d1844a526e add charset tests 2019-07-22 19:59:37 +01:00
Marius Gedminas
947b108f9e Make test_telnet.py fast
Linkchecker's telnet://username:password@host:port URL verification logic is

- connect to host:port
- wait for 'login: ' to appear (with a 10 second timeout), send username
- wait for 'Password: ' to appear (with a 10 second timeout), send password

The test spawned a fake telnet server on localhost that never presented
the login/password prompts, forcing the 10 second timeout three times.

This commit makes the fake telnet server emit the expected prompts,
making the test pass in .2 seconds.
2019-04-27 21:52:33 +03:00
Marius Gedminas
3a7c2a9823
Merge pull request #255 from linkchecker/stop-threads-more-reliably
Stop threads more reliably
2019-04-27 21:51:34 +03:00
Marius Gedminas
068e9bae8d Stop the telnet server threads more reliably
Instead of speaking text-based protocols over TCP we can use
threading.Event() objects to indicate the desire for the server thread
to quit.
2019-04-26 01:10:36 +03:00
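The idea in 068e9bae8d above, signalling shutdown with a `threading.Event` instead of sending a message over TCP, might be sketched as follows. The class is hypothetical and stands in for the real telnet test server, which also has to serve connections:

```python
import threading


class StoppableServerSketch(threading.Thread):
    """Hypothetical stand-in for a test server thread."""

    def __init__(self):
        super().__init__()
        self.stop_requested = threading.Event()
        self.iterations = 0

    def run(self):
        # wait() returns False on timeout and True once the event is set,
        # so the loop ends promptly after stop() is called.
        while not self.stop_requested.wait(timeout=0.01):
            self.iterations += 1  # a real server would handle a request here

    def stop(self, timeout=5.0):
        self.stop_requested.set()
        self.join(timeout)
        return not self.is_alive()


server = StoppableServerSketch()
server.start()
stopped = server.stop()
```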
Marius Gedminas
8489730eac Print the names of the hanging tests
In case we forget or somebody else wants to tackle this.  After all, the
assertion error + traceback shows up at the end of the test run, and
it's not immediately clear which test is to blame for it!
2019-04-26 00:57:21 +03:00
Marius Gedminas
e285b0f257 Wow this test _is_ actually very slow!
tox -e py27 -- tests/checker/test_telnet.py takes 30 seconds to
complete.  That seems excessive to me, but one thing at a time.
2019-04-26 00:23:51 +03:00
Marius Gedminas
e9fb9b01bf Fix a hanging test on Python 3
I'm not entirely sure why the test is hanging, but this seems clear
enough:

- the test setup spawns a (non-daemon) background thread that runs
  forever, or until it is told to quit by receiving a TCP packet on a
  certain port
- the test teardown tries to tell the background thread to quit (which
  doesn't work) and waits for that to happen
- as a result the entire test run hangs forever

This commit adds a timeout as an extra safety net so that the test run
will complete even if the clean shutdown procedure fails for some
reason.
2019-04-26 00:15:10 +03:00
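The safety net described in e9fb9b01bf above can be illustrated with `Thread.join(timeout=...)`. In this sketch the worker is a daemon thread so that the example itself cannot hang, whereas the commit message describes a non-daemon thread; all names are hypothetical.

```python
import threading
import time


def run_until(stop_event):
    # Simulates a background server whose shutdown request never arrives.
    while not stop_event.is_set():
        time.sleep(0.01)


stop_event = threading.Event()
worker = threading.Thread(target=run_until, args=(stop_event,), daemon=True)
worker.start()

started = time.monotonic()
worker.join(timeout=0.2)        # give up waiting after 0.2 seconds
elapsed = time.monotonic() - started
still_running = worker.is_alive()  # True: the join timed out

# Clean up so the sketch leaves no thread behind.
stop_event.set()
worker.join()
```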
anarcat
59fe9ed876
Merge pull request #228 from cjmayo/python3_18
{python3_18} Python3: fix unicode in urlbase
2019-04-25 16:17:00 -04:00
anarcat
70f0bbf225
Merge pull request #250 from cjmayo/ftpserver
Get FtpServerTest working by updating to current pyftpdlib API
2019-04-25 16:16:33 -04:00
Chris Mayo
5caa683123 Make test_all_parts TestLogger import Python 3 compatible
tests/checker/test_all_parts.py:21: in <module>
    import __init__ as init
E   ModuleNotFoundError: No module named '__init__'

testWarning: cannot collect test class 'TestLogger' because it has a
__init__ constructor
2019-04-25 20:28:21 +01:00
Petr Dlouhý
b3881ce3b5 Python3: fix urlbase, strformat and others 2019-04-25 19:57:45 +01:00
anarcat
8219b976ac
Merge pull request #223 from cjmayo/python3_13
{python3_13} Python3: fix imports in test_noproxy
2019-04-24 10:56:50 -04:00
anarcat
5916206f5f
Merge pull request #220 from cjmayo/python3_10
{python3_10} Python3: fix httpserver tests
2019-04-24 10:56:17 -04:00
Chris Mayo
64e9392fb9 Get FtpServerTest working by updating to current pyftpdlib API 2019-04-22 19:34:46 +01:00
Marius Gedminas
85cee2138d Fix TestFile results not always ordered as expected values
self = <tests.checker.test_file.TestFile testMethod=test_good_dir_space>

    def test_good_dir_space (self):
...
>       self.direct(url, resultlines, recursionlevel=2)

tests/checker/test_file.py:173:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/checker/__init__.py:260: in direct
    self.fail_unicode(text(os.linesep).join(l))
tests/checker/__init__.py:237: in fail_unicode
    self.fail(msg)
E   AssertionError: Differences found testing
2019-04-16 20:25:16 +01:00
Petr Dlouhý
295555ac38 Python3: fix imports in test_noproxy 2019-04-12 20:27:09 +01:00
Petr Dlouhý
af08b4905b Python3: fix httpserver tests 2019-04-11 20:37:49 +01:00
anarcat
4b90f7b4e5
Merge pull request #225 from cjmayo/python3_15
{python3_15} fixes for Python 3: fix test_internpat and test_news
2019-04-11 11:47:21 -04:00
anarcat
6b73320cdf
Merge pull request #224 from cjmayo/python3_14
{python3_14} fixes for Python 3: fix httpserver
2019-04-11 11:46:56 -04:00
Petr Dlouhý
4211e8aecd fixes for Python 3: fix test_internpat and test_news 2019-04-09 20:09:35 +01:00
Petr Dlouhý
e8f6bc62c8 fixes for Python 3: fix httpserver 2019-04-09 20:09:35 +01:00
Petr Dlouhý
1e9fd51dfa Python3: fix permission mask in test_file 2019-04-09 20:09:35 +01:00
Christopher Baines
f24c88a073
Mark more tests that require the network
I believe all these tests require the network, at least they seem to
fail if I run them without connecting my computer to the web.

I'm looking at this as part of packaging linkchecker for GNU Guix,
where the package is built and the tests are run in an isolated
environment, intentionally without network access, to avoid issues
with non-reproducible package builds.
2019-01-01 22:37:21 +00:00
Antoine Beaupré
ab7502b6ff
make tests pass on IPv6 hosts
Without this patch, tests would fail on IPv6 hosts with this
mysterious error:

```
_______________________________________________________________________ TestHttpMisc.test_html ________________________________________________________________________
tests/checker/test_http_misc.py:30: in test_html
    self.obfuscate_test()
tests/checker/test_http_misc.py:51: in obfuscate_test
    url = u"http://%s/" % iputil.obfuscate_ip(ip)
linkcheck/network/iputil.py:290: in obfuscate_ip
    raise ValueError('Invalid IP value %r' % ip)
E   ValueError: Invalid IP value '2a02:2e0:3fe:1001:7777:772e:2:85'
```

As it turns out, the test host (`www.heise.de`) does have an IPv6
record and our tests pass on Travis only because they do not have a
working IPv6 stack. I happen to have IPv6 at home and tests are broken
here, so add a quick workaround so tests pass again.

Ideally, we would not have to deal with this hack and would handle
"obfuscation" correctly, but I have yet to figure out what that test
actually does before fixing it properly.
2018-04-11 19:42:30 -04:00
Petr Dlouhý
c1ab81627e test of correct logging of all parts in url_data 2018-01-14 17:17:07 +01:00
Petr Dlouhý
0a13fae3b4 remove third party packages and use them as dependency 2018-01-09 23:25:27 +01:00
Philipp Hahn
1368643a50 Fix fragment identifier quoting
According to <https://tools.ietf.org/html/rfc3986>:
 fragment    = *( pchar / "/" / "?" )
 pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
 unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
 pct-encoded = "%" HEXDIG HEXDIG
 sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

Fixes #96
2017-11-10 08:03:03 -05:00
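The RFC 3986 grammar quoted in 1368643a50 above translates into a set of characters that may stay unescaped in a fragment. A sketch with `urllib.parse.quote`, assuming the input is not already percent-encoded; `quote_fragment` and `FRAGMENT_SAFE` are hypothetical names, not the project's API:

```python
from urllib.parse import quote

# sub-delims, then ":" and "@" from pchar, then "/" and "?" from fragment.
# quote() never escapes letters, digits and "_.-~" (the unreserved set).
FRAGMENT_SAFE = "!$&'()*+,;=" + ":@" + "/?"


def quote_fragment(fragment):
    """Percent-encode everything RFC 3986 does not allow in a fragment."""
    return quote(fragment, safe=FRAGMENT_SAFE)


kept = quote_fragment("a/b?c=d&e")
escaped = quote_fragment("sec tion#1")
```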
Petr Dlouhý
f5100138ff fix tests that fail because of changed linkchecker output 2017-02-14 10:59:38 +01:00
Marius Gedminas
743a5f31cb Crawl HTML attributes in deterministic order
Fixes #17.
2017-02-01 19:19:53 +02:00
Marius Gedminas
a825b9d901 Mark the non-deterministic test as xfail 2017-02-01 18:57:40 +02:00
Marius Gedminas
02869ea076 Mark TestFile.test_directory_listing as known to fail
The test unzips a zip file with a weird-looking non-ASCII filename in it.
I don't think zip files specify the encoding for filenames.  Different
unzip utilities may interpret the filename differently.  Plus, the byte
representation of the unzipped filename may be different depending on
the filesystem charset.

To me it looks as if the filename is garbage encoded as valid UTF-8, and
the test expectation is to get it in latin-1 or something.
2017-02-01 18:45:05 +02:00
Marius Gedminas
cffea5fcbd Mark TestHttps.test_https as known to fail
This test depends on the way http://amazon.com/ works.  I don't think
that's a good idea.
2017-02-01 18:44:21 +02:00
Marius Gedminas
f4ec7531c1 Fix TestHttp.test_html
The HTML tag has two attributes with URLs:

  <applet archive="file.html" src="file.css">

It would appear that the order in which these attributes are crawled
does not match the order in the result file.

Possibly the crawling order is non-deterministic, although I cannot
reproduce that.  If that's the case, the fix would be to sort the
attributes in the crawler before following them, which means we want the
expected results sorted as well (since 'archive' comes before 'src',
file.html should come before file.css).
2017-02-01 18:41:47 +02:00
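The fix proposed in f4ec7531c1 above, sorting the attributes before following them, can be sketched like this. The attribute set and function name are hypothetical and much simpler than the real link parser:

```python
# Hypothetical set of attributes that carry URLs for this illustration.
URL_ATTRIBUTES = {"archive", "src"}


def urls_in_crawl_order(attrs):
    """Return URL attribute values ordered by attribute name."""
    return [attrs[name] for name in sorted(attrs) if name in URL_ATTRIBUTES]


# Same attributes as the <applet> tag above, given in the opposite order.
ordered = urls_in_crawl_order({"src": "file.css", "archive": "file.html"})
```

Sorting by attribute name makes the result independent of the order in which the parser happens to deliver the attributes.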
Bastian Kleineidam
88c060699d Fix tests 2016-01-19 22:05:15 +01:00
Bastian Kleineidam
914995b5fc Use example.com for tests. 2016-01-19 12:17:08 +01:00
Vadim Khohlov
d4352fc828 Added plugin for parsing and checking links in Markdown files 2014-11-11 15:35:18 +02:00
Bastian Kleineidam
7239cd1b76 Add test for itms-services URL. 2014-09-05 21:37:33 +02:00
Bastian Kleineidam
0fa7ed2699 Fix empty URL handling. 2014-07-03 23:34:40 +02:00
Bastian Kleineidam
cde261c009 Parse Refresh: and Content-Location: header values for URLs. 2014-07-01 20:16:43 +02:00
Bastian Kleineidam
b152ce7a6e Add PDF test and fix page number. 2014-04-29 18:53:24 +02:00
Bastian Kleineidam
7baa2f0b1b Fix http_link check and add a basic auth check. 2014-04-10 18:06:15 +02:00
Bastian Kleineidam
6caf654031 Parse Link: headers. 2014-04-10 17:50:55 +02:00
Bastian Kleineidam
a8623bc0bc Display SSL info on redirects. 2014-03-26 07:16:03 +01:00
Bastian Kleineidam
9cd67dfcb2 More SSL message work. 2014-03-20 20:24:57 +01:00
Bastian Kleineidam
9a7ad3a84f Print SSL cipher info for https URLs. 2014-03-19 17:02:34 +01:00
Bastian Kleineidam
ce733ae76b Don't check for robots.txt directives in local html files. 2014-03-19 16:33:22 +01:00
Bastian Kleineidam
9be667b52a Do not warn about missing addresses on mailto links that have subjects. 2014-03-18 23:27:59 +01:00
Bastian Kleineidam
fc73c6ca6e Log number of checked unique URLs. 2014-03-14 23:46:17 +01:00
Bastian Kleineidam
bca226c293 Fix assertion checking external links; fix tests 2014-03-10 18:23:44 +01:00
Bastian Kleineidam
6b334dc79b Fix URL result caching. 2014-03-08 19:35:10 +01:00
Bastian Kleineidam
fab2c2da98 Improve content type setting. 2014-03-05 20:12:19 +01:00
Bastian Kleineidam
ef13a3fce1 Implement sitemap and sitemap index parsing. 2014-03-05 19:26:37 +01:00
Bastian Kleineidam
b17211f162 Set for release. 2014-03-04 21:36:24 +01:00
Bastian Kleineidam
978b24f2d7 Merge branch 'caching' 2014-03-04 07:21:42 +01:00
Bastian Kleineidam
f1076c8813 Increase url-too-long warning. 2014-03-03 23:31:04 +01:00
Bastian Kleineidam
82f81241fd Check all links and add better caching. 2014-03-03 23:29:45 +01:00
Bastian Kleineidam
cc21f8f3d2 Add missing import. 2014-03-02 20:01:55 +01:00
Bastian Kleineidam
b8175e2357 Disable news test. 2014-03-02 20:01:36 +01:00