Chris Mayo
4ffdbf2406
Replace MetaRobotsFinder using BeautifulSoup.find()
2020-04-29 20:07:00 +01:00
Chris Mayo
56b8c9f7ab
Add tests for <meta name="robots" content="nofollow">
norobots.html was used for testing <meta name="robots"
content="nofollow"> in local files until [1]. This commit reinstates
local file testing and adds an http test.
Checking is reported by checker.httpurl.HttpUrl.content_allows_robots().
[1] ce733ae7 ("Don't check for robots.txt directives in local html
files.", 2014-03-19)
2020-04-18 20:30:46 +01:00
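As a rough illustration of the <meta name="robots" content="nofollow"> check tested above, here is a minimal sketch using the standard library html.parser rather than the BeautifulSoup.find() call the commits mention. MetaRobotsParser and allows_following are hypothetical names, not linkchecker's code. The sketch treats "none" as shorthand for "noindex, nofollow".

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values keep their case
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def allows_following(html_text):
    """Return False if the page asks robots not to follow its links."""
    parser = MetaRobotsParser()
    parser.feed(html_text)
    parser.close()
    # "none" is shorthand for "noindex, nofollow"
    return not ({"nofollow", "none"} & parser.directives)


print(allows_following('<meta name="robots" content="nofollow"><a href="x">x</a>'))  # False
print(allows_following('<meta name="robots" content="index, follow">'))  # True
```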
Chris Mayo
74d5c68094
Add new tests for URL quoting
2019-10-05 19:38:57 +01:00
Chris Mayo
646e138166
Pass encoding when unquoting
Otherwise, percent-encoded non-UTF-8 bytes are misinterpreted:
>>> from urllib import parse
>>> parse.unquote("%FF")
'�'
>>> parse.unquote("%FF", "latin1")
'ÿ'
2019-10-05 19:38:57 +01:00
Chris Mayo
607328d5c5
Support Beautiful Soup line numbers
2019-10-05 19:38:57 +01:00
Petr Dlouhý
2c3c794e52
fix http test after parser change
2019-07-22 19:59:37 +01:00
Petr Dlouhý
d1844a526e
add charset tests
2019-07-22 19:59:37 +01:00
Chris Mayo
ecd06776ab
Fix TypeError when checking https link and test
File "/usr/lib/python3.7/site-packages/linkcheck/httputil.py", line 68, in asn1_generaltime_to_seconds
line: res = datetime.strptime(timestr, timeformat + 'Z')
locals:
res = <local> None
datetime = <global> <class 'datetime.datetime'>
datetime.strptime = <global> <built-in method strptime of type object at 0x7fa39064dda0>
timestr = <local> b'20191106202117Z'
timeformat = <local> '%Y%m%d%H%M%S'
TypeError: strptime() argument 1 must be str, not bytes
pyOpenSSL OpenSSL.crypto.X509.get_notAfter() returns bytes:
https://www.pyopenssl.org/en/stable/api/crypto.html#OpenSSL.crypto.X509.get_notAfter
2019-11-11 20:12:25 +00:00
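A sketch of the kind of fix the traceback calls for: decode the bytes from get_notAfter() before handing them to strptime(). The function name and time format are taken from the traceback; the body, including returning a POSIX timestamp interpreted as UTC, is an assumption and not linkchecker's actual implementation.

```python
from datetime import datetime, timezone


def asn1_generaltime_to_seconds(timestr):
    """Convert an ASN.1 GeneralizedTime such as b'20191106202117Z' to a
    POSIX timestamp (sketch only)."""
    # pyOpenSSL's X509.get_notAfter() returns bytes, but strptime() needs str
    if isinstance(timestr, bytes):
        timestr = timestr.decode("ascii")
    timeformat = "%Y%m%d%H%M%S"
    res = datetime.strptime(timestr, timeformat + "Z")
    # The trailing "Z" means UTC, so attach that zone before converting
    return res.replace(tzinfo=timezone.utc).timestamp()


seconds = asn1_generaltime_to_seconds(b"20191106202117Z")
print(datetime.fromtimestamp(seconds, timezone.utc))  # 2019-11-06 20:21:17+00:00
```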
Chris Mayo
dee4be4b1d
Enable https checking using a test server
Verification has to be turned off because we are using a
self-signed certificate.
2019-11-11 20:12:25 +00:00
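The commit does not say how verification is switched off. One standard-library way to accept a self-signed certificate from a local test server is a client SSLContext with both hostname checking and certificate verification disabled; make_unverified_client_context is a hypothetical helper name. Such a context could then be passed as urllib.request.urlopen(url, context=...). It is only appropriate for tests.

```python
import ssl


def make_unverified_client_context():
    """Client TLS context that accepts any certificate, e.g. a self-signed
    one from a local test server. Never use this against real hosts."""
    context = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be CERT_NONE
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


ctx = make_unverified_client_context()
print(ctx.verify_mode == ssl.CERT_NONE)  # True
```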
Chris Mayo
2f16152dc8
Improve test failure diff
Some url lines were missing the "url" prefix while others had it doubled,
so diff reported more url lines as changed than had actually changed.
Improve formatting by removing newlines from control lines and adding
headings.
Before:
E AssertionError: http://localhost:46031/tests/checker/data/sitemap.xml
E ---
E
E +++
E
E @@ -1,4 +1,8 @@
E
E -url http://localhost:46031/tests/checker/data/sitemap.xml
E +http://www.example.com/
E +cache key http://www.example.com/
E +real url http://www.example.com/
E +valid
E +url url http://localhost:46031/tests/checker/data/sitemap.xml
E cache key http://localhost:46031/tests/checker/data/sitemap.xml
E real url http://localhost:46031/tests/checker/data/sitemap.xml
E valid
After:
E AssertionError: http://localhost:44021/tests/checker/data/sitemap.xml
E --- expected
E +++ result
E @@ -2,3 +2,7 @@
E cache key http://localhost:44021/tests/checker/data/sitemap.xml
E real url http://localhost:44021/tests/checker/data/sitemap.xml
E valid
E +url http://www.example.com/
E +cache key http://www.example.com/
E +real url http://www.example.com/
E +valid
2019-10-29 20:03:08 +00:00
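The "After" output matches what difflib.unified_diff produces when given labelled file names and lineterm="". Here is a minimal sketch of that formatting, with shortened sample lines; format_failure is a hypothetical name, not the test suite's helper.

```python
import difflib


def format_failure(url, expected, result):
    """Build a failure message: the URL, then a unified diff of the lines."""
    diff = difflib.unified_diff(
        expected,
        result,
        fromfile="expected",
        tofile="result",
        lineterm="",  # no trailing newline on the ---/+++/@@ control lines
    )
    return "\n".join([url] + list(diff))


expected = [
    "url http://localhost/sitemap.xml",
    "cache key http://localhost/sitemap.xml",
    "real url http://localhost/sitemap.xml",
    "valid",
]
result = expected + [
    "url http://www.example.com/",
    "cache key http://www.example.com/",
    "real url http://www.example.com/",
    "valid",
]
print(format_failure("http://localhost/sitemap.xml", expected, result))
```

The hunk header comes out as "@@ -2,3 +2,7 @@", just as in the "After" output, because only three lines of context are kept before the inserted block.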
Chris Mayo
ec8b6e09f0
Fix XmlTagUrlParser and make Python 3 compatible
URLs within a sitemap file were not being captured.
2019-10-28 19:20:05 +00:00
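For context, the URLs in a sitemap live in <loc> elements in the sitemaps.org namespace. The sketch below extracts them with xml.etree.ElementTree. It is an illustration of what needs capturing, not how XmlTagUrlParser works internally, and sitemap_urls is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_data):
    """Yield the text of every <loc> element in a sitemap or sitemap index."""
    root = ET.fromstring(xml_data)
    for loc in root.iter(SITEMAP_NS + "loc"):
        if loc.text:
            yield loc.text.strip()


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
</urlset>"""

print(list(sitemap_urls(SAMPLE)))  # ['http://www.example.com/']
```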
Marius Gedminas
5b2b3613ec
Merge pull request #330 from linkchecker/fix-sitemap
Fix sitemap parser
2019-10-25 16:15:55 +03:00
Marius Gedminas
606ece0308
Explain why these tests are being skipped
pytest output before this change:
SKIPPED [3] tests/__init__.py:217: condition: True
SKIPPED [1] tests/checker/test_news.py:63: condition: True
SKIPPED [1] tests/checker/test_news.py:41: condition: True
SKIPPED [1] tests/checker/test_news.py:116: condition: True
SKIPPED [1] tests/checker/test_news.py:75: condition: True
After:
SKIPPED [3] tests/__init__.py: disabled for now until some stable news server comes up
SKIPPED [4] tests/checker/test_news.py: disabled for now until some stable news server comes up
2019-10-23 17:35:31 +03:00
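The "Before" output shows only the evaluated condition. Attaching a reason string is what makes the report explain the skip. pytest's skipif marker takes a reason= argument for this. The sketch below uses the standard-library equivalent, unittest.skipIf, whose reason pytest also reports. The test class and method are hypothetical.

```python
import io
import unittest

NEWS_SKIP_REASON = "disabled for now until some stable news server comes up"


class TestNews(unittest.TestCase):
    # With a reason, the skip report says why instead of "condition: True"
    @unittest.skipIf(True, NEWS_SKIP_REASON)
    def test_news_without_host(self):
        self.fail("should never run while skipped")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNews)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
for test, reason in result.skipped:
    print(reason)  # disabled for now until some stable news server comes up
```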
Marius Gedminas
87b504785c
Add a regression test for the sitemap parser
2019-10-23 17:30:10 +03:00
Marius Gedminas
58b0d5aaae
Fix TypeError: string arg required in content_allows_robots()
See #323 and #317.
2019-10-22 14:13:45 +03:00
Marius Gedminas
6a9ab5ae44
Add a failing test
2019-10-22 14:13:45 +03:00
Marius Gedminas
84dbb5d603
Fix TypeError: string arg required in find_links()
Fixes #317.
2019-10-21 17:47:46 +03:00
Marius Gedminas
a4967fe92c
Add a regression test for issue #317
The important bit was making the `file_test` helper not ignore internal
errors.
2019-10-21 17:45:18 +03:00
Chris Mayo
c7a32d67fe
Remove unused code from network subpackage
2019-10-19 10:27:34 +01:00
anarcat
bae4282c92
Merge pull request #307 from cjmayo/cgi_escape
Replace deprecated cgi.escape
2019-09-18 10:16:58 -04:00
Chris Mayo
53cd9475b5
Replace deprecated cgi.escape
The html module is provided for Python 2 by the future package:
https://python-future.org/compatible_idioms.html#html-escaping-and-entities
2019-09-17 20:25:05 +01:00
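One behavioural difference is worth knowing when swapping cgi.escape (deprecated since Python 3.2, removed in 3.8) for html.escape. cgi.escape escaped only &, < and > by default, while html.escape also escapes quotes unless quote=False is passed. Which variant linkchecker uses at each call site is not shown here.

```python
import html

text = '<a href="x">Tom & Jerry\'s</a>'

# Same result as the old cgi.escape(text): only &, < and > are escaped
print(html.escape(text, quote=False))
# Default quote=True also escapes " and ', which is safe inside attribute values
print(html.escape(text))
```

The first line prints `&lt;a href="x"&gt;Tom &amp; Jerry's&lt;/a&gt;`. The second prints `&lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#x27;s&lt;/a&gt;`.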
Petr Dlouhý
1b41df4af3
Python3: fix test error message
2019-09-17 20:20:46 +01:00
anarcat
1590408a65
Merge pull request #306 from cjmayo/python3_49
{python3_49} enable and fix remaining bookmark tests
2019-09-16 15:18:26 -04:00
anarcat
2b18ff0a5f
Merge pull request #301 from cjmayo/python3_44
{python3_44} Python3: fixes for httpserver
2019-09-16 15:16:21 -04:00
Petr Dlouhý
eaa7131523
enable and fix remaining bookmark tests
The biplist module is preferred for reading Safari bookmarks in
bookmarks/safari.py, so install it for tox testing.
2019-09-16 20:08:01 +01:00
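A sketch of the "prefer biplist, fall back to the standard library" pattern the message implies. It assumes biplist's readPlistFromString() function. plistlib.loads() can read binary plists on Python 3.4+. load_bookmarks_plist is a hypothetical name and not necessarily how bookmarks/safari.py is written.

```python
import plistlib

try:
    # Third-party module, preferred when installed
    import biplist
except ImportError:
    biplist = None


def load_bookmarks_plist(data):
    """Parse the bytes of a Safari Bookmarks.plist (binary or XML format)."""
    if biplist is not None:
        return biplist.readPlistFromString(data)  # assumed biplist API
    # Standard-library fallback
    return plistlib.loads(data)


sample = plistlib.dumps({"Title": "", "Children": []}, fmt=plistlib.FMT_BINARY)
print(load_bookmarks_plist(sample))
```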
Petr Dlouhý
030cf8321a
Python3: fixes for httpserver
2019-09-15 19:49:33 +01:00
Petr Dlouhý
a2e67af7b4
fixes for Python 3: fix telneturl
2019-09-15 19:49:18 +01:00
Petr Dlouhý
8a294be95f
Python3: fix robotparser
2019-09-11 20:04:26 +01:00
Marius Gedminas
0d58a39376
Fix failing test
http://www.heise.de/ now does a redirect to HTTPS instead of denying our
crawl via robots.txt.
Fixes #269.
2019-09-04 14:04:07 +03:00
Marius Gedminas
947b108f9e
Make test_telnet.py fast
Linkchecker's telnet://username:password@host:port URL verification logic is:
- connect to host:port
- wait for 'login: ' to appear (with a 10 second timeout), send username
- wait for 'Password: ' to appear (with a 10 second timeout), send password
The test spawned a fake telnet server on localhost that never presented
the login/password prompts, forcing the 10 second timeout three times.
This commit makes the fake telnet server emit the expected prompts,
so the test now passes in 0.2 seconds.
2019-04-27 21:52:33 +03:00
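The steps above can be sketched with plain sockets: a fake server that writes the 'login: ' and 'Password: ' prompts, and a client that waits for each prompt before answering. This is a simplified stand-in that does no telnet option negotiation. It also avoids telnetlib, which was deprecated in Python 3.11 and removed in 3.13. All names are hypothetical, not those in test_telnet.py.

```python
import socket
import socketserver
import threading


class PromptingTelnetHandler(socketserver.StreamRequestHandler):
    """Fake server that shows the prompts the login check waits for."""

    def handle(self):
        self.wfile.write(b"login: ")
        self.rfile.readline()  # username line
        self.wfile.write(b"Password: ")
        self.rfile.readline()  # password line
        self.wfile.write(b"Welcome\r\n")


def start_fake_server():
    """Serve on an ephemeral localhost port from a daemon thread."""
    server = socketserver.TCPServer(("127.0.0.1", 0), PromptingTelnetHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def read_until(sock, marker, timeout=10.0):
    """Read until marker arrives; raises socket.timeout if it never does."""
    sock.settimeout(timeout)
    data = b""
    while marker not in data:
        chunk = sock.recv(1024)
        if not chunk:
            raise ConnectionError("closed before %r was seen" % marker)
        data += chunk
    return data


def check_login(host, port, username, password):
    """Connect, answer both prompts and return the server's reply line."""
    with socket.create_connection((host, port), timeout=10.0) as sock:
        read_until(sock, b"login: ")
        sock.sendall(username.encode("ascii") + b"\r\n")
        read_until(sock, b"Password: ")
        sock.sendall(password.encode("ascii") + b"\r\n")
        return read_until(sock, b"\r\n")


server = start_fake_server()
try:
    host, port = server.server_address
    print(check_login(host, port, "user", "secret"))  # b'Welcome\r\n'
finally:
    server.shutdown()
    server.server_close()
```

Because the prompts arrive immediately, neither 10 second wait is ever exhausted.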
Marius Gedminas
3a7c2a9823
Merge pull request #255 from linkchecker/stop-threads-more-reliably
Stop threads more reliably
2019-04-27 21:51:34 +03:00
Marius Gedminas
068e9bae8d
Stop the telnet server threads more reliably
Instead of speaking text-based protocols over TCP, we can use
threading.Event() objects to signal that the server thread should
quit.
2019-04-26 01:10:36 +03:00
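A minimal sketch of the Event-based shutdown described above. The server loop polls accept() with a short timeout so it can re-check the event, and teardown just sets the event and joins. serve_until_stopped is a hypothetical name, and the 0.1 second poll interval is an arbitrary choice.

```python
import socket
import threading


def serve_until_stopped(listener, stop_event):
    """Accept connections until stop_event is set (hypothetical server loop)."""
    # A short accept() timeout lets the loop re-check the event regularly
    listener.settimeout(0.1)
    while not stop_event.is_set():
        try:
            conn, _addr = listener.accept()
        except socket.timeout:
            continue
        with conn:
            conn.sendall(b"login: ")


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
stop = threading.Event()
thread = threading.Thread(target=serve_until_stopped, args=(listener, stop))
thread.start()

# Teardown: set the event instead of sending a "quit" message over TCP
stop.set()
thread.join()
listener.close()
print(thread.is_alive())  # False
```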
Marius Gedminas
8489730eac
Print the names of the hanging tests
In case we forget or somebody else wants to tackle this. After all, the
assertion error + traceback shows up at the end of the test run, and
it's not immediately clear which test is to blame for it!
2019-04-26 00:57:21 +03:00
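One way to attribute a hang to a test is to snapshot the running threads before the test and report, by name, any non-daemon threads still alive afterwards. The sketch below assumes each background thread is named after the test that started it. leftover_threads and the thread name are hypothetical.

```python
import threading


def leftover_threads(baseline):
    """Names of non-daemon threads started since baseline and still alive."""
    return sorted(
        t.name
        for t in threading.enumerate()
        if t not in baseline and t.is_alive() and not t.daemon
    )


baseline = set(threading.enumerate())
release = threading.Event()
worker = threading.Thread(target=release.wait, name="test_telnet server")
worker.start()

hanging = leftover_threads(baseline)
# Naming the culprit ties the end-of-run failure to a specific test
print("hanging threads:", ", ".join(hanging))  # hanging threads: test_telnet server

release.set()
worker.join()
```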
Marius Gedminas
e285b0f257
Wow this test _is_ actually very slow!
tox -e py27 -- tests/checker/test_telnet.py takes 30 seconds to
complete. That seems excessive to me, but one thing at a time.
2019-04-26 00:23:51 +03:00
Marius Gedminas
e9fb9b01bf
Fix a hanging test on Python 3
I'm not entirely sure why the test is hanging, but this seems clear
enough:
- the test setup spawns a (non-daemon) background thread that runs
forever, or until it is told to quit by receiving a TCP packet on a
certain port
- the test teardown tries to tell the background thread to quit (which
doesn't work) and waits for that to happen
- as a result the entire test run hangs forever
This commit adds a timeout as an extra safety net so that the test run
will complete even if the clean shutdown procedure fails for some
reason.
2019-04-26 00:15:10 +03:00
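The commit does not show where its timeout lives. One way such a safety net could look is Thread.join(timeout=...), so teardown stops waiting for a thread that ignores the quit request. Note that join(timeout) alone is not enough for a non-daemon thread: the interpreter still waits for every non-daemon thread at exit. The sketch therefore also marks the thread as a daemon. stubborn_server is hypothetical.

```python
import threading
import time


def stubborn_server(stop_requested):
    """Stands in for a server thread whose quit request never gets through."""
    while True:  # never checks stop_requested, like the failing TCP quit message
        time.sleep(0.05)


stop = threading.Event()
# daemon=True: a thread that never exits cannot keep the interpreter alive
thread = threading.Thread(target=stubborn_server, args=(stop,), daemon=True)
thread.start()

stop.set()
thread.join(timeout=0.5)  # safety net: stop waiting after half a second
if thread.is_alive():
    print("server thread did not stop; continuing anyway")
```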
anarcat
59fe9ed876
Merge pull request #228 from cjmayo/python3_18
{python3_18} Python3: fix unicode in urlbase
2019-04-25 16:17:00 -04:00
anarcat
70f0bbf225
Merge pull request #250 from cjmayo/ftpserver
Get FtpServerTest working by updating to current pyftpdlib API
2019-04-25 16:16:33 -04:00
Chris Mayo
5caa683123
Make test_all_parts TestLogger import Python 3 compatible
tests/checker/test_all_parts.py:21: in <module>
import __init__ as init
E ModuleNotFoundError: No module named '__init__'
testWarning: cannot collect test class 'TestLogger' because it has a
__init__ constructor
2019-04-25 20:28:21 +01:00
Petr Dlouhý
b3881ce3b5
Python3: fix urlbase, strformat and others
2019-04-25 19:57:45 +01:00
anarcat
8219b976ac
Merge pull request #223 from cjmayo/python3_13
{python3_13} Python3: fix imports in test_noproxy
2019-04-24 10:56:50 -04:00
anarcat
5916206f5f
Merge pull request #220 from cjmayo/python3_10
{python3_10} Python3: fix httpserver tests
2019-04-24 10:56:17 -04:00
Chris Mayo
64e9392fb9
Get FtpServerTest working by updating to current pyftpdlib API
2019-04-22 19:34:46 +01:00
Marius Gedminas
85cee2138d
Fix TestFile results not always ordered as expected values
self = <tests.checker.test_file.TestFile testMethod=test_good_dir_space>
def test_good_dir_space (self):
...
> self.direct(url, resultlines, recursionlevel=2)
tests/checker/test_file.py:173:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/checker/__init__.py:260: in direct
self.fail_unicode(text(os.linesep).join(l))
tests/checker/__init__.py:237: in fail_unicode
self.fail(msg)
E AssertionError: Differences found testing
2019-04-16 20:25:16 +01:00
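A common cause of this kind of intermittent mismatch is that os.listdir() returns names in arbitrary, filesystem-dependent order. Sorting before comparing makes the result deterministic. The sketch below is hypothetical: listing_lines and its "url ..." line format are illustrative, not TestFile's real output.

```python
import os
import tempfile


def listing_lines(directory):
    """One result line per directory entry, in a stable (sorted) order."""
    # os.listdir() order depends on the filesystem, so never rely on it
    return sorted("url %s" % name for name in os.listdir(directory))


with tempfile.TemporaryDirectory() as d:
    for name in ("b.txt", "a b.txt", "c.txt"):
        open(os.path.join(d, name), "w").close()
    print(listing_lines(d))  # ['url a b.txt', 'url b.txt', 'url c.txt']
```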
Petr Dlouhý
295555ac38
Python3: fix imports in test_noproxy
2019-04-12 20:27:09 +01:00
Petr Dlouhý
af08b4905b
Python3: fix httpserver tests
2019-04-11 20:37:49 +01:00
anarcat
4b90f7b4e5
Merge pull request #225 from cjmayo/python3_15
{python3_15} fixes for Python 3: fix test_internpat and test_news
2019-04-11 11:47:21 -04:00
anarcat
6b73320cdf
Merge pull request #224 from cjmayo/python3_14
{python3_14} fixes for Python 3: fix httpserver
2019-04-11 11:46:56 -04:00
Petr Dlouhý
4211e8aecd
fixes for Python 3: fix test_internpat and test_news
2019-04-09 20:09:35 +01:00
Petr Dlouhý
e8f6bc62c8
fixes for Python 3: fix httpserver
2019-04-09 20:09:35 +01:00
Petr Dlouhý
1e9fd51dfa
Python3: fix permission mask in test_file
2019-04-09 20:09:35 +01:00
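The change itself is not shown here. As a hedged illustration of permission masks under Python 3: octal literals need the 0o prefix, because a Python 2 literal such as 0644 is a syntax error in Python 3. stat.S_IMODE() strips the file-type bits from st_mode. The result below assumes a POSIX system, since chmod semantics differ on Windows.

```python
import os
import stat
import tempfile

# Python 2 accepted 0644; Python 3 requires the 0o prefix for octal literals
MODE = 0o644

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
try:
    os.chmod(path, MODE)
    # S_IMODE masks off the file-type bits, leaving only the permission bits
    mode = stat.S_IMODE(os.stat(path).st_mode)
    print(oct(mode))  # 0o644 on POSIX systems
finally:
    os.remove(path)
```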