linkchecker

mirror of https://github.com/Hopiu/linkchecker.git synced 2026-05-24 06:03:43 +00:00

Author	SHA1	Message	Date
Chris Mayo	e46fb7fe9c	Supports Python 3 Only. Needs miniboa >= 1.0.8 for telnet test on Python 3.7. Test with older Beautiful Soup without line number support on Python 3.5. Resolve tox deprecation warning: Matching undeclared envs is deprecated. Be sure all the envs that Tox should run are declared in the tox config.	2019-10-05 19:38:57 +01:00
Chris Mayo	4f8c2954cf	Don't set parser.encoding. Read-only property with new Beautiful Soup parser.	2019-10-05 19:38:57 +01:00
Petr Dlouhý	69d426b36f	fix parser encoding tests after change of parser. UnicodeDammit input has to be non-unicode to trigger character set detection.	2019-07-22 19:59:37 +01:00
Petr Dlouhý	b5111453d8	change test_parse encoding to UTF-8	2019-07-22 19:59:37 +01:00
Petr Dlouhý	2c3c794e52	fix http test after parser change	2019-07-22 19:59:37 +01:00
Petr Dlouhý	0089349760	fix parser tests after parser change	2019-07-22 19:59:37 +01:00
Petr Dlouhý	d6d48b4814	html parser: use name instead of peeking	2019-07-22 19:59:37 +01:00
Petr Dlouhý	51a06d8a1e	Remove home-cooked htmlparser and use BeautifulSoup	2019-07-22 19:59:37 +01:00
Petr Dlouhý	d1844a526e	add charset tests	2019-07-22 19:59:37 +01:00
Petr Dlouhý	2daf685633	Python3: fix few htmllib problems	2018-01-05 22:48:46 +01:00
Marius Gedminas	205ceb6805	Merge pull request #344 from hroncok/beautifulsoup4-requirement Require beautifulsoup4 instead of bs4	2020-02-06 12:52:20 +02:00
Miro Hrončok	ff5ebbae69	Require beautifulsoup4 instead of bs4. bs4 is a dummy package managed by the developer of Beautiful Soup to prevent name squatting. The official name of PyPI’s Beautiful Soup Python package is beautifulsoup4. The bs4 package ensures that if you type pip install bs4 by mistake you will end up with Beautiful Soup. However, for requirements, it's cleaner to use the proper name. For downstream packaging in Fedora, this avoids having to package the dummy package.	2020-02-06 10:05:13 +01:00
anarcat	e37dab8a4b	Merge pull request #339 from cjmayo/notafter Actually fix TypeError when checking https link	2019-11-21 10:33:27 -05:00
Chris Mayo	d3d6638973	Actually fix TypeError when checking https link. The test was added but not the fix in: `ecd06776` ("Fix TypeError when checking https link and test", 2019-11-11), which is caught by the new test when run on Python 3: ___________________ TestHttps.test_x509_to_dict__________________ [gw14] linux -- Python 3.6.9 /usr/bin/python3.6 tests/checker/test_https.py:72: in test_x509_to_dict self.assertEqual(httputil.x509_to_dict(cert)["notAfter"], linkcheck/httputil.py:47: in x509_to_dict parsedtime = asn1_generaltime_to_seconds(notAfter) linkcheck/httputil.py:68: in asn1_generaltime_to_seconds res = datetime.strptime(timestr, timeformat + 'Z') E TypeError: strptime() argument 1 must be str, not bytes	2019-11-19 20:06:10 +00:00
anarcat	c92ab72676	Merge pull request #338 from cjmayo/https Enable https checking using a test server	2019-11-14 09:38:54 -05:00
Chris Mayo	ecd06776ab	Fix TypeError when checking https link and test. File "/usr/lib/python3.7/site-packages/linkcheck/httputil.py", line 68, in asn1_generaltime_to_seconds line: res = datetime.strptime(timestr, timeformat + 'Z') locals: res = <local> None datetime = <global> <class 'datetime.datetime'> datetime.strptime = <global> <built-in method strptime of type object at 0x7fa39064dda0> timestr = <local> b'20191106202117Z' timeformat = <local> '%Y%m%d%H%M%S' TypeError: strptime() argument 1 must be str, not bytes pyOpenSSL OpenSSL.crypto.X509.get_notAfter() returns bytes: https://www.pyopenssl.org/en/stable/api/crypto.html#OpenSSL.crypto.X509.get_notAfter	2019-11-11 20:12:25 +00:00
Chris Mayo	dee4be4b1d	Enable https checking using a test server. Verification has to be turned off because we are using a self-signed certificate.	2019-11-11 20:12:25 +00:00
anarcat	5308ec5204	Merge pull request #336 from cjmayo/logdiff Improve test failure diff	2019-10-29 16:20:26 -04:00
Chris Mayo	2f16152dc8	Improve test failure diff. Some url lines were missing a url prefix while others had a double url prefix. diff was reporting more url lines as changed than had actually changed. Improve formatting by removing newlines from control lines and adding headings. Before: E AssertionError: http://localhost:46031/tests/checker/data/sitemap.xml E --- E E +++ E E @@ -1,4 +1,8 @@ E E -url http://localhost:46031/tests/checker/data/sitemap.xml E +http://www.example.com/ E +cache key http://www.example.com/ E +real url http://www.example.com/ E +valid E +url url http://localhost:46031/tests/checker/data/sitemap.xml E cache key http://localhost:46031/tests/checker/data/sitemap.xml E real url http://localhost:46031/tests/checker/data/sitemap.xml E valid After: E AssertionError: http://localhost:44021/tests/checker/data/sitemap.xml E --- expected E +++ result E @@ -2,3 +2,7 @@ E cache key http://localhost:44021/tests/checker/data/sitemap.xml E real url http://localhost:44021/tests/checker/data/sitemap.xml E valid E +url http://www.example.com/ E +cache key http://www.example.com/ E +real url http://www.example.com/ E +valid	2019-10-29 20:03:08 +00:00
Marius Gedminas	c294a4e6c1	Merge pull request #335 from cjmayo/sitemap Fix XmlTagUrlParser and make Python 3 compatible	2019-10-29 15:50:49 +02:00
Chris Mayo	ec8b6e09f0	Fix XmlTagUrlParser and make Python 3 compatible. URLs within a sitemap file were not being captured.	2019-10-28 19:20:05 +00:00
Marius Gedminas	8bdd402aed	Merge pull request #333 from linkchecker/fix-clamav-on-py3 Fix test_clamav.py on Python 3	2019-10-25 16:16:23 +03:00
Marius Gedminas	5b2b3613ec	Merge pull request #330 from linkchecker/fix-sitemap Fix sitemap parser	2019-10-25 16:15:55 +03:00
anarcat	6dcc9dbf9d	Merge pull request #332 from cjmayo/py3pdf Make PdfParser Python 3 compatible	2019-10-25 08:38:59 -04:00
Marius Gedminas	f9766a2049	Python 3: fix bytes vs strings in viruscheck plugin. Socket communication deals with bytes. There are probably remaining issues with the viruscheck plugin on Python 3, we just can't see them because the code is not fully covered with tests.	2019-10-25 14:24:07 +03:00
Marius Gedminas	65f861901c	Fix all Python 3 tox environments. Old pdfminer supports Python 2 only, new pdfminer supports Python 3 only.	2019-10-25 14:20:31 +03:00
Chris Mayo	b2e63663f8	Make PdfParser Python 3 compatible. basestring is not available in Python 3. Ensure all URLs are Unicode. url_data.get_raw_content() is returning bytes.	2019-10-24 19:57:27 +01:00
Marius Gedminas	011f6c147e	Merge pull request #331 from linkchecker/explain-skips Explain why these tests are being skipped	2019-10-23 17:59:55 +03:00
Marius Gedminas	606ece0308	Explain why these tests are being skipped. pytest output before this change: SKIPPED [3] tests/__init__.py:217: condition: True SKIPPED [1] tests/checker/test_news.py:63: condition: True SKIPPED [1] tests/checker/test_news.py:41: condition: True SKIPPED [1] tests/checker/test_news.py:116: condition: True SKIPPED [1] tests/checker/test_news.py:75: condition: True After: SKIPPED [3] tests/__init__.py: disabled for now until some stable news server comes up SKIPPED [4] tests/checker/test_news.py: disabled for now until some stable news server comes up	2019-10-23 17:35:31 +03:00
Marius Gedminas	87b504785c	Add a regression test for the sitemap parser	2019-10-23 17:30:10 +03:00
Marius Gedminas	a1af1e9717	Fix sitemap parser. PyExpat wants bytes on Python 2. See #323.	2019-10-23 17:23:23 +03:00
Marius Gedminas	f46151dbf8	Merge pull request #318 from tkfu/docs/fix-install-instructions Add instructions to install current release tag from git via pip	2019-10-23 09:47:25 +03:00
Marius Gedminas	938467c3ae	Merge pull request #324 from cjmayo/pdfminer Add pdfminer to tox.ini and dev-requirements.txt to enable pdf test	2019-10-23 09:47:01 +03:00
Marius Gedminas	db3e25e934	Merge pull request #326 from linkchecker/fix-word-maybe Fix MS Word parser, hopefully	2019-10-22 18:08:46 +03:00
Marius Gedminas	c6de64978c	Merge pull request #325 from linkchecker/type-error-in-robot-parser Fix TypeError: string arg required in content_allows_robots()	2019-10-22 18:07:31 +03:00
Marius Gedminas	2a748a3f12	Merge pull request #328 from linkchecker/enable-clamav-tests Enable ClamAV integration tests on Travis CI	2019-10-22 18:06:24 +03:00
anarcat	928594b194	Merge pull request #316 from cjmayo/nodnspath Don't add linkcheck_dns directory to sys.path	2019-10-22 10:43:23 -04:00
Marius Gedminas	f283894f86	Sudo is needed to stop/start system services	2019-10-22 17:21:53 +03:00
Marius Gedminas	746b66e91e	Update the clamav database	2019-10-22 17:17:14 +03:00
Marius Gedminas	2251b23df5	Wait for clamav-daemon to start up	2019-10-22 17:08:25 +03:00
Marius Gedminas	7e94e542b3	Enable clamav integration tests on Travis CI	2019-10-22 17:04:09 +03:00
Marius Gedminas	fa32a89d6b	Fix MS Word parser, hopefully. MS Word files are binary data, and get_temp_filename() will write them to disk using open(..., 'wb'), so we want to pass bytes in there, not Unicode. See #323.	2019-10-22 16:39:57 +03:00
Marius Gedminas	58b0d5aaae	Fix TypeError: string arg required in content_allows_robots(). See #323 and #317.	2019-10-22 14:13:45 +03:00
Marius Gedminas	6a9ab5ae44	Add a failing test	2019-10-22 14:13:45 +03:00
Chris Mayo	949f84d329	PdfParser requires bytes	2019-10-21 20:12:33 +01:00
Chris Mayo	a31289c97d	Add pdfminer to tox.ini and dev-requirements.txt to enable pdf test	2019-10-21 20:06:44 +01:00
Chris Mayo	7da64b16f0	Don't add linkcheck_dns directory to sys.path. This code was added in: `efbbb656` ("Remove python-dns conflict by moving the dns module into a custom subdirectory.", 2012-12-07). Installation of linkcheck_dns stopped with: `0a13fae3` ("remove third party packages and use them as dependency", 2018-01-06)	2019-10-21 19:52:58 +01:00
Marius Gedminas	bbb90eba81	Merge pull request #321 from linkchecker/wait-for-threads-to-exit Wait for threads to exit after stopping them	2019-10-21 20:50:04 +03:00
Marius Gedminas	e274d74be2	Wait for threads to exit after stopping them. This fixes a race condition where the main thread would check if any internal errors happened and get back a 0 while a worker thread was still busy printing the internal error message before incrementing the counter. Fixes #320. My experiments show that this adds no perceptible delay to the script runtime (on Linux). More specifically, there already is an annoying perceptible delay of about 1 second, but it's not caused by this change.	2019-10-21 18:23:58 +03:00
Marius Gedminas	ade5a5c399	Merge pull request #319 from linkchecker/nonascii-regression Fix TypeError: string arg required in find_links()	2019-10-21 18:02:16 +03:00
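Several of the Python 3 fixes above come down to bytes versus str. In the https TypeError (ecd06776ab, d3d6638973), pyOpenSSL's X509.get_notAfter() returns bytes such as b'20191106202117Z', while datetime.strptime() only accepts str. The sketch below shows the decode-first pattern under that assumption; the function name generaltime_to_datetime is hypothetical and this is not linkchecker's actual httputil code.

```python
from datetime import datetime, timezone

# ASN.1 GeneralizedTime layout, e.g. b'20191106202117Z'; the trailing 'Z' means UTC.
TIMEFORMAT = "%Y%m%d%H%M%S"


def generaltime_to_datetime(timestr):
    """Parse a GeneralizedTime value given as bytes or str (hypothetical helper).

    pyOpenSSL's X509.get_notAfter() returns bytes, but datetime.strptime()
    raises TypeError for bytes, so decode before parsing.
    """
    if isinstance(timestr, bytes):
        # GeneralizedTime values are plain ASCII digits plus 'Z'.
        timestr = timestr.decode("ascii")
    parsed = datetime.strptime(timestr, TIMEFORMAT + "Z")
    # strptime returns a naive datetime; attach UTC explicitly.
    return parsed.replace(tzinfo=timezone.utc)


print(generaltime_to_datetime(b"20191106202117Z"))  # 2019-11-06 20:21:17+00:00
```

Accepting both types keeps callers that already pass str working while fixing the bytes case.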


6162 commits
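Commit 2f16152dc8 improves the test failure diff by adding "expected"/"result" headings and removing newlines from the control lines. The standard library's difflib.unified_diff() supports both directly through its fromfile/tofile and lineterm parameters. The following is a minimal sketch with a hypothetical helper name, format_log_diff; whether linkchecker's test suite uses exactly this call is an assumption.

```python
import difflib


def format_log_diff(expected, result):
    """Return a unified diff of two lists of logger output lines (hypothetical helper)."""
    diff = difflib.unified_diff(
        expected,
        result,
        fromfile="expected",  # heading for the '---' line
        tofile="result",      # heading for the '+++' line
        lineterm="",          # no trailing newline on ---/+++/@@ control lines
    )
    return "\n".join(diff)


# Sample data modelled on the "After" output quoted in that commit message.
expected = [
    "url http://localhost:44021/tests/checker/data/sitemap.xml",
    "cache key http://localhost:44021/tests/checker/data/sitemap.xml",
    "real url http://localhost:44021/tests/checker/data/sitemap.xml",
    "valid",
]
result = expected + [
    "url http://www.example.com/",
    "cache key http://www.example.com/",
    "real url http://www.example.com/",
    "valid",
]
print(format_log_diff(expected, result))
```

With the default three lines of context this prints a single `@@ -2,3 +2,7 @@` hunk, matching the "After" output in the commit message.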