linkchecker

Mirror of https://github.com/Hopiu/linkchecker.git (synced 2026-05-16 02:23:10 +00:00)

Author	SHA1	Message	Date
Miro Hrončok	ff5ebbae69	Require beautifulsoup4 instead of bs4. bs4 is a dummy package managed by the developer of Beautiful Soup to prevent name squatting. The official name of PyPI’s Beautiful Soup Python package is beautifulsoup4. The bs4 package ensures that if you type pip install bs4 by mistake you will end up with Beautiful Soup. However, for requirements, it's cleaner to use the proper name. For downstream packaging in Fedora, this avoids the need to package the dummy package.	2020-02-06 10:05:13 +01:00
anarcat	e37dab8a4b	Merge pull request #339 from cjmayo/notafter Actually fix TypeError when checking https link	2019-11-21 10:33:27 -05:00
Chris Mayo	d3d6638973	Actually fix TypeError when checking https link. The test was added, but not the fix, in `ecd06776` ("Fix TypeError when checking https link and test", 2019-11-11). The missing fix is caught by the new test when run on Python 3: ___________________ TestHttps.test_x509_to_dict__________________ [gw14] linux -- Python 3.6.9 /usr/bin/python3.6 tests/checker/test_https.py:72: in test_x509_to_dict self.assertEqual(httputil.x509_to_dict(cert)["notAfter"], linkcheck/httputil.py:47: in x509_to_dict parsedtime = asn1_generaltime_to_seconds(notAfter) linkcheck/httputil.py:68: in asn1_generaltime_to_seconds res = datetime.strptime(timestr, timeformat + 'Z') E TypeError: strptime() argument 1 must be str, not bytes	2019-11-19 20:06:10 +00:00
anarcat	c92ab72676	Merge pull request #338 from cjmayo/https Enable https checking using a test server	2019-11-14 09:38:54 -05:00
Chris Mayo	ecd06776ab	Fix TypeError when checking https link and test. File "/usr/lib/python3.7/site-packages/linkcheck/httputil.py", line 68, in asn1_generaltime_to_seconds line: res = datetime.strptime(timestr, timeformat + 'Z') locals: res = <local> None datetime = <global> <class 'datetime.datetime'> datetime.strptime = <global> <built-in method strptime of type object at 0x7fa39064dda0> timestr = <local> b'20191106202117Z' timeformat = <local> '%Y%m%d%H%M%S' TypeError: strptime() argument 1 must be str, not bytes. pyOpenSSL's OpenSSL.crypto.X509.get_notAfter() returns bytes: https://www.pyopenssl.org/en/stable/api/crypto.html#OpenSSL.crypto.X509.get_notAfter	2019-11-11 20:12:25 +00:00
Chris Mayo	dee4be4b1d	Enable https checking using a test server. Verification has to be turned off because we are using a self-signed certificate.	2019-11-11 20:12:25 +00:00
anarcat	5308ec5204	Merge pull request #336 from cjmayo/logdiff Improve test failure diff	2019-10-29 16:20:26 -04:00
Chris Mayo	2f16152dc8	Improve test failure diff. Some url lines were missing a url prefix while others had a double url prefix, so diff was reporting more url lines as changed than had actually changed. Improve formatting by removing newlines from control lines and adding headings. Before: E AssertionError: http://localhost:46031/tests/checker/data/sitemap.xml E --- E E +++ E E @@ -1,4 +1,8 @@ E E -url http://localhost:46031/tests/checker/data/sitemap.xml E +http://www.example.com/ E +cache key http://www.example.com/ E +real url http://www.example.com/ E +valid E +url url http://localhost:46031/tests/checker/data/sitemap.xml E cache key http://localhost:46031/tests/checker/data/sitemap.xml E real url http://localhost:46031/tests/checker/data/sitemap.xml E valid After: E AssertionError: http://localhost:44021/tests/checker/data/sitemap.xml E --- expected E +++ result E @@ -2,3 +2,7 @@ E cache key http://localhost:44021/tests/checker/data/sitemap.xml E real url http://localhost:44021/tests/checker/data/sitemap.xml E valid E +url http://www.example.com/ E +cache key http://www.example.com/ E +real url http://www.example.com/ E +valid	2019-10-29 20:03:08 +00:00
Marius Gedminas	c294a4e6c1	Merge pull request #335 from cjmayo/sitemap Fix XmlTagUrlParser and make Python 3 compatible	2019-10-29 15:50:49 +02:00
Chris Mayo	ec8b6e09f0	Fix XmlTagUrlParser and make Python 3 compatible. URLs within a sitemap file were not being captured.	2019-10-28 19:20:05 +00:00
Marius Gedminas	8bdd402aed	Merge pull request #333 from linkchecker/fix-clamav-on-py3 Fix test_clamav.py on Python 3	2019-10-25 16:16:23 +03:00
Marius Gedminas	5b2b3613ec	Merge pull request #330 from linkchecker/fix-sitemap Fix sitemap parser	2019-10-25 16:15:55 +03:00
anarcat	6dcc9dbf9d	Merge pull request #332 from cjmayo/py3pdf Make PdfParser Python 3 compatible	2019-10-25 08:38:59 -04:00
Marius Gedminas	f9766a2049	Python 3: fix bytes vs strings in viruscheck plugin. Socket communication deals with bytes. There are probably remaining issues with the viruscheck plugin on Python 3; we just can't see them because the code is not fully covered with tests.	2019-10-25 14:24:07 +03:00
Marius Gedminas	65f861901c	Fix all Python 3 tox environments. Old pdfminer supports Python 2 only; new pdfminer supports Python 3 only.	2019-10-25 14:20:31 +03:00
Chris Mayo	b2e63663f8	Make PdfParser Python 3 compatible. basestring is not available in Python 3. Ensure all URLs are Unicode. url_data.get_raw_content() is returning bytes.	2019-10-24 19:57:27 +01:00
Marius Gedminas	011f6c147e	Merge pull request #331 from linkchecker/explain-skips Explain why these tests are being skipped	2019-10-23 17:59:55 +03:00
Marius Gedminas	606ece0308	Explain why these tests are being skipped. pytest output before this change: SKIPPED [3] tests/__init__.py:217: condition: True SKIPPED [1] tests/checker/test_news.py:63: condition: True SKIPPED [1] tests/checker/test_news.py:41: condition: True SKIPPED [1] tests/checker/test_news.py:116: condition: True SKIPPED [1] tests/checker/test_news.py:75: condition: True After: SKIPPED [3] tests/__init__.py: disabled for now until some stable news server comes up SKIPPED [4] tests/checker/test_news.py: disabled for now until some stable news server comes up	2019-10-23 17:35:31 +03:00
Marius Gedminas	87b504785c	Add a regression test for the sitemap parser	2019-10-23 17:30:10 +03:00
Marius Gedminas	a1af1e9717	Fix sitemap parser. PyExpat wants bytes on Python 2. See #323.	2019-10-23 17:23:23 +03:00
Marius Gedminas	f46151dbf8	Merge pull request #318 from tkfu/docs/fix-install-instructions Add instructions to install current release tag from git via pip	2019-10-23 09:47:25 +03:00
Marius Gedminas	938467c3ae	Merge pull request #324 from cjmayo/pdfminer Add pdfminer to tox.ini and dev-requirements.txt to enable pdf test	2019-10-23 09:47:01 +03:00
Marius Gedminas	db3e25e934	Merge pull request #326 from linkchecker/fix-word-maybe Fix MS Word parser, hopefully	2019-10-22 18:08:46 +03:00
Marius Gedminas	c6de64978c	Merge pull request #325 from linkchecker/type-error-in-robot-parser Fix TypeError: string arg required in content_allows_robots()	2019-10-22 18:07:31 +03:00
Marius Gedminas	2a748a3f12	Merge pull request #328 from linkchecker/enable-clamav-tests Enable ClamAV integration tests on Travis CI	2019-10-22 18:06:24 +03:00
anarcat	928594b194	Merge pull request #316 from cjmayo/nodnspath Don't add linkcheck_dns directory to sys.path	2019-10-22 10:43:23 -04:00
Marius Gedminas	f283894f86	Sudo is needed to stop/start system services	2019-10-22 17:21:53 +03:00
Marius Gedminas	746b66e91e	Update the clamav database	2019-10-22 17:17:14 +03:00
Marius Gedminas	2251b23df5	Wait for clamav-daemon to start up	2019-10-22 17:08:25 +03:00
Marius Gedminas	7e94e542b3	Enable clamav integration tests on Travis CI	2019-10-22 17:04:09 +03:00
Marius Gedminas	fa32a89d6b	Fix MS Word parser, hopefully. MS Word files are binary data, and get_temp_filename() will write them to disk using open(..., 'wb'), so we want to pass bytes in there, not Unicode. See #323.	2019-10-22 16:39:57 +03:00
Marius Gedminas	58b0d5aaae	Fix TypeError: string arg required in content_allows_robots(). See #323 and #317.	2019-10-22 14:13:45 +03:00
Marius Gedminas	6a9ab5ae44	Add a failing test	2019-10-22 14:13:45 +03:00
Chris Mayo	949f84d329	PdfParser requires bytes	2019-10-21 20:12:33 +01:00
Chris Mayo	a31289c97d	Add pdfminer to tox.ini and dev-requirements.txt to enable pdf test	2019-10-21 20:06:44 +01:00
Chris Mayo	7da64b16f0	Don't add linkcheck_dns directory to sys.path. This code was added in `efbbb656` ("Remove python-dns conflict by moving the dns module into a custom subdirectory.", 2012-12-07). Installation of linkcheck_dns stopped with `0a13fae3` ("remove third party packages and use them as dependency", 2018-01-06).	2019-10-21 19:52:58 +01:00
Marius Gedminas	bbb90eba81	Merge pull request #321 from linkchecker/wait-for-threads-to-exit Wait for threads to exit after stopping them	2019-10-21 20:50:04 +03:00
Marius Gedminas	e274d74be2	Wait for threads to exit after stopping them. This fixes a race condition where the main thread would check if any internal errors happened and get back a 0 while a worker thread was still busy printing the internal error message before incrementing the counter. Fixes #320. My experiments show that this adds no perceptible delay to the script runtime (on Linux). More specifically, there already is an annoying perceptible delay of about 1 second, but it's not caused by this change.	2019-10-21 18:23:58 +03:00
Marius Gedminas	ade5a5c399	Merge pull request #319 from linkchecker/nonascii-regression Fix TypeError: string arg required in find_links()	2019-10-21 18:02:16 +03:00
Marius Gedminas	84dbb5d603	Fix TypeError: string arg required in find_links(). Fixes #317.	2019-10-21 17:47:46 +03:00
Marius Gedminas	a4967fe92c	Add a regression test for issue #317. The important bit was making the `file_test` helper not ignore internal errors.	2019-10-21 17:45:18 +03:00
Marius Gedminas	42c75b5ef9	Move some pytest options into pytest.ini. This is so that I can run `tox -- -n 8` to run the tests in parallel, or `tox -- tests/checker/test_misc.py::TestMisc::test_html5` to run just a single test, without having to repeat all the other options. I haven't moved --cov=linkcheck because I don't want coverage results when I'm limiting the test run to a single test (they just make the interesting bit -- the test result itself -- scroll up). I've also added -ra to the default option list because when several tests fail, I'd like to see a list of their names in one place, not spread out between the huge tracebacks.	2019-10-21 17:42:29 +03:00
Jon Oster	2e2c81130e	Add instructions to install current release tag from git via pip. Signed-off-by: Jon Oster <jon.oster@here.com>	2019-10-21 16:10:26 +02:00
anarcat	895dc016b9	Merge pull request #315 from cjmayo/lessnetwork Remove unused code from network subpackage	2019-10-20 17:06:06 -04:00
Chris Mayo	c7a32d67fe	Remove unused code from network subpackage	2019-10-19 10:27:34 +01:00
anarcat	f73ba54a2a	Merge pull request #308 from cjmayo/decode Decode content when retrieved	2019-10-10 09:46:32 -04:00
anarcat	7cfb1136e9	Merge pull request #313 from cjmayo/titlefinder Remove unused linkparse.TitleFinder	2019-10-07 11:30:10 -04:00
anarcat	5a43cfec40	Merge pull request #312 from cjmayo/notneeded Revert Python 3 patches not needed after decode	2019-10-07 11:29:52 -04:00
Chris Mayo	127c2272c4	Remove unused linkparse.TitleFinder. Stopped being used with removal of UrlBase.set_title_from_content() in `7b34be59` ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01).	2019-10-05 19:43:33 +01:00
Chris Mayo	5732606c58	Remove urlutil.decode_for_unquote(). Not needed since all content is now being decoded on retrieval. Added by `a6643034` ("Python3: decode parts before submitting them to urllib.quote()", 2018-01-05).	2019-10-04 19:37:09 +01:00
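Commits ecd06776ab and d3d6638973 trace the TypeError to pyOpenSSL's `X509.get_notAfter()` returning bytes such as `b'20191106202117Z'`, which `datetime.strptime()` rejects. One way to avoid it is to decode the value before parsing. The sketch below assumes an ASCII GeneralizedTime value in the `%Y%m%d%H%M%S` plus `Z` format shown in the traceback. `generaltime_to_seconds()` is a hypothetical helper, not linkchecker's actual `asn1_generaltime_to_seconds()`:

```python
import calendar
from datetime import datetime


def generaltime_to_seconds(timestr):
    """Convert an ASN.1 GeneralizedTime value such as b'20191106202117Z'
    into seconds since the epoch (hypothetical helper for illustration)."""
    # pyOpenSSL's X509.get_notAfter() returns bytes, but strptime() needs str.
    if isinstance(timestr, bytes):
        timestr = timestr.decode("ascii")
    timeformat = "%Y%m%d%H%M%S"
    parsed = datetime.strptime(timestr, timeformat + "Z")
    # The trailing 'Z' means UTC, so treat the naive datetime as UTC.
    return calendar.timegm(parsed.timetuple())


print(generaltime_to_seconds(b"20191106202117Z"))  # prints 1573071677
```

The helper accepts both bytes and str, so callers do not need to know which type their certificate library returns.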

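Commit 2f16152dc8 shows the test failure output before and after the change. The `--- expected` / `+++ result` headings and the absence of stray blank control lines match what Python's standard `difflib.unified_diff()` produces when given `fromfile`/`tofile` labels and `lineterm=""`. The sketch below assumes the test compares lists of log lines without trailing newlines. `format_diff()` and the sample data are hypothetical, not the project's test code:

```python
import difflib


def format_diff(expected, result):
    """Return a unified diff between two lists of log lines.

    Hypothetical helper: labels the sides 'expected' and 'result' and passes
    lineterm="" so the ---/+++/@@ control lines carry no trailing newline.
    """
    return "\n".join(
        difflib.unified_diff(
            expected, result, fromfile="expected", tofile="result", lineterm=""
        )
    )


# Illustrative data modelled on the "After:" output in the commit message.
base = "http://localhost:44021/tests/checker/data/sitemap.xml"
expected = ["url " + base, "cache key " + base, "real url " + base, "valid"]
extra = "http://www.example.com/"
result = expected + ["url " + extra, "cache key " + extra, "real url " + extra, "valid"]

print(format_diff(expected, result))
```

With difflib's default three lines of context, the hunk header comes out as `@@ -2,3 +2,7 @@`, the same as in the "After:" output.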

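Commit e274d74be2 fixes a race in which the main thread read the internal-error counter while a worker thread was still printing its error message. The general pattern is to signal the workers, `join()` them, and only then read shared state. It can be sketched with the standard `threading` module. Everything here (`ErrorCounter`, `worker`, `run_and_count_errors`) is hypothetical and does not mirror linkchecker's own thread classes:

```python
import threading
import time


class ErrorCounter:
    """Thread-safe count of internal errors (hypothetical stand-in)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0

    def increment(self):
        with self._lock:
            self.count += 1


def worker(stop, errors):
    # Wait for the stop signal, then spend a moment "printing" an internal
    # error before incrementing the counter, like a slow error reporter.
    stop.wait()
    time.sleep(0.05)
    errors.increment()


def run_and_count_errors(num_threads=4):
    stop = threading.Event()
    errors = ErrorCounter()
    threads = [
        threading.Thread(target=worker, args=(stop, errors))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    stop.set()  # ask every worker to stop
    # Wait for the workers to exit before reading the counter. Without these
    # join() calls, the read could happen while workers are still reporting.
    for t in threads:
        t.join()
    return errors.count


print(run_and_count_errors())  # prints 4
```

Because every worker has finished by the time `join()` returns, the final read is deterministic. Reading `errors.count` straight after `stop.set()` would usually return 0.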
6151 commits