linkchecker

mirror of https://github.com/Hopiu/linkchecker.git synced 2026-05-19 03:51:07 +00:00

Author	SHA1	Message	Date
Chris Mayo	2f16152dc8	Improve test failure diff Some url lines were missing a url prefix while others had a double url prefix. diff was reporting more url lines as changed than actually had. Improve formatting by removing newlines from control lines and adding headings. Before: E AssertionError: http://localhost:46031/tests/checker/data/sitemap.xml E --- E E +++ E E @@ -1,4 +1,8 @@ E E -url http://localhost:46031/tests/checker/data/sitemap.xml E +http://www.example.com/ E +cache key http://www.example.com/ E +real url http://www.example.com/ E +valid E +url url http://localhost:46031/tests/checker/data/sitemap.xml E cache key http://localhost:46031/tests/checker/data/sitemap.xml E real url http://localhost:46031/tests/checker/data/sitemap.xml E valid After: E AssertionError: http://localhost:44021/tests/checker/data/sitemap.xml E --- expected E +++ result E @@ -2,3 +2,7 @@ E cache key http://localhost:44021/tests/checker/data/sitemap.xml E real url http://localhost:44021/tests/checker/data/sitemap.xml E valid E +url http://www.example.com/ E +cache key http://www.example.com/ E +real url http://www.example.com/ E +valid	2019-10-29 20:03:08 +00:00
Marius Gedminas	c294a4e6c1	Merge pull request #335 from cjmayo/sitemap Fix XmlTagUrlParser and make Python 3 compatible	2019-10-29 15:50:49 +02:00
Chris Mayo	ec8b6e09f0	Fix XmlTagUrlParser and make Python 3 compatible URLs within a sitemap file were not being captured.	2019-10-28 19:20:05 +00:00
Marius Gedminas	8bdd402aed	Merge pull request #333 from linkchecker/fix-clamav-on-py3 Fix test_clamav.py on Python 3	2019-10-25 16:16:23 +03:00
Marius Gedminas	5b2b3613ec	Merge pull request #330 from linkchecker/fix-sitemap Fix sitemap parser	2019-10-25 16:15:55 +03:00
anarcat	6dcc9dbf9d	Merge pull request #332 from cjmayo/py3pdf Make PdfParser Python 3 compatible	2019-10-25 08:38:59 -04:00
Marius Gedminas	f9766a2049	Python 3: fix bytes vs strings in viruscheck plugin Socket communication deals with bytes. There are probably remaining issues with the viruscheck plugin on Python 3, we just can't see them because the code is not fully covered with tests.	2019-10-25 14:24:07 +03:00
Marius Gedminas	65f861901c	Fix all Python 3 tox environments Old pdfminer supports Python 2 only, new pdfminer supports Python 3 only.	2019-10-25 14:20:31 +03:00
Chris Mayo	b2e63663f8	Make PdfParser Python 3 compatible basestring is not available in Python 3. Ensure all URLs are Unicode. url_data.get_raw_content() is returning bytes.	2019-10-24 19:57:27 +01:00
Marius Gedminas	011f6c147e	Merge pull request #331 from linkchecker/explain-skips Explain why these tests are being skipped	2019-10-23 17:59:55 +03:00
Marius Gedminas	606ece0308	Explain why these tests are being skipped pytest output before this change: SKIPPED [3] tests/__init__.py:217: condition: True SKIPPED [1] tests/checker/test_news.py:63: condition: True SKIPPED [1] tests/checker/test_news.py:41: condition: True SKIPPED [1] tests/checker/test_news.py:116: condition: True SKIPPED [1] tests/checker/test_news.py:75: condition: True After: SKIPPED [3] tests/__init__.py: disabled for now until some stable news server comes up SKIPPED [4] tests/checker/test_news.py: disabled for now until some stable news server comes up	2019-10-23 17:35:31 +03:00
Marius Gedminas	87b504785c	Add a regression test for the sitemap parser	2019-10-23 17:30:10 +03:00
Marius Gedminas	a1af1e9717	Fix sitemap parser PyExpat wants bytes on Python 2. See #323.	2019-10-23 17:23:23 +03:00
Marius Gedminas	f46151dbf8	Merge pull request #318 from tkfu/docs/fix-install-instructions Add instructions to install current release tag from git via pip	2019-10-23 09:47:25 +03:00
Marius Gedminas	938467c3ae	Merge pull request #324 from cjmayo/pdfminer Add pdfminer to tox.ini and dev-requirements.txt to enable pdf test	2019-10-23 09:47:01 +03:00
Marius Gedminas	db3e25e934	Merge pull request #326 from linkchecker/fix-word-maybe Fix MS Word parser, hopefully	2019-10-22 18:08:46 +03:00
Marius Gedminas	c6de64978c	Merge pull request #325 from linkchecker/type-error-in-robot-parser Fix TypeError: string arg required in content_allows_robots()	2019-10-22 18:07:31 +03:00
Marius Gedminas	2a748a3f12	Merge pull request #328 from linkchecker/enable-clamav-tests Enable ClamAV integration tests on Travis CI	2019-10-22 18:06:24 +03:00
anarcat	928594b194	Merge pull request #316 from cjmayo/nodnspath Don't add linkcheck_dns directory to sys.path	2019-10-22 10:43:23 -04:00
Marius Gedminas	f283894f86	Sudo is needed to stop/start system services	2019-10-22 17:21:53 +03:00
Marius Gedminas	746b66e91e	Update the clamav database	2019-10-22 17:17:14 +03:00
Marius Gedminas	2251b23df5	Wait for clamav-daemon to start up	2019-10-22 17:08:25 +03:00
Marius Gedminas	7e94e542b3	Enable clamav integration tests on Travis CI	2019-10-22 17:04:09 +03:00
Marius Gedminas	fa32a89d6b	Fix MS Word parser, hopefully MS Word files are binary data, and get_temp_filename() will write them to disk using open(..., 'wb'), so we want to pass bytes in there, not Unicode. See #323.	2019-10-22 16:39:57 +03:00
Marius Gedminas	58b0d5aaae	Fix TypeError: string arg required in content_allows_robots() See #323 and #317.	2019-10-22 14:13:45 +03:00
Marius Gedminas	6a9ab5ae44	Add a failing test	2019-10-22 14:13:45 +03:00
Chris Mayo	949f84d329	PdfParser requires bytes	2019-10-21 20:12:33 +01:00
Chris Mayo	a31289c97d	Add pdfminer to tox.ini and dev-requirements.txt to enable pdf test	2019-10-21 20:06:44 +01:00
Chris Mayo	7da64b16f0	Don't add linkcheck_dns directory to sys.path This code was added in: `efbbb656` ("Remove python-dns conflict by moving the dns module into a custom subdirectory.", 2012-12-07) Installation of linkcheck_dns stopped with: `0a13fae3` ("remove third party packages and use them as dependency", 2018-01-06)	2019-10-21 19:52:58 +01:00
Marius Gedminas	bbb90eba81	Merge pull request #321 from linkchecker/wait-for-threads-to-exit Wait for threads to exit after stopping them	2019-10-21 20:50:04 +03:00
Marius Gedminas	e274d74be2	Wait for threads to exit after stopping them This fixes a race condition where the main thread would check if any internal errors happened and get back a 0 while a worker thread was still busy printing the internal error message before incrementing the counter. Fixes #320. My experiments show that this adds no perceptible delay to the script runtime (on Linux). More specifically, there already is an annoying perceptible delay of about 1 second, but it's not caused by this change.	2019-10-21 18:23:58 +03:00
Marius Gedminas	ade5a5c399	Merge pull request #319 from linkchecker/nonascii-regression Fix TypeError: string arg required in find_links()	2019-10-21 18:02:16 +03:00
Marius Gedminas	84dbb5d603	Fix TypeError: string arg required in find_links() Fixes #317.	2019-10-21 17:47:46 +03:00
Marius Gedminas	a4967fe92c	Add a regression test for issue #317 The important bit was making the `file_test` helper not ignore internal errors.	2019-10-21 17:45:18 +03:00
Marius Gedminas	42c75b5ef9	Move some pytest options into pytest.ini This is so that I can run `tox -- -n 8` to run the tests in parallel, or `tox -- tests/checker/test_misc.py::TestMisc::test_html5` to run just a single test, without having to repeat all the other options. I haven't moved --cov=linkcheck because I don't want coverage results when I'm limiting the test run to a single test (they just make the interesting bit -- the test result itself -- scroll up). I've also added -ra to the default option list because when several tests fail, I'd like to see a list of their names in one place, not spread out between the huge tracebacks.	2019-10-21 17:42:29 +03:00
Jon Oster	2e2c81130e	Add instructions to install current release tag from git via pip Signed-off-by: Jon Oster <jon.oster@here.com>	2019-10-21 16:10:26 +02:00
anarcat	895dc016b9	Merge pull request #315 from cjmayo/lessnetwork Remove unused code from network subpackage	2019-10-20 17:06:06 -04:00
Chris Mayo	c7a32d67fe	Remove unused code from network subpackage	2019-10-19 10:27:34 +01:00
anarcat	f73ba54a2a	Merge pull request #308 from cjmayo/decode Decode content when retrieved	2019-10-10 09:46:32 -04:00
anarcat	7cfb1136e9	Merge pull request #313 from cjmayo/titlefinder Remove unused linkparse.TitleFinder	2019-10-07 11:30:10 -04:00
anarcat	5a43cfec40	Merge pull request #312 from cjmayo/notneeded Revert Python 3 patches not needed after decode	2019-10-07 11:29:52 -04:00
Chris Mayo	127c2272c4	Remove unused linkparse.TitleFinder Stopped being used with removal of UrlBase.set_title_from_content() in: `7b34be59` ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)	2019-10-05 19:43:33 +01:00
Chris Mayo	74d5c68094	Add new tests for URL quoting	2019-10-05 19:38:57 +01:00
Chris Mayo	b7ec71d8cc	Always use utf-8 encoding when quoting	2019-10-05 19:38:57 +01:00
Chris Mayo	a9f147c347	Update fileutil.pathencode() because paths are now strings	2019-10-05 19:38:57 +01:00
Chris Mayo	5bb4524a63	Update strformat.ascii_safe() because paths are now strings	2019-10-05 19:38:57 +01:00
Chris Mayo	646e138166	Pass encoding when unquoting Else non-UTF-8 codes are misinterpreted: >>> from urllib import parse >>> parse.unquote("%FF") '�' >>> parse.unquote("%FF", "latin1") 'ÿ'	2019-10-05 19:38:57 +01:00
Chris Mayo	153e53ba03	Reuse soup object used for detecting encoding in the HTML parser	2019-10-05 19:38:57 +01:00
Chris Mayo	978042a54e	Hide Beautiful Soup soupsieve warning Shown every time linkchecker is run: /usr/lib/python3.7/site-packages/bs4/element.py:16: UserWarning: The soupsieve package is not installed. CSS selectors cannot be used. 'The soupsieve package is not installed. CSS selectors cannot be used.'	2019-10-05 19:38:57 +01:00
Chris Mayo	30df69c158	Improve pretty printed comments	2019-10-05 19:38:57 +01:00
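
The "Pass encoding when unquoting" commit (646e138166) turns on how `urllib.parse.unquote` interprets percent-escaped bytes. The following is a minimal sketch of that behaviour, not the project's actual code; the helper name `unquote_url` is hypothetical and exists only for illustration.

```python
from urllib.parse import unquote


def unquote_url(text, encoding="utf-8"):
    """Percent-decode text, interpreting the decoded bytes with `encoding`.

    Hypothetical helper: it only forwards to urllib.parse.unquote, whose
    default error handler replaces undecodable bytes with U+FFFD.
    """
    return unquote(text, encoding=encoding)


# 0xFF is not valid UTF-8 on its own, so the default decoding yields U+FFFD.
default = unquote_url("%FF")
# The same byte is a valid Latin-1 code point (U+00FF).
latin1 = unquote_url("%FF", encoding="latin1")
```

Passing the document's own encoding through to the unquoting step is what keeps non-UTF-8 escapes from being replaced silently.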

6313 commits
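
The race described in "Wait for threads to exit after stopping them" (e274d74be2) can be illustrated with a small standalone sketch. This is an assumption-laden toy, not linkchecker's threading code: the function name, the `errors` dictionary, and the sleep that stands in for printing an error message are all hypothetical.

```python
import threading
import time


def run_workers_and_count_errors(num_workers=3):
    """Stop the workers, wait for them to exit, then read the error counter."""
    stop = threading.Event()
    lock = threading.Lock()
    errors = {"count": 0}

    def worker():
        stop.wait()        # idle until asked to stop
        time.sleep(0.05)   # stands in for printing a long error message
        with lock:
            errors["count"] += 1  # the counter is updated only after "printing"

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    stop.set()             # ask the workers to stop
    for t in threads:
        t.join()           # wait for them to exit before reading the counter
    return errors["count"]
```

Without the `join()` loop, the main thread could read the counter while a worker is still inside its sleep and see 0, which is the symptom the commit message describes.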