linkchecker

mirror of https://github.com/Hopiu/linkchecker.git synced 2026-04-19 13:51:01 +00:00

Author	SHA1	Message	Date
Chris Mayo	ec8b6e09f0	Fix XmlTagUrlParser and make Python 3 compatible URLs within a sitemap file were not being captured.	2019-10-28 19:20:05 +00:00
Marius Gedminas	8bdd402aed	Merge pull request #333 from linkchecker/fix-clamav-on-py3 Fix test_clamav.py on Python 3	2019-10-25 16:16:23 +03:00
Marius Gedminas	5b2b3613ec	Merge pull request #330 from linkchecker/fix-sitemap Fix sitemap parser	2019-10-25 16:15:55 +03:00
anarcat	6dcc9dbf9d	Merge pull request #332 from cjmayo/py3pdf Make PdfParser Python 3 compatible	2019-10-25 08:38:59 -04:00
Marius Gedminas	f9766a2049	Python 3: fix bytes vs strings in viruscheck plugin Socket communication deals with bytes. There are probably remaining issues with the viruscheck plugin on Python 3, we just can't see them because the code is not fully covered with tests.	2019-10-25 14:24:07 +03:00
Marius Gedminas	65f861901c	Fix all Python 3 tox environments Old pdfminer supports Python 2 only, new pdfminer supports Python 3 only.	2019-10-25 14:20:31 +03:00
Chris Mayo	b2e63663f8	Make PdfParser Python 3 compatible basestring is not available in Python 3. Ensure all URLs are Unicode. url_data.get_raw_content() is returning bytes.	2019-10-24 19:57:27 +01:00
Marius Gedminas	011f6c147e	Merge pull request #331 from linkchecker/explain-skips Explain why these tests are being skipped	2019-10-23 17:59:55 +03:00
Marius Gedminas	606ece0308	Explain why these tests are being skipped pytest output before this change: SKIPPED [3] tests/__init__.py:217: condition: True SKIPPED [1] tests/checker/test_news.py:63: condition: True SKIPPED [1] tests/checker/test_news.py:41: condition: True SKIPPED [1] tests/checker/test_news.py:116: condition: True SKIPPED [1] tests/checker/test_news.py:75: condition: True After: SKIPPED [3] tests/__init__.py: disabled for now until some stable news server comes up SKIPPED [4] tests/checker/test_news.py: disabled for now until some stable news server comes up	2019-10-23 17:35:31 +03:00
Marius Gedminas	87b504785c	Add a regression test for the sitemap parser	2019-10-23 17:30:10 +03:00
Marius Gedminas	a1af1e9717	Fix sitemap parser PyExpat wants bytes on Python 2. See #323.	2019-10-23 17:23:23 +03:00
Marius Gedminas	f46151dbf8	Merge pull request #318 from tkfu/docs/fix-install-instructions Add instructions to install current release tag from git via pip	2019-10-23 09:47:25 +03:00
Marius Gedminas	938467c3ae	Merge pull request #324 from cjmayo/pdfminer Add pdfminer to tox.ini and dev-requirements.txt to enable pdf test	2019-10-23 09:47:01 +03:00
Marius Gedminas	db3e25e934	Merge pull request #326 from linkchecker/fix-word-maybe Fix MS Word parser, hopefully	2019-10-22 18:08:46 +03:00
Marius Gedminas	c6de64978c	Merge pull request #325 from linkchecker/type-error-in-robot-parser Fix TypeError: string arg required in content_allows_robots()	2019-10-22 18:07:31 +03:00
Marius Gedminas	2a748a3f12	Merge pull request #328 from linkchecker/enable-clamav-tests Enable ClamAV integration tests on Travis CI	2019-10-22 18:06:24 +03:00
anarcat	928594b194	Merge pull request #316 from cjmayo/nodnspath Don't add linkcheck_dns directory to sys.path	2019-10-22 10:43:23 -04:00
Marius Gedminas	f283894f86	Sudo is needed to stop/start system services	2019-10-22 17:21:53 +03:00
Marius Gedminas	746b66e91e	Update the clamav database	2019-10-22 17:17:14 +03:00
Marius Gedminas	2251b23df5	Wait for clamav-daemon to start up	2019-10-22 17:08:25 +03:00
Marius Gedminas	7e94e542b3	Enable clamav integration tests on Travis CI	2019-10-22 17:04:09 +03:00
Marius Gedminas	fa32a89d6b	Fix MS Word parser, hopefully MS Word files are binary data, and get_temp_filename() will write them to disk using open(..., 'wb'), so we want to pass bytes in there, not Unicode. See #323.	2019-10-22 16:39:57 +03:00
Marius Gedminas	58b0d5aaae	Fix TypeError: string arg required in content_allows_robots() See #323 and #317.	2019-10-22 14:13:45 +03:00
Marius Gedminas	6a9ab5ae44	Add a failing test	2019-10-22 14:13:45 +03:00
Chris Mayo	949f84d329	PdfParser requires bytes	2019-10-21 20:12:33 +01:00
Chris Mayo	a31289c97d	Add pdfminer to tox.ini and dev-requirements.txt to enable pdf test	2019-10-21 20:06:44 +01:00
Chris Mayo	7da64b16f0	Don't add linkcheck_dns directory to sys.path This code was added in: `efbbb656` ("Remove python-dns conflict by moving the dns module into a custom subdirectory.", 2012-12-07) Installation of linkcheck_dns stopped with: `0a13fae3` ("remove third party packages and use them as dependency", 2018-01-06)	2019-10-21 19:52:58 +01:00
Marius Gedminas	bbb90eba81	Merge pull request #321 from linkchecker/wait-for-threads-to-exit Wait for threads to exit after stopping them	2019-10-21 20:50:04 +03:00
Marius Gedminas	e274d74be2	Wait for threads to exit after stopping them This fixes a race condition where the main thread would check if any internal errors happened and get back a 0 while a worker thread was still busy printing the internal error message before incrementing the counter. Fixes #320. My experiments show that this adds no perceptible delay to the script runtime (on Linux). More specifically, there already is an annoying perceptible delay of about 1 second, but it's not caused by this change.	2019-10-21 18:23:58 +03:00
Marius Gedminas	ade5a5c399	Merge pull request #319 from linkchecker/nonascii-regression Fix TypeError: string arg required in find_links()	2019-10-21 18:02:16 +03:00
Marius Gedminas	84dbb5d603	Fix TypeError: string arg required in find_links() Fixes #317.	2019-10-21 17:47:46 +03:00
Marius Gedminas	a4967fe92c	Add a regression test for issue #317 The important bit was making the `file_test` helper not ignore internal errors.	2019-10-21 17:45:18 +03:00
Marius Gedminas	42c75b5ef9	Move some pytest options into pytest.ini This is so that I can run `tox -- -n 8` to run the tests in parallel, or `tox -- tests/checker/test_misc.py::TestMisc::test_html5` to run just a single test, without having to repeat all the other options. I haven't moved --cov=linkcheck because I don't want coverage results when I'm limiting the test run to a single test (they just make the interesting bit -- the test result itself -- scroll up). I've also added -ra to the default option list because when several tests fail, I'd like to see a list of their names in one place, not spread out between the huge tracebacks.	2019-10-21 17:42:29 +03:00
Jon Oster	2e2c81130e	Add instructions to install current release tag from git via pip Signed-off-by: Jon Oster <jon.oster@here.com>	2019-10-21 16:10:26 +02:00
anarcat	895dc016b9	Merge pull request #315 from cjmayo/lessnetwork Remove unused code from network subpackage	2019-10-20 17:06:06 -04:00
Chris Mayo	c7a32d67fe	Remove unused code from network subpackage	2019-10-19 10:27:34 +01:00
anarcat	f73ba54a2a	Merge pull request #308 from cjmayo/decode Decode content when retrieved	2019-10-10 09:46:32 -04:00
anarcat	7cfb1136e9	Merge pull request #313 from cjmayo/titlefinder Remove unused linkparse.TitleFinder	2019-10-07 11:30:10 -04:00
anarcat	5a43cfec40	Merge pull request #312 from cjmayo/notneeded Revert Python 3 patches not needed after decode	2019-10-07 11:29:52 -04:00
Chris Mayo	127c2272c4	Remove unused linkparse.TitleFinder Stopped being used with removal of UrlBase.set_title_from_content() in: `7b34be59` ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)	2019-10-05 19:43:33 +01:00
Chris Mayo	5732606c58	Remove urlutil.decode_for_unquote() Not needed since all content is now being decoded on retrieval. Added by: `a6643034` ("Python3: decode parts before submitting them to urllib.quote()", 2018-01-05)	2019-10-04 19:37:09 +01:00
Chris Mayo	2776eb5f52	Revert "Python3: fix opening file URLs" This reverts commit `4c9ec511b5`.	2019-10-04 19:37:09 +01:00
anarcat	07cf9c1c11	Merge pull request #310 from cjmayo/writeln Remove unnecessary unicode() from StatusLogger.writeln()	2019-10-01 09:36:25 -04:00
Chris Mayo	c6a06d99ac	Remove unnecessary unicode() from StatusLogger.writeln()	2019-09-30 20:06:48 +01:00
Petr Dlouhý	6e8da10942	fixes for Python 3: fix markdowncheck The translate() method of string objects (and Python 2 Unicode objects) only accepts a single, table argument.	2019-09-30 19:46:24 +01:00
Chris Mayo	e01ea0d9f0	Safari bookmark parser requires bytes	2019-09-30 19:46:24 +01:00
Chris Mayo	ad33d359c1	Adapt Opera bookmark parser to work with decoded data	2019-09-30 19:46:24 +01:00
Chris Mayo	9460064084	Use requests to decode the content of login form	2019-09-30 19:46:24 +01:00
Chris Mayo	5fc01455b7	Decode content when retrieved, use bs4 to detect encoding if non-Unicode UrlBase has been modified as follows: - the "data" variable now holds bytes - decoded content is stored in a new variable "text" - functionality from get_content() has been split out into get_raw_content() which returns "data" and download_content() which calls read_content() and sets the download related variables. This allows for subclasses to do their own decoding and parsers to use bytes.	2019-09-30 19:46:24 +01:00
Chris Mayo	0c90c718bf	Revert "Python3: fix bytes mark in parser/__init__.py" This reverts commit `aec8243348`.	2019-09-30 19:46:24 +01:00
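
The race condition described in commit e274d74be2 ("Wait for threads to exit after stopping them") can be illustrated with a minimal sketch. This is not linkchecker's code: `Worker`, `run_and_count`, and the `errors` dictionary are hypothetical names, and the short `time.sleep` merely stands in for a worker that is still busy printing an internal error message before it increments the counter.

```python
import threading
import time


class Worker(threading.Thread):
    """Hypothetical worker that reports one internal error, slowly."""

    def __init__(self, errors, lock):
        super().__init__()
        self.errors = errors
        self.lock = lock

    def run(self):
        # Stands in for printing the internal error message.
        time.sleep(0.05)
        # The counter is only incremented after the message is printed.
        with self.lock:
            self.errors["count"] += 1


def run_and_count(wait_for_exit):
    """Start workers and read the error counter from the main thread."""
    errors = {"count": 0}
    lock = threading.Lock()
    workers = [Worker(errors, lock) for _ in range(4)]
    for worker in workers:
        worker.start()
    if wait_for_exit:
        # The fix: wait for every worker to exit before reading the counter.
        for worker in workers:
            worker.join()
    with lock:
        count = errors["count"]
    # Join unconditionally so the sketch never leaks running threads.
    for worker in workers:
        worker.join()
    return count
```

Calling `run_and_count(True)` always sees all four errors, because every worker has exited before the counter is read. Calling `run_and_count(False)` reads the counter while the workers may still be running, so it can report fewer errors than actually occurred, which is the symptom the commit message describes.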


6142 commits