linkchecker

mirror of https://github.com/Hopiu/linkchecker.git synced 2026-05-27 07:13:59 +00:00

Author	SHA1	Message	Date
Marius Gedminas	a311ebb97e	Fix doctype tests. I don't think linkchecker actually cares about the document type, so I'm not sure why we're even testing this...	2020-03-23 10:56:57 +02:00
Chris Mayo	5eaad24641	Use HTTP header encoding for decoding	2020-03-22 19:54:37 +00:00
Chris Mayo	f5ae90e824	Parser threading lock no longer required with Beautiful Soup	2020-03-22 19:54:37 +00:00
Chris Mayo	74d5c68094	Add new tests for URL quoting	2019-10-05 19:38:57 +01:00
Chris Mayo	b7ec71d8cc	Always use utf-8 encoding when quoting	2019-10-05 19:38:57 +01:00
Chris Mayo	a9f147c347	Update fileutil.pathencode() because paths are now strings	2019-10-05 19:38:57 +01:00
Chris Mayo	5bb4524a63	Update strformat.ascii_safe() because paths are now strings	2019-10-05 19:38:57 +01:00
Chris Mayo	646e138166	Pass encoding when unquoting. Else non-UTF-8 codes are misinterpreted: >>> from urllib import parse >>> parse.unquote("%FF") '�' >>> parse.unquote("%FF", "latin1") 'ÿ'	2019-10-05 19:38:57 +01:00
Chris Mayo	153e53ba03	Reuse soup object used for detecting encoding in the HTML parser	2019-10-05 19:38:57 +01:00
Chris Mayo	978042a54e	Hide Beautiful Soup soupsieve warning. Shown every time linkchecker is run: /usr/lib/python3.7/site-packages/bs4/element.py:16: UserWarning: The soupsieve package is not installed. CSS selectors cannot be used.	2019-10-05 19:38:57 +01:00
Chris Mayo	30df69c158	Improve pretty printed comments	2019-10-05 19:38:57 +01:00
Chris Mayo	607328d5c5	Support Beautiful Soup line numbers	2019-10-05 19:38:57 +01:00
Chris Mayo	e46fb7fe9c	Supports Python 3 Only. Needs miniboa >= 1.0.8 for telnet test on Python 3.7. Test with older Beautiful Soup without line number support on Python 3.5. Resolve tox deprecation warning: "Matching undeclared envs is deprecated. Be sure all the envs that Tox should run are declared in the tox config."	2019-10-05 19:38:57 +01:00
Chris Mayo	4f8c2954cf	Don't set parser.encoding. Read-only property with new Beautiful Soup parser.	2019-10-05 19:38:57 +01:00
Petr Dlouhý	69d426b36f	fix parser encoding tests after change of parser. UnicodeDammit input has to be non-unicode to trigger character set detection.	2019-07-22 19:59:37 +01:00
Petr Dlouhý	b5111453d8	change test_parse encoding to UTF-8	2019-07-22 19:59:37 +01:00
Petr Dlouhý	2c3c794e52	fix http test after parser change	2019-07-22 19:59:37 +01:00
Petr Dlouhý	0089349760	fix parser tests after parser change	2019-07-22 19:59:37 +01:00
Petr Dlouhý	d6d48b4814	html parser: use name instead of peeking	2019-07-22 19:59:37 +01:00
Petr Dlouhý	51a06d8a1e	Remove home-cooked htmlparser and use BeautifulSoup	2019-07-22 19:59:37 +01:00
Petr Dlouhý	d1844a526e	add charset tests	2019-07-22 19:59:37 +01:00
Petr Dlouhý	2daf685633	Python3: fix few htmllib problems	2018-01-05 22:48:46 +01:00
Marius Gedminas	205ceb6805	Merge pull request #344 from hroncok/beautifulsoup4-requirement Require beautifulsoup4 instead of bs4	2020-02-06 12:52:20 +02:00
Miro Hrončok	ff5ebbae69	Require beautifulsoup4 instead of bs4. bs4 is a dummy package managed by the developer of Beautiful Soup to prevent name squatting. The official name of PyPI's Beautiful Soup Python package is beautifulsoup4. The bs4 package ensures that if you type pip install bs4 by mistake you will end up with Beautiful Soup. However, for requirements, it's cleaner to use the proper name. For downstream packaging in Fedora, this avoids the need of packaging the dummy package.	2020-02-06 10:05:13 +01:00
anarcat	e37dab8a4b	Merge pull request #339 from cjmayo/notafter Actually fix TypeError when checking https link	2019-11-21 10:33:27 -05:00
Chris Mayo	d3d6638973	Actually fix TypeError when checking https link The test was added but not the fix in: `ecd06776` ("Fix TypeError when checking https link and test", 2019-11-11) Which is caught by the new test when run on Python 3: ___________________ TestHttps.test_x509_to_dict__________________ [gw14] linux -- Python 3.6.9 /usr/bin/python3.6 tests/checker/test_https.py:72: in test_x509_to_dict self.assertEqual(httputil.x509_to_dict(cert)["notAfter"], linkcheck/httputil.py:47: in x509_to_dict parsedtime = asn1_generaltime_to_seconds(notAfter) linkcheck/httputil.py:68: in asn1_generaltime_to_seconds res = datetime.strptime(timestr, timeformat + 'Z') E TypeError: strptime() argument 1 must be str, not bytes	2019-11-19 20:06:10 +00:00
anarcat	c92ab72676	Merge pull request #338 from cjmayo/https Enable https checking using a test server	2019-11-14 09:38:54 -05:00
Chris Mayo	ecd06776ab	Fix TypeError when checking https link and test File "/usr/lib/python3.7/site-packages/linkcheck/httputil.py", line 68, in asn1_generaltime_to_seconds line: res = datetime.strptime(timestr, timeformat + 'Z') locals: res = <local> None datetime = <global> <class 'datetime.datetime'> datetime.strptime = <global> <built-in method strptime of type object at 0x7fa39064dda0> timestr = <local> b'20191106202117Z' timeformat = <local> '%Y%m%d%H%M%S' TypeError: strptime() argument 1 must be str, not bytes pyOpenSSL OpenSSL.crypto.X509.get_notAfter() returns bytes: https://www.pyopenssl.org/en/stable/api/crypto.html#OpenSSL.crypto.X509.get_notAfter	2019-11-11 20:12:25 +00:00
Chris Mayo	dee4be4b1d	Enable https checking using a test server. Verification has to be turned off because we are using a self-signed certificate.	2019-11-11 20:12:25 +00:00
anarcat	5308ec5204	Merge pull request #336 from cjmayo/logdiff Improve test failure diff	2019-10-29 16:20:26 -04:00
Chris Mayo	2f16152dc8	Improve test failure diff Some url lines were missing a url prefix while others had a double url prefix. diff was reporting more url lines as changed than actually had. Improve formatting by removing newlines from control lines and adding headings. Before: E AssertionError: http://localhost:46031/tests/checker/data/sitemap.xml E --- E E +++ E E @@ -1,4 +1,8 @@ E E -url http://localhost:46031/tests/checker/data/sitemap.xml E +http://www.example.com/ E +cache key http://www.example.com/ E +real url http://www.example.com/ E +valid E +url url http://localhost:46031/tests/checker/data/sitemap.xml E cache key http://localhost:46031/tests/checker/data/sitemap.xml E real url http://localhost:46031/tests/checker/data/sitemap.xml E valid After: E AssertionError: http://localhost:44021/tests/checker/data/sitemap.xml E --- expected E +++ result E @@ -2,3 +2,7 @@ E cache key http://localhost:44021/tests/checker/data/sitemap.xml E real url http://localhost:44021/tests/checker/data/sitemap.xml E valid E +url http://www.example.com/ E +cache key http://www.example.com/ E +real url http://www.example.com/ E +valid	2019-10-29 20:03:08 +00:00
Marius Gedminas	c294a4e6c1	Merge pull request #335 from cjmayo/sitemap Fix XmlTagUrlParser and make Python 3 compatible	2019-10-29 15:50:49 +02:00
Chris Mayo	ec8b6e09f0	Fix XmlTagUrlParser and make Python 3 compatible. URLs within a sitemap file were not being captured.	2019-10-28 19:20:05 +00:00
Marius Gedminas	8bdd402aed	Merge pull request #333 from linkchecker/fix-clamav-on-py3 Fix test_clamav.py on Python 3	2019-10-25 16:16:23 +03:00
Marius Gedminas	5b2b3613ec	Merge pull request #330 from linkchecker/fix-sitemap Fix sitemap parser	2019-10-25 16:15:55 +03:00
anarcat	6dcc9dbf9d	Merge pull request #332 from cjmayo/py3pdf Make PdfParser Python 3 compatible	2019-10-25 08:38:59 -04:00
Marius Gedminas	f9766a2049	Python 3: fix bytes vs strings in viruscheck plugin. Socket communication deals with bytes. There are probably remaining issues with the viruscheck plugin on Python 3; we just can't see them because the code is not fully covered with tests.	2019-10-25 14:24:07 +03:00
Marius Gedminas	65f861901c	Fix all Python 3 tox environments. Old pdfminer supports Python 2 only, new pdfminer supports Python 3 only.	2019-10-25 14:20:31 +03:00
Chris Mayo	b2e63663f8	Make PdfParser Python 3 compatible. basestring is not available in Python 3. Ensure all URLs are Unicode. url_data.get_raw_content() is returning bytes.	2019-10-24 19:57:27 +01:00
Marius Gedminas	011f6c147e	Merge pull request #331 from linkchecker/explain-skips Explain why these tests are being skipped	2019-10-23 17:59:55 +03:00
Marius Gedminas	606ece0308	Explain why these tests are being skipped pytest output before this change: SKIPPED [3] tests/__init__.py:217: condition: True SKIPPED [1] tests/checker/test_news.py:63: condition: True SKIPPED [1] tests/checker/test_news.py:41: condition: True SKIPPED [1] tests/checker/test_news.py:116: condition: True SKIPPED [1] tests/checker/test_news.py:75: condition: True After: SKIPPED [3] tests/__init__.py: disabled for now until some stable news server comes up SKIPPED [4] tests/checker/test_news.py: disabled for now until some stable news server comes up	2019-10-23 17:35:31 +03:00
Marius Gedminas	87b504785c	Add a regression test for the sitemap parser	2019-10-23 17:30:10 +03:00
Marius Gedminas	a1af1e9717	Fix sitemap parser. PyExpat wants bytes on Python 2. See #323.	2019-10-23 17:23:23 +03:00
Marius Gedminas	f46151dbf8	Merge pull request #318 from tkfu/docs/fix-install-instructions Add instructions to install current release tag from git via pip	2019-10-23 09:47:25 +03:00
Marius Gedminas	938467c3ae	Merge pull request #324 from cjmayo/pdfminer Add pdfminer to tox.ini and dev-requirements.txt to enable pdf test	2019-10-23 09:47:01 +03:00
Marius Gedminas	db3e25e934	Merge pull request #326 from linkchecker/fix-word-maybe Fix MS Word parser, hopefully	2019-10-22 18:08:46 +03:00
Marius Gedminas	c6de64978c	Merge pull request #325 from linkchecker/type-error-in-robot-parser Fix TypeError: string arg required in content_allows_robots()	2019-10-22 18:07:31 +03:00
Marius Gedminas	2a748a3f12	Merge pull request #328 from linkchecker/enable-clamav-tests Enable ClamAV integration tests on Travis CI	2019-10-22 18:06:24 +03:00
anarcat	928594b194	Merge pull request #316 from cjmayo/nodnspath Don't add linkcheck_dns directory to sys.path	2019-10-22 10:43:23 -04:00
Marius Gedminas	f283894f86	Sudo is needed to stop/start system services	2019-10-22 17:21:53 +03:00
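
Note on 646e138166 ("Pass encoding when unquoting"): the commit message shows that `urllib.parse.unquote` decodes percent-escapes as UTF-8 unless told otherwise. The following is a minimal sketch of that behaviour, not the code from the commit; the helper name `unquote_with` is hypothetical and does not exist in linkchecker.

```python
from urllib.parse import unquote


def unquote_with(value, encoding="utf-8"):
    """Hypothetical helper: percent-decode `value` using `encoding`.

    urllib.parse.unquote defaults to errors="replace", so a byte that is
    not valid in the chosen encoding becomes U+FFFD instead of raising.
    """
    return unquote(value, encoding=encoding)


# %FF is not a valid UTF-8 sequence, so the default yields U+FFFD.
default_result = unquote_with("%FF")

# The same escape is a valid Latin-1 byte (U+00FF).
latin1_result = unquote_with("%FF", "latin1")

print(repr(default_result), repr(latin1_result))
```

Passing the encoding explicitly matters whenever the page that supplied the URL was not served as UTF-8.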


6174 commits
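
Note on ecd06776ab and d3d6638973: both commit messages report `TypeError: strptime() argument 1 must be str, not bytes`, caused by pyOpenSSL's `X509.get_notAfter()` returning bytes. One way to handle that input is sketched below. It is an illustration under stated assumptions, not the fix that was committed: the function name `parse_asn1_time` is hypothetical, and linkchecker's own `asn1_generaltime_to_seconds` may differ in what it returns.

```python
from datetime import datetime

# Time format taken from the traceback in the commit message.
TIME_FORMAT = "%Y%m%d%H%M%S"


def parse_asn1_time(timestr):
    """Hypothetical helper: parse an ASN.1 time such as b'20191106202117Z'.

    Accepts bytes or str; bytes are decoded as ASCII first because
    datetime.strptime only accepts str on Python 3.
    """
    if isinstance(timestr, bytes):
        timestr = timestr.decode("ascii")
    # The trailing 'Z' is matched as a literal character.
    return datetime.strptime(timestr, TIME_FORMAT + "Z")


# Sample value copied from the traceback locals.
parsed = parse_asn1_time(b"20191106202117Z")
print(parsed.isoformat())
```

Decoding at the boundary keeps the rest of the code working with str only, which is the usual Python 3 convention.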