Chris Mayo
ec8b6e09f0
Fix XmlTagUrlParser and make Python 3 compatible
...
URLs within a sitemap file were not being captured.
2019-10-28 19:20:05 +00:00
Marius Gedminas
8bdd402aed
Merge pull request #333 from linkchecker/fix-clamav-on-py3
...
Fix test_clamav.py on Python 3
2019-10-25 16:16:23 +03:00
Marius Gedminas
5b2b3613ec
Merge pull request #330 from linkchecker/fix-sitemap
...
Fix sitemap parser
2019-10-25 16:15:55 +03:00
Marius Gedminas
f9766a2049
Python 3: fix bytes vs strings in viruscheck plugin
...
Socket communication deals with bytes.
There are probably remaining issues with the viruscheck plugin on
Python 3; we just can't see them because the code is not fully covered
by tests.
2019-10-25 14:24:07 +03:00
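Illustration of the bytes-versus-strings point in the commit above (not part of the commit). This is a minimal sketch, assuming a text protocol over a socket; `send_command`, `read_reply`, and the "PING" payload are hypothetical names chosen for the example, not code from the viruscheck plugin.

```python
import socket


def send_command(sock, command):
    """Encode a text command to bytes before writing it to the socket."""
    # On Python 3, sendall() rejects str; it needs a bytes-like object.
    sock.sendall(command.encode("ascii"))


def read_reply(sock, bufsize=1024):
    """Read raw bytes from the socket and decode them for the caller."""
    return sock.recv(bufsize).decode("ascii")


# A connected pair of sockets stands in for a real client/server link.
left, right = socket.socketpair()
try:
    send_command(left, "PING\n")
    reply = read_reply(right)
finally:
    left.close()
    right.close()
```

Keeping the encode and decode steps at the socket boundary lets the rest of the code work with text only.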
Chris Mayo
b2e63663f8
Make PdfParser Python 3 compatible
...
basestring is not available in Python 3. Ensure all URLs are Unicode.
url_data.get_raw_content() is returning bytes.
2019-10-24 19:57:27 +01:00
Marius Gedminas
a1af1e9717
Fix sitemap parser
...
PyExpat wants bytes on Python 2. See #323.
2019-10-23 17:23:23 +03:00
Marius Gedminas
938467c3ae
Merge pull request #324 from cjmayo/pdfminer
...
Add pdfminer to tox.ini and dev-requirements.txt to enable pdf test
2019-10-23 09:47:01 +03:00
Marius Gedminas
db3e25e934
Merge pull request #326 from linkchecker/fix-word-maybe
...
Fix MS Word parser, hopefully
2019-10-22 18:08:46 +03:00
Marius Gedminas
c6de64978c
Merge pull request #325 from linkchecker/type-error-in-robot-parser
...
Fix TypeError: string arg required in content_allows_robots()
2019-10-22 18:07:31 +03:00
Marius Gedminas
fa32a89d6b
Fix MS Word parser, hopefully
...
MS Word files are binary data, and get_temp_filename() will write them
to disk using open(..., 'wb'), so we want to pass bytes in there, not
Unicode.
See #323.
2019-10-22 16:39:57 +03:00
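Illustration for the commit above (not part of the commit). The sketch below shows why binary content must stay as bytes when it is written with mode 'wb'. `write_temp_file` is a hypothetical stand-in, not the project's get_temp_filename(), and the payload is made up.

```python
import os
import tempfile


def write_temp_file(content):
    """Write bytes to a new temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".doc")
    # Binary mode: write() accepts bytes and raises TypeError for str.
    with os.fdopen(fd, "wb") as handle:
        handle.write(content)
    return path


payload = b"\xd0\xcf\x11\xe0 example payload"  # arbitrary binary data
path = write_temp_file(payload)
with open(path, "rb") as handle:
    round_trip = handle.read()
os.remove(path)
```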
Marius Gedminas
58b0d5aaae
Fix TypeError: string arg required in content_allows_robots()
...
See #323 and #317.
2019-10-22 14:13:45 +03:00
Chris Mayo
949f84d329
PdfParser requires bytes
2019-10-21 20:12:33 +01:00
Chris Mayo
7da64b16f0
Don't add linkcheck_dns directory to sys.path
...
This code was added in:
efbbb656 ("Remove python-dns conflict by moving the dns module into a custom subdirectory.", 2012-12-07)
Installation of linkcheck_dns stopped with:
0a13fae3 ("remove third party packages and use them as dependency", 2018-01-06)
2019-10-21 19:52:58 +01:00
Marius Gedminas
e274d74be2
Wait for threads to exit after stopping them
...
This fixes a race condition where the main thread would check if any
internal errors happened and get back a 0 while a worker thread was
still busy printing the internal error message before incrementing the
counter.
Fixes #320.
My experiments show that this adds no perceptible delay to the script
runtime (on Linux). More specifically, there already is an annoying
perceptible delay of about 1 second, but it's not caused by this change.
2019-10-21 18:23:58 +03:00
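Illustration of the race described in the commit above (not part of the commit). This is a minimal sketch, assuming worker threads that increment a shared error counter after some slow reporting step. All names (`internal_errors`, `stop_requested`, `worker`) are hypothetical, and the sleep only simulates the time spent printing an error message.

```python
import threading
import time

internal_errors = 0
errors_lock = threading.Lock()
stop_requested = threading.Event()


def worker():
    """Wait for the stop signal, 'print' an error, then count it."""
    global internal_errors
    stop_requested.wait()
    time.sleep(0.05)  # stands in for printing a long error message
    with errors_lock:
        internal_errors += 1


threads = [threading.Thread(target=worker) for _ in range(3)]
for thread in threads:
    thread.start()

stop_requested.set()

# Without these joins the main thread could read the counter while the
# workers are still busy and see a stale value of 0.
for thread in threads:
    thread.join()

with errors_lock:
    final_count = internal_errors
```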
Marius Gedminas
84dbb5d603
Fix TypeError: string arg required in find_links()
...
Fixes #317.
2019-10-21 17:47:46 +03:00
Chris Mayo
c7a32d67fe
Remove unused code from network subpackage
2019-10-19 10:27:34 +01:00
anarcat
f73ba54a2a
Merge pull request #308 from cjmayo/decode
...
Decode content when retrieved
2019-10-10 09:46:32 -04:00
anarcat
7cfb1136e9
Merge pull request #313 from cjmayo/titlefinder
...
Remove unused linkparse.TitleFinder
2019-10-07 11:30:10 -04:00
Chris Mayo
127c2272c4
Remove unused linkparse.TitleFinder
...
Stopped being used with removal of UrlBase.set_title_from_content() in:
7b34be59 ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)
2019-10-05 19:43:33 +01:00
Chris Mayo
5732606c58
Remove urlutil.decode_for_unquote()
...
Not needed since all content is now being decoded on retrieval.
Added by:
a6643034 ("Python3: decode parts before submitting them to urllib.quote()", 2018-01-05)
2019-10-04 19:37:09 +01:00
Chris Mayo
2776eb5f52
Revert "Python3: fix opening file URLs"
...
This reverts commit 4c9ec511b5.
2019-10-04 19:37:09 +01:00
Chris Mayo
c6a06d99ac
Remove unnecessary unicode() from StatusLogger.writeln()
2019-09-30 20:06:48 +01:00
Petr Dlouhý
6e8da10942
fixes for Python 3: fix markdowncheck
...
The translate() method of Python 3 string objects (and Python 2 Unicode
objects) accepts only a single argument, the translation table.
2019-09-30 19:46:24 +01:00
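Illustration for the commit above (not part of the commit). On Python 3, characters to delete are folded into the table built by str.maketrans() instead of being passed as a second argument to translate(). The input string is made up for the example and is not taken from markdowncheck.

```python
import string

# The third argument to str.maketrans() lists characters to delete.
table = str.maketrans("", "", string.punctuation)

# translate() takes the table as its only argument.
cleaned = "Hello, [world]!".translate(table)
```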
Chris Mayo
e01ea0d9f0
Safari bookmark parser requires bytes
2019-09-30 19:46:24 +01:00
Chris Mayo
ad33d359c1
Adapt Opera bookmark parser to work with decoded data
2019-09-30 19:46:24 +01:00
Chris Mayo
9460064084
Use requests to decode the content of login form
2019-09-30 19:46:24 +01:00
Chris Mayo
5fc01455b7
Decode content when retrieved, use bs4 to detect encoding if non-Unicode
...
UrlBase has been modified as follows:
- the "data" variable now holds bytes
- decoded content is stored in a new variable "text"
- functionality from get_content() has been split out into
  get_raw_content() which returns "data" and download_content() which
  calls read_content() and sets the download related variables.
This allows for subclasses to do their own decoding and parsers to
use bytes.
2019-09-30 19:46:24 +01:00
Chris Mayo
0c90c718bf
Revert "Python3: fix bytes mark in parser/__init__.py"
...
This reverts commit aec8243348.
2019-09-30 19:46:24 +01:00
Chris Mayo
53cd9475b5
Replace deprecated cgi.escape
...
The html module is provided for Python 2 by the future package:
https://python-future.org/compatible_idioms.html#html-escaping-and-entities
2019-09-17 20:25:05 +01:00
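Illustration for the commit above (not part of the commit). A short sketch of html.escape() as the replacement for cgi.escape(). One difference worth knowing is that html.escape() escapes quotes by default, while cgi.escape() did not. The sample markup is made up.

```python
from html import escape

markup = '<a href="x">Tom & Jerry</a>'

# Default behaviour: <, >, & and both quote characters are escaped.
escaped = escape(markup)

# quote=False matches the old cgi.escape() default and leaves quotes alone.
legacy_style = escape(markup, quote=False)
```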
anarcat
1590408a65
Merge pull request #306 from cjmayo/python3_49
...
{python3_49} enable and fix remaining bookmark tests
2019-09-16 15:18:26 -04:00
Petr Dlouhý
eaa7131523
enable and fix remaining bookmark tests
...
biplist module preferred for reading Safari bookmarks in
bookmarks/safari.py so install it for tox testing.
2019-09-16 20:08:01 +01:00
anarcat
4ccf0fb2d0
Merge pull request #305 from cjmayo/python3_48
...
{python3_48} Python3: fix displaying help
2019-09-16 10:10:36 -04:00
anarcat
2c7573b3b8
Merge pull request #300 from cjmayo/python3_43
...
{python3_43} Python3: fix for test_telnet in urlbase.py
2019-09-16 10:08:18 -04:00
anarcat
bec68f237b
Merge pull request #299 from cjmayo/python3_42
...
{python3_42} fixes for Python 3: fix telneturl
2019-09-16 10:07:55 -04:00
anarcat
27d672c78b
Merge pull request #297 from cjmayo/python3_40
...
{python3_40} Python3: fixes form checker/__init__.py
2019-09-16 10:06:05 -04:00
anarcat
5a0a02ae74
Merge pull request #294 from cjmayo/python3_39_alt
...
{python3_39_alt} Python3: fix TypeError in HttpUrl.read_content()
2019-09-16 10:04:23 -04:00
Petr Dlouhý
14e19efe07
Python3: fix displaying help
2019-09-15 19:50:05 +01:00
Petr Dlouhý
c2af88ad2e
Python3: fix for test_telnet in urlbase.py
2019-09-15 19:49:26 +01:00
Petr Dlouhý
a2e67af7b4
fixes for Python 3: fix telneturl
2019-09-15 19:49:18 +01:00
Petr Dlouhý
bb542b00e9
Python3: fixes form checker/__init__.py
2019-09-15 19:49:00 +01:00
Chris Mayo
06fdd78f91
Python3: fix TypeError in HttpUrl.read_content()
...
From test_http_redirect:
File "linkchecker/linkcheck/checker/httpurl.py", line 323, in read_content
line: buf.write(data)
locals:
buf = <local> <_io.StringIO object at 0x7f8fe2f45e10>
buf.write = <local> <built-in method write of _io.StringIO object at 0x7f8fe2f45e10>
data = <local> b'<a href="newurl.html">Recursive Redirect</a>\n'
TypeError: string argument expected, got 'bytes'
2019-09-15 19:42:29 +01:00
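Illustration of the TypeError in the commit above (not part of the commit). This sketch reproduces the failure with the same data value shown in the traceback and shows one way to avoid it by buffering in a BytesIO. It does not claim to be the change made in httpurl.py.

```python
from io import BytesIO, StringIO

data = b'<a href="newurl.html">Recursive Redirect</a>\n'

# Writing bytes into a text buffer fails on Python 3.
error = None
try:
    StringIO().write(data)
except TypeError as exc:
    error = str(exc)

# A bytes buffer accepts the same chunk unchanged.
buf = BytesIO()
buf.write(data)
content = buf.getvalue()
```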
anarcat
736d2a786d
Merge pull request #293 from cjmayo/python3_37_alt
...
{python3_37_alt} Python3: fix TypeError when parsing cookie data
2019-09-14 11:51:26 -04:00
anarcat
fe39db4fbf
Merge pull request #287 from cjmayo/python3_36
...
{python3_36} fixes for Python 3 + Travis test: fix cgi
2019-09-14 11:50:53 -04:00
Chris Mayo
a7b7e31917
Python3: fix TypeError when parsing cookie data
...
> fp = BytesIO(strheader)
E TypeError: a bytes-like object is required, not 'str'
linkcheck/cookies.py:61: TypeError
The email package provides the message_from_string() convenience
function which avoids the need to create a file-like object.
Indeed http.client.HTTPMessage is implemented using email.message.Message.
2019-09-13 20:10:25 +01:00
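Illustration for the commit above (not part of the commit). A minimal sketch of parsing a header block with email.message_from_string(), which takes a str directly so no file-like wrapper is needed. The header text is made up, and this is not the code in linkcheck/cookies.py.

```python
from email import message_from_string

# Hypothetical header block; a blank line ends the headers.
raw_headers = (
    "Set-Cookie: a=1; Path=/\n"
    "Set-Cookie: b=2\n"
    "Content-Type: text/html\n"
    "\n"
)

message = message_from_string(raw_headers)

# get_all() returns every value for a repeated header, in order.
cookies = message.get_all("Set-Cookie")
```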
Petr Dlouhý
36465112d0
fixes for Python 3 + Travis test: fix cgi
2019-09-13 19:46:13 +01:00
anarcat
aaa8cb675e
Merge pull request #291 from cjmayo/python3_33_alt
...
{python3_33_alt} Python3: fix opening file URLs
2019-09-13 10:31:20 -04:00
anarcat
80b62a3e21
Merge pull request #292 from cjmayo/lc_cgi_error
...
Fix errors caused by logging LCFormError exceptions
2019-09-13 09:12:05 -04:00
anarcat
b0b392f7cc
Merge pull request #282 from cjmayo/python3_31
...
{python3_31} Python3: fix strformat strline()
2019-09-13 09:11:33 -04:00
Chris Mayo
6dc25547d5
Fix errors caused by logging LCFormError exceptions
2019-09-12 20:13:08 +01:00
Chris Mayo
4c9ec511b5
Python3: fix opening file URLs
...
urllib.request.urlopen() expects a string or Request object.
2019-09-12 19:58:27 +01:00