Chris Mayo
03b1c4919d
Record encoding in debug log messages
2020-05-23 20:01:24 +01:00
Chris Mayo
f7337f55e8
Fix error due to an empty html file accessed over http
...
Use the already fixed [1] UrlBase.get_content() in HttpUrl.
[1] 5bd1fb4 ("Fix internal error on empty HTML files", 2020-05-21)
2020-05-23 20:01:24 +01:00
Marius Gedminas
f268a90cfb
Merge branch 'master' into HandleRateLimiting
2020-05-23 14:15:52 +03:00
Marius Gedminas
6dffacf17f
Merge pull request #409 from linkchecker/fix-login-timeouts
...
Make sure login form fetching uses a timeout and sends User-Agent
2020-05-22 21:40:48 +03:00
Marius Gedminas
b0435b3d47
Make sure login form fetching uses a timeout
...
Also resolve an XXX comment about the User-Agent header (which is
configured in new_request_session), but add a couple of XXX comments
about using proxy and possibly disabling TLS certificate checking.
2020-05-22 11:19:51 +03:00
Marius Gedminas
4f3fe5e1c3
Make sure fetching robots.txt uses the configured timeout
...
Closes #396.
2020-05-22 10:53:33 +03:00
Marius Gedminas
c60d7c66e4
Clarify the decision to fall back to Latin-1
2020-05-21 19:35:39 +03:00
Marius Gedminas
5bd1fb4e36
Fix internal error on empty HTML files
...
When BeautifulSoup finds an empty file on disk, it sets
original_encoding to None. It doesn't matter what encoding we pick for
empty files, so let's just pick one.
I don't know if there are any circumstances where BeautifulSoup might
set the encoding to None for a non-empty file.
Closes #392.
2020-05-21 19:01:33 +03:00
Chris Mayo
6cfc8eeb49
Replace threading.Thread.setName() with setting the name property
...
As recommended in:
https://docs.python.org/3.5/library/threading.html#threading.Thread.setName
2020-05-20 19:58:44 +01:00
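The switch from setName() to the name property in 6cfc8eeb49 can be illustrated with a minimal sketch. The helper make_named_thread() is hypothetical and not part of linkchecker; the "CheckThread-" prefix is borrowed from the log message quoted in 42eba19a7d.

```python
import threading


def make_named_thread(url):
    """Hypothetical helper: create a thread and name it via the property."""
    thread = threading.Thread(target=lambda: None)
    # Assigning to .name replaces the older thread.setName(...) call.
    thread.name = "CheckThread-%s" % url
    return thread


thread = make_named_thread("http://example.com")
print(thread.name)
```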
Chris Mayo
42eba19a7d
No need to encode url in Checker.check_url_data()
...
Was causing b'' in log messages e.g. CheckThread-b'http:...
2020-05-20 19:58:44 +01:00
Chris Mayo
28f4587dfa
Remove str_text from fileutil.py, strformat.py and url.py
2020-05-19 19:56:42 +01:00
Chris Mayo
ebcc3c4961
Remove str_text from plugins/
2020-05-19 19:56:42 +01:00
Chris Mayo
1c14583535
Remove str_text from logger/
2020-05-19 19:56:42 +01:00
Chris Mayo
6bddd4ac60
Remove str_text from checker/
2020-05-19 19:56:42 +01:00
Chris Mayo
a127902607
Replace str_text in asserts
2020-05-19 19:56:42 +01:00
Chris Mayo
7490804e2c
Merge pull request #395 from cjmayo/tidyten11
...
Remove unused code from linkcheck/fileutil.py
2020-05-19 19:45:08 +01:00
Marius Gedminas
e6e969f975
Merge pull request #391 from linkchecker/dev-version
...
Bump version in git to 10.0.0.dev0
2020-05-19 18:49:34 +03:00
Chris Mayo
690605c519
Remove unused code from linkcheck/fileutil.py
2020-05-18 19:29:55 +01:00
Marius Gedminas
5317347e54
Avoid distutils.version.StrictVersion
...
distutils.version is old code that predates PEP 440. We could add a
dependency on https://packaging.pypa.io/en/latest/version/ , but meh.
2020-05-17 21:12:43 +03:00
Marius Gedminas
bb53aaa621
Fix viruscheck plugin
...
The clamav interface needs bytes, not unicode.
It would be nice if we had tests for this code.
2020-05-17 17:50:11 +01:00
Chris Mayo
a15a2833ca
Remove spaces after names in class method definitions
...
And also nested functions.
This is a PEP 8 convention, E211.
2020-05-16 20:19:42 +01:00
Chris Mayo
1663e10fe7
Remove spaces after names in function definitions
...
This is a PEP 8 convention, E211.
2020-05-16 20:19:42 +01:00
Chris Mayo
fc11d08968
Remove spaces after names in class definitions
2020-05-16 20:19:42 +01:00
Chris Mayo
1416a08119
On Python 3 no need to convert os.linesep to a string
2020-05-16 17:02:01 +01:00
Chris Mayo
0752408a44
Remove Python 2 use of sys.stdout in i18n.get_encoded_writer()
2020-05-16 17:02:00 +01:00
Chris Mayo
2c2e7e55ac
Remove CSVLogger.encode_row_s()
...
Introduced during Python 3 conversion to maintain Python 2 support:
55a7973b ("Python3: fix csvlog", 2016-12-04)
2020-05-16 17:02:00 +01:00
Chris Mayo
ed13a926d3
Remove setting Python 2 xmlparser.returns_unicode
2020-05-16 17:02:00 +01:00
Chris Mayo
025637b08d
Remove Python 2 cookielib import
2020-05-16 16:26:38 +01:00
Chris Mayo
1e277444f4
Remove Python 2 thread import
2020-05-16 16:26:34 +01:00
Chris Mayo
dcbddfe045
Remove Python 2 ConfigParser import
2020-05-15 19:37:04 +01:00
Chris Mayo
f8c9faec1b
Remove Python 2 cStringIO imports
2020-05-15 19:37:04 +01:00
Chris Mayo
bda9612273
Make html.escape Python 3 only
2020-05-14 20:15:28 +01:00
Chris Mayo
42de609f8e
Make urllib imports Python 3 only
2020-05-14 20:15:28 +01:00
Chris Mayo
3c661a83d0
Replace parse_host_port() in checker.proxysupport with url.splitport()
2020-05-14 20:15:28 +01:00
Chris Mayo
c80002437e
Update run-time version check
2020-05-13 19:50:19 +01:00
Chris Mayo
08ddf658bc
Merge pull request #366 from cjmayo/userorpwd
...
Support login forms with user and/or password
2020-05-13 19:37:44 +01:00
Chris Mayo
736c893707
Merge pull request #377 from cjmayo/tidyten3
...
Remove u string prefixes
2020-05-13 19:36:54 +01:00
Chris Mayo
3ace021264
Support login forms with user and/or password
2020-05-13 19:32:25 +01:00
Chris Mayo
44e81d27dd
Remove inheriting object
...
All Python 3 classes are new-style.
2020-05-08 10:45:31 +01:00
Chris Mayo
b0ea72e8c1
Remove # -*- coding: lines
...
Except for tests that include non-unicode characters:
tests/test_po.py
tests/test_strformat.py
tests/test_url.py
tests/checker/test_error.py
tests/checker/test_news.py
2020-05-08 10:45:31 +01:00
Marius Gedminas
22b0165b72
Make _Logger an abstract base class
...
The __metaclass__ syntax is a Python-2-ism. It was replaced with
class _Logger (object, metaclass=abc.ABCMeta):
in Python 3. And then Python 3.4 introduced abc.ABC which is an empty
class that has ABCMeta as the metaclass, making it simpler to define
abstract base classes.
2020-04-30 23:09:42 +03:00
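The abc.ABC approach described in 22b0165b72 can be sketched as follows. BaseLogger and TextLogger are hypothetical stand-ins, not the project's real _Logger hierarchy; the point is only that inheriting from abc.ABC gives the class ABCMeta as its metaclass without any metaclass syntax.

```python
import abc


class BaseLogger(abc.ABC):
    """Hypothetical stand-in for _Logger; not the project's real class."""

    @abc.abstractmethod
    def log_url(self, url):
        """Subclasses must implement this."""


class TextLogger(BaseLogger):
    """Concrete subclass that implements the abstract method."""

    def __init__(self):
        self.lines = []

    def log_url(self, url):
        self.lines.append("URL " + url)


def can_instantiate(cls):
    """Return True if cls() succeeds, False if it raises TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True


print(can_instantiate(BaseLogger), can_instantiate(TextLogger))
```

Instantiating the abstract base raises TypeError, while the concrete subclass works as usual.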
Chris Mayo
4d3e5abcfa
Remove u string prefixes
2020-04-30 20:11:59 +01:00
anarcat
ab476fa4bf
Merge pull request #364 from cjmayo/parser5
...
Stop using HTML handlers and improve login form error handling
2020-04-30 09:28:48 -04:00
Chris Mayo
12a948894b
Fix space style in linkcheck/htmlutil/linkparse.py
2020-04-29 20:07:00 +01:00
Chris Mayo
9eed070a73
Stop using HTML handlers
...
LinkFinder is the only remaining HTML handler therefore no need for
htmlsoup.process_soup() as an independent function or TagFinder as a
base class.
2020-04-29 20:07:00 +01:00
Chris Mayo
4ffdbf2406
Replace MetaRobotsFinder using BeautifulSoup.find()
2020-04-29 20:07:00 +01:00
Chris Mayo
a51f02cf66
Improve error handling and debugging for login form
2020-04-27 18:06:29 +01:00
Chris Mayo
9a33c2a659
Make requesting login form password work on Python 3
2020-04-27 18:06:29 +01:00
Chris Mayo
8fc0dcc055
Make matching login form credentials case-sensitive
...
The keys of the form.data dictionary are case-sensitive and therefore a
KeyError was possible if the configured values are not identical to
the input element name attributes.
2020-04-27 18:06:29 +01:00
Chris Mayo
7a6ef938cc
Rename htmlutil.formsearch to htmlutil.loginformsearch
...
Make it clear that this module has only one specific use.
2020-04-27 18:06:29 +01:00
anarcat
350f8bfef9
Merge pull request #373 from linkchecker/fix-swf-parsing
...
SWF files are binary data
2020-04-27 09:39:52 -04:00
Marius Gedminas
680783b1ff
SWF files are binary data
...
Should fix #372.
2020-04-27 11:25:37 +03:00
anarcat
183d483074
Merge pull request #365 from cjmayo/tidyten1
...
Remove use of the future package
2020-04-26 12:02:30 -04:00
Chris Mayo
d189445a8e
LinkFinder does not raise StopParse
2020-04-18 20:30:46 +01:00
Chris Mayo
ee6628a831
Move HtmlParser/htmlsax.py to htmlutil/htmlsoup.py
...
Remove one subpackage and some import lines where htmlutil.linkparse is
also being used.
2020-04-18 20:30:45 +01:00
Chris Mayo
384e1e196d
Remove Python 2 gettext builtin installation
2020-04-15 19:49:16 +01:00
Chris Mayo
a83fbb56c0
Remove from __future__ imports
2020-04-15 19:49:16 +01:00
Chris Mayo
f5e7f3a382
Remove use of the future package
...
It was providing Python 2 compatibility.
2020-04-15 19:49:16 +01:00
Chris Mayo
0795e3c1b4
Replace Parser class using BeautifulSoup.find_all()
2020-04-10 13:51:09 +01:00
Chris Mayo
eb3cf28baa
Remove support for start_end_element() callback
...
The LinkFinder handler start_end_element() callback does nothing apart
from call start_element().
2020-04-10 13:51:09 +01:00
Chris Mayo
c9f17e92b9
Remove support for end_element() callback
2020-04-10 13:51:09 +01:00
Chris Mayo
48b590cf8b
Replace FormFinder using BeautifulSoup.find_all()
...
FormFinder was the only handler that used an end_element() callback and
was therefore a blocker to moving the Parser class to use
BeautifulSoup.find_all()
FormFinder was a specialised handler used to parse a login form at
the start of a session if the user had configured authentication
credentials.
2020-04-10 13:51:05 +01:00
Chris Mayo
974915cc4f
Remove encoding from Parser
...
Only used by the test and an attribute of the soup object.
2020-04-08 20:03:35 +01:00
Chris Mayo
02e1c389b2
Remove parser flush() and reset()
...
Remnants of the feed() interface.
2020-04-08 20:03:35 +01:00
Chris Mayo
3771dd9136
Use parser.feed_soup() instead of parser.feed()
...
Markup is not being passed in pieces to the parser, so simplify the
interface and reduce the state further.
2020-04-08 20:03:35 +01:00
Chris Mayo
40f43ae41c
Create one function to make soup objects
2020-04-08 20:03:35 +01:00
Chris Mayo
9d8d251d06
Replace Parser lineno() and column() methods
...
Stop storing this data in Parser object state.
2020-04-08 20:03:35 +01:00
Chris Mayo
16e6fb2919
Fix incorrect character in FormFinder log message
2020-04-07 19:24:34 +01:00
Chris Mayo
00f940d979
Fix FormFinder callbacks for missing element_text
...
element_text added in:
51a06d8a ("Remove home-cooked htmlparser and use BeautifulSoup",
2019-07-22)
2020-04-07 19:24:34 +01:00
Chris Mayo
fe024fb0c8
Remove unused Parser.debug() method
2020-04-03 19:24:08 +01:00
Chris Mayo
0c5e3bb403
Remove old HtmlParser .gitignore
...
htmlparse.output was a product of the built-in parser.
2020-04-03 19:24:08 +01:00
Chris Mayo
036b900ffc
Remove unused linkcheck.containers classes
2020-04-03 19:24:08 +01:00
Chris Mayo
3ff3d72492
Use BeautifulSoup element attrs directly
2020-04-03 19:24:08 +01:00
Chris Mayo
a7e1e20172
Remove last line and column from Parser
...
Only used for debug log message and not very useful.
2020-04-03 19:24:08 +01:00
Chris Mayo
28701e291a
Remove use of Python 2 unicode() and related u prefixes
...
Several instances for MS Windows left unchanged.
2020-04-01 19:39:50 +01:00
anarcat
cf4e6bb235
Merge pull request #351 from cjmayo/tagsonly
...
Remove support for non-Tag elements from Parser
2020-04-01 12:17:18 -04:00
Chris Mayo
ffa6ac457f
Remove support for non-Tag elements from Parser
...
This change is made because the linkchecker handlers only process
Tags.
The test HtmlPrettyPrinter handler is updated to output element text
because its support for non-Tag elements has been removed. This results
in a number of the existing tests still passing.
2020-03-31 20:10:35 +01:00
Chris Mayo
e7c5f353cd
Remove unused function linkcheck.fileutil.write_file()
...
Doesn't appear to have ever been used.
Causes flake8 error:
linkcheck/fileutil.py:45:9: F821 undefined name 'file'
2020-03-31 19:46:31 +01:00
Chris Mayo
504004d4f0
Use ipaddress in network.iputil.is_valid_ip()
...
ipaddress was introduced in Python 3.3.
2020-03-31 19:46:31 +01:00
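A minimal sketch of validating an address with the standard ipaddress module, as 504004d4f0 describes. This is an assumption about the general shape of such a check, not the actual body of network.iputil.is_valid_ip().

```python
import ipaddress


def is_valid_ip(ip):
    """Return True if ip parses as an IPv4 or IPv6 address (sketch only)."""
    try:
        ipaddress.ip_address(ip)
    except ValueError:
        return False
    return True


for candidate in ("127.0.0.1", "::1", "256.1.1.1", "example.com"):
    print(candidate, is_valid_ip(candidate))
```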
Chris Mayo
2eb1424703
Replace deprecated plistlib.readPlistFromBytes() in bookmarks.safari
...
Remove Python 2 code.
plistlib.loads() was added in Python 3.4.
2020-03-31 19:46:31 +01:00
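The plistlib.loads() call mentioned in 2eb1424703 takes bytes and returns the parsed structure. The sketch below round-trips an invented sample dictionary; parse_plist() and the sample keys are hypothetical, not taken from bookmarks.safari.

```python
import plistlib


def parse_plist(data):
    """Parse plist bytes with plistlib.loads() (available since Python 3.4)."""
    return plistlib.loads(data)


# Invented sample data, serialised to bytes so it can be parsed back.
sample = plistlib.dumps({"Title": "BookmarksBar", "Children": []})
print(parse_plist(sample))
```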
Chris Mayo
0ee4414a60
Replace memoized with functools.lru_cache
2020-03-31 19:46:31 +01:00
Chris Mayo
1255119ca8
Move HtmlPrinter and HtmlPrettyPrinter into tests
2020-03-30 19:32:30 +01:00
Chris Mayo
ce1d669329
Remove unused functions from linkcheck.httputil
...
http_persistent() unused since:
4b818cb4 ("Detect more cases to close the connection, and close response
objects", 2006-09-15)
http_keepalive(), get_content_encoding() unused since:
7b34be59 ("Introduce check plugins, use Python requests for http/s
connections, and some code cleanups and improvements.", 2014-03-01)
2020-03-30 19:32:30 +01:00
Chris Mayo
5b66964afa
Remove unused .charset from checker classes
...
Unused since:
4f8c2954 ("Don't set parser.encoding", 2019-10-05)
2020-03-30 19:32:30 +01:00
Chris Mayo
f743be57e8
Remove unused functions from linkcheck.HtmlParser
...
resolve_entities() unused since:
2c000683 ("Remove unused linkcheck.htmlutil.linkname module",
2020-03-30)
set_doctype(), set_encoding() unused since:
51a06d8a ("Remove home-cooked htmlparser and use BeautifulSoup",
2019-07-22)
2020-03-30 19:32:18 +01:00
Chris Mayo
2c000683e1
Remove unused linkcheck.htmlutil.linkname module
...
Unused since:
d6d48b48 ("html parser: use name instead of peeking", 2019-07-22)
2020-03-30 19:31:11 +01:00
Marius Gedminas
af0f50efa8
Restore support for older BeautifulSoup4 versions
2020-03-30 14:49:56 +03:00
Wes Haggard
dcdc64e878
Turn status code 429 into warning instead of failure
2020-03-25 16:36:08 -07:00
Marius Gedminas
a311ebb97e
Fix doctype tests
...
I don't think linkchecker actually cares about the document type, so I'm
not sure why we're even testing this...
2020-03-23 10:56:57 +02:00
Chris Mayo
5eaad24641
Use HTTP header encoding for decoding
2020-03-22 19:54:37 +00:00
Chris Mayo
f5ae90e824
Parser threading lock no longer required with Beautiful Soup
2020-03-22 19:54:37 +00:00
Chris Mayo
d3d6638973
Actually fix TypeError when checking https link
...
The test was added but not the fix in:
ecd06776 ("Fix TypeError when checking https link and test", 2019-11-11)
Which is caught by the new test when run on Python 3:
___________________ TestHttps.test_x509_to_dict__________________
[gw14] linux -- Python 3.6.9 /usr/bin/python3.6
tests/checker/test_https.py:72: in test_x509_to_dict
self.assertEqual(httputil.x509_to_dict(cert)["notAfter"],
linkcheck/httputil.py:47: in x509_to_dict
parsedtime = asn1_generaltime_to_seconds(notAfter)
linkcheck/httputil.py:68: in asn1_generaltime_to_seconds
res = datetime.strptime(timestr, timeformat + 'Z')
E TypeError: strptime() argument 1 must be str, not bytes
2019-11-19 20:06:10 +00:00
Chris Mayo
ec8b6e09f0
Fix XmlTagUrlParser and make Python 3 compatible
...
URLs within a sitemap file were not being captured.
2019-10-28 19:20:05 +00:00
Marius Gedminas
8bdd402aed
Merge pull request #333 from linkchecker/fix-clamav-on-py3
...
Fix test_clamav.py on Python 3
2019-10-25 16:16:23 +03:00
Marius Gedminas
5b2b3613ec
Merge pull request #330 from linkchecker/fix-sitemap
...
Fix sitemap parser
2019-10-25 16:15:55 +03:00
Marius Gedminas
f9766a2049
Python 3: fix bytes vs strings in viruscheck plugin
...
Socket communication deals with bytes.
There are probably remaining issues with the viruscheck plugin on
Python 3, we just can't see them because the code is not fully covered
with tests.
2019-10-25 14:24:07 +03:00
Chris Mayo
b2e63663f8
Make PdfParser Python 3 compatible
...
basestring is not available in Python 3. Ensure all URLs are Unicode.
url_data.get_raw_content() is returning bytes.
2019-10-24 19:57:27 +01:00
Marius Gedminas
a1af1e9717
Fix sitemap parser
...
PyExpat wants bytes on Python 2. See #323.
2019-10-23 17:23:23 +03:00
Marius Gedminas
938467c3ae
Merge pull request #324 from cjmayo/pdfminer
...
Add pdfminer to tox.ini and dev-requirements.txt to enable pdf test
2019-10-23 09:47:01 +03:00
Marius Gedminas
db3e25e934
Merge pull request #326 from linkchecker/fix-word-maybe
...
Fix MS Word parser, hopefully
2019-10-22 18:08:46 +03:00
Marius Gedminas
c6de64978c
Merge pull request #325 from linkchecker/type-error-in-robot-parser
...
Fix TypeError: string arg required in content_allows_robots()
2019-10-22 18:07:31 +03:00
Marius Gedminas
fa32a89d6b
Fix MS Word parser, hopefully
...
MS Word files are binary data, and get_temp_filename() will write them
to disk using open(..., 'wb'), so we want to pass bytes in there, not
Unicode.
See #323.
2019-10-22 16:39:57 +03:00
Marius Gedminas
58b0d5aaae
Fix TypeError: string arg required in content_allows_robots()
...
See #323 and #317.
2019-10-22 14:13:45 +03:00
Chris Mayo
949f84d329
PdfParser requires bytes
2019-10-21 20:12:33 +01:00
Chris Mayo
7da64b16f0
Don't add linkcheck_dns directory to sys.path
...
This code was added in:
efbbb656 ("Remove python-dns conflict by moving the dns module into a custom subdirectory.", 2012-12-07)
Installation of linkcheck_dns stopped with:
0a13fae3 ("remove third party packages and use them as dependency", 2018-01-06)
2019-10-21 19:52:58 +01:00
Marius Gedminas
e274d74be2
Wait for threads to exit after stopping them
...
This fixes a race condition where the main thread would check if any
internal errors happened and get back a 0 while a worker thread was
still busy printing the internal error message before incrementing the
counter.
Fixes #320.
My experiments show that this adds no perceptible delay to the script
runtime (on Linux). More specifically, there already is an annoying
perceptible delay of about 1 second, but it's not caused by this change.
2019-10-21 18:23:58 +03:00
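The race fixed in e274d74be2 comes down to reading a shared counter before the workers have finished. A minimal sketch of the join-before-read pattern, using hypothetical names and a short sleep to stand in for printing the error message:

```python
import threading
import time


def count_errors_after_join(num_workers=3):
    """Start workers, wait for all of them to exit, then read the counter."""
    lock = threading.Lock()
    state = {"internal_errors": 0}

    def worker():
        time.sleep(0.05)  # stands in for printing the internal error message
        with lock:
            state["internal_errors"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    # Without these joins the main thread could read 0 while workers still run.
    for thread in threads:
        thread.join()
    return state["internal_errors"]


print(count_errors_after_join())
```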
Marius Gedminas
84dbb5d603
Fix TypeError: string arg required in find_links()
...
Fixes #317.
2019-10-21 17:47:46 +03:00
Chris Mayo
c7a32d67fe
Remove unused code from network subpackage
2019-10-19 10:27:34 +01:00
anarcat
f73ba54a2a
Merge pull request #308 from cjmayo/decode
...
Decode content when retrieved
2019-10-10 09:46:32 -04:00
anarcat
7cfb1136e9
Merge pull request #313 from cjmayo/titlefinder
...
Remove unused linkparse.TitleFinder
2019-10-07 11:30:10 -04:00
Chris Mayo
127c2272c4
Remove unused linkparse.TitleFinder
...
Stopped being used with removal of UrlBase.set_title_from_content() in:
7b34be59 ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)
2019-10-05 19:43:33 +01:00
Chris Mayo
b7ec71d8cc
Always use utf-8 encoding when quoting
2019-10-05 19:38:57 +01:00
Chris Mayo
a9f147c347
Update fileutil.pathencode() because paths are now strings
2019-10-05 19:38:57 +01:00
Chris Mayo
5bb4524a63
Update strformat.ascii_safe() because paths are now strings
2019-10-05 19:38:57 +01:00
Chris Mayo
646e138166
Pass encoding when unquoting
...
Else non-UTF-8 codes are misinterpreted:
>>> from urllib import parse
>>> parse.unquote("%FF")
'�'
>>> parse.unquote("%FF", "latin1")
'ÿ'
2019-10-05 19:38:57 +01:00
Chris Mayo
153e53ba03
Reuse soup object used for detecting encoding in the HTML parser
2019-10-05 19:38:57 +01:00
Chris Mayo
978042a54e
Hide Beautiful Soup soupsieve warning
...
Shown every time linkchecker is run:
/usr/lib/python3.7/site-packages/bs4/element.py:16: UserWarning: The
soupsieve package is not installed. CSS selectors cannot be used.
'The soupsieve package is not installed. CSS selectors cannot be used.'
2019-10-05 19:38:57 +01:00
Chris Mayo
30df69c158
Improve pretty printed comments
2019-10-05 19:38:57 +01:00
Chris Mayo
607328d5c5
Support Beautiful Soup line numbers
2019-10-05 19:38:57 +01:00
Chris Mayo
4f8c2954cf
Don't set parser.encoding
...
Read-only property with new Beautiful Soup parser.
2019-10-05 19:38:57 +01:00
Chris Mayo
5732606c58
Remove urlutil.decode_for_unquote()
...
Not needed since all content is now being decoded on retrieval.
Added by:
a6643034 ("Python3: decode parts before submitting them to urllib.quote()", 2018-01-05)
2019-10-04 19:37:09 +01:00
Chris Mayo
2776eb5f52
Revert "Python3: fix opening file URLs"
...
This reverts commit 4c9ec511b5.
2019-10-04 19:37:09 +01:00
Chris Mayo
c6a06d99ac
Remove unnecessary unicode() from StatusLogger.writeln()
2019-09-30 20:06:48 +01:00
Petr Dlouhý
6e8da10942
fixes for Python 3: fix markdowncheck
...
The translate() method of string objects (and Python 2 Unicode objects)
only accepts a single, table argument.
2019-09-30 19:46:24 +01:00
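On Python 3, str.translate() takes one table argument, so characters to delete have to be encoded in the table itself, as 6e8da10942 notes. A minimal sketch using str.maketrans(); strip_chars() is a hypothetical helper, not the markdowncheck code.

```python
def strip_chars(text, chars):
    """Delete every character in chars from text using a single table."""
    # The third argument of str.maketrans lists characters mapped to None.
    table = str.maketrans("", "", chars)
    return text.translate(table)


print(strip_chars("*bold* _it_", "*_"))
```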
Chris Mayo
e01ea0d9f0
Safari bookmark parser requires bytes
2019-09-30 19:46:24 +01:00
Chris Mayo
ad33d359c1
Adapt Opera bookmark parser to work with decoded data
2019-09-30 19:46:24 +01:00
Chris Mayo
9460064084
Use requests to decode the content of login form
2019-09-30 19:46:24 +01:00
Chris Mayo
5fc01455b7
Decode content when retrieved, use bs4 to detect encoding if non-Unicode
...
UrlBase has been modified as follows:
- the "data" variable now holds bytes
- decoded content is stored in a new variable "text"
- functionality from get_content() has been split out into
get_raw_content() which returns "data" and download_content() which
calls read_content() and sets the download related variables.
This allows for subclasses to do their own decoding and parsers to
use bytes.
2019-09-30 19:46:24 +01:00
Chris Mayo
0c90c718bf
Revert "Python3: fix bytes mark in parser/__init__.py"
...
This reverts commit aec8243348.
2019-09-30 19:46:24 +01:00
Chris Mayo
53cd9475b5
Replace deprecated cgi.escape
...
html provided for Python 2 by future
https://python-future.org/compatible_idioms.html#html-escaping-and-entities
2019-09-17 20:25:05 +01:00
anarcat
1590408a65
Merge pull request #306 from cjmayo/python3_49
...
{python3_49} enable and fix remaining bookmark tests
2019-09-16 15:18:26 -04:00
Petr Dlouhý
eaa7131523
enable and fix remaining bookmark tests
...
biplist module preferred for reading Safari bookmarks in
bookmarks/safari.py so install it for tox testing.
2019-09-16 20:08:01 +01:00
anarcat
4ccf0fb2d0
Merge pull request #305 from cjmayo/python3_48
...
{python3_48} Python3: fix displaying help
2019-09-16 10:10:36 -04:00
anarcat
2c7573b3b8
Merge pull request #300 from cjmayo/python3_43
...
{python3_43} Python3: fix for test_telnet in urlbase.py
2019-09-16 10:08:18 -04:00
anarcat
bec68f237b
Merge pull request #299 from cjmayo/python3_42
...
{python3_42} fixes for Python 3: fix telneturl
2019-09-16 10:07:55 -04:00
anarcat
27d672c78b
Merge pull request #297 from cjmayo/python3_40
...
{python3_40} Python3: fixes form checker/__init__.py
2019-09-16 10:06:05 -04:00
anarcat
5a0a02ae74
Merge pull request #294 from cjmayo/python3_39_alt
...
{python3_39_alt} Python3: fix TypeError in HttpUrl.read_content()
2019-09-16 10:04:23 -04:00
Petr Dlouhý
14e19efe07
Python3: fix displaying help
2019-09-15 19:50:05 +01:00
Petr Dlouhý
c2af88ad2e
Python3: fix for test_telnet in urlbase.py
2019-09-15 19:49:26 +01:00
Petr Dlouhý
a2e67af7b4
fixes for Python 3: fix telneturl
2019-09-15 19:49:18 +01:00
Petr Dlouhý
bb542b00e9
Python3: fixes form checker/__init__.py
2019-09-15 19:49:00 +01:00
Chris Mayo
06fdd78f91
Python3: fix TypeError in HttpUrl.read_content()
...
From test_http_redirect:
File "linkchecker/linkcheck/checker/httpurl.py", line 323, in read_content
line: buf.write(data)
locals:
buf = <local> <_io.StringIO object at 0x7f8fe2f45e10>
buf.write = <local> <built-in method write of _io.StringIO object at 0x7f8fe2f45e10>
data = <local> b'<a href="newurl.html">Recursive Redirect</a>\n'
TypeError: string argument expected, got 'bytes'
2019-09-15 19:42:29 +01:00
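The TypeError in the traceback of 06fdd78f91 is what io.StringIO raises when given bytes. A minimal sketch of the difference, assuming the buffer only ever receives bytes chunks; collect() is a hypothetical helper, not HttpUrl.read_content().

```python
import io


def collect(chunks):
    """Accumulate bytes chunks in a BytesIO buffer and return the result."""
    buf = io.BytesIO()
    for data in chunks:
        buf.write(data)
    return buf.getvalue()


def string_buffer_rejects_bytes():
    """Show that writing bytes to a StringIO raises TypeError."""
    try:
        io.StringIO().write(b"x")
    except TypeError:
        return True
    return False


print(collect([b'<a href="newurl.html">', b"Recursive Redirect</a>\n"]))
print(string_buffer_rejects_bytes())
```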
anarcat
736d2a786d
Merge pull request #293 from cjmayo/python3_37_alt
...
{python3_37_alt} Python3: fix TypeError when parsing cookie data
2019-09-14 11:51:26 -04:00
anarcat
fe39db4fbf
Merge pull request #287 from cjmayo/python3_36
...
{python3_36} fixes for Python 3 + Travis test: fix cgi
2019-09-14 11:50:53 -04:00
Chris Mayo
a7b7e31917
Python3: fix TypeError when parsing cookie data
...
> fp = BytesIO(strheader)
E TypeError: a bytes-like object is required, not 'str'
linkcheck/cookies.py:61: TypeError
The email package provides the message_from_string() convenience
function which avoids the need to create a file-like object.
Indeed http.client.HTTPMessage is implemented using email.message.Message.
2019-09-13 20:10:25 +01:00
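The email.message_from_string() approach mentioned in a7b7e31917 parses a header block straight from a str, with no file-like object in between. A minimal sketch with an invented header string; parse_header_block() is hypothetical and not the code in linkcheck/cookies.py.

```python
import email


def parse_header_block(strheader):
    """Parse an RFC 822 style header block given as a str."""
    return email.message_from_string(strheader)


# Invented sample headers; a blank line ends the header block.
msg = parse_header_block("Set-Cookie: a=1\nSet-Cookie: b=2\n\n")
print(msg.get_all("Set-Cookie"))
```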
Petr Dlouhý
36465112d0
fixes for Python 3 + Travis test: fix cgi
2019-09-13 19:46:13 +01:00
anarcat
aaa8cb675e
Merge pull request #291 from cjmayo/python3_33_alt
...
{python3_33_alt} Python3: fix opening file URLs
2019-09-13 10:31:20 -04:00
anarcat
80b62a3e21
Merge pull request #292 from cjmayo/lc_cgi_error
...
Fix errors caused by logging LCFormError exceptions
2019-09-13 09:12:05 -04:00
anarcat
b0b392f7cc
Merge pull request #282 from cjmayo/python3_31
...
{python3_31} Python3: fix strformat strline()
2019-09-13 09:11:33 -04:00
Chris Mayo
6dc25547d5
Fix errors caused by logging LCFormError exceptions
2019-09-12 20:13:08 +01:00
Chris Mayo
4c9ec511b5
Python3: fix opening file URLs
...
urllib.request.urlopen() expects a string or Request object.
2019-09-12 19:58:27 +01:00
anarcat
eb2e3271a2
Merge pull request #279 from cjmayo/python3_28
...
{python3_28} Python3: fix robotparser
2019-09-12 08:40:18 -04:00
anarcat
8c072fa757
Merge pull request #289 from cjmayo/python3_38
...
{python3_38} Python3: fix linkname.py
2019-09-12 08:39:29 -04:00
Petr Dlouhý
538c4cfeb9
Python3: fix linkname.py
2019-09-11 20:32:33 +01:00
Petr Dlouhý
8a294be95f
Python3: fix robotparser
2019-09-11 20:04:26 +01:00
anarcat
44944754d5
Merge pull request #286 from cjmayo/python3_35
...
{python3_35} Python3: fix unichr() in htmlparser
2019-09-11 09:48:35 -04:00
anarcat
2239458966
Merge pull request #285 from cjmayo/python3_34
...
{python3_34} fixes for Python 3: fix test_misc
2019-09-11 09:48:14 -04:00
anarcat
dbbb64cd90
Merge pull request #283 from cjmayo/python3_32
...
{python3_32} fixes for Python 3 + Travis test: fix threads
2019-09-11 09:47:44 -04:00
anarcat
492058a360
Merge pull request #281 from cjmayo/python3_30
...
{python3_30} Python3: fix decoding strings
2019-09-11 09:47:10 -04:00
anarcat
8eadc5f8a1
Merge pull request #280 from cjmayo/python3_29
...
{python3_29} fixes for Python 3: fix running problems in Python 3
2019-09-11 09:46:48 -04:00
Petr Dlouhý
f272206110
Python3: fix decoding strings
2019-09-10 19:52:23 +01:00
Petr Dlouhý
55a7973b93
Python3: fix csvlog
2019-09-10 19:42:26 +01:00
Petr Dlouhý
e10f25b968
fixes for Python 3: fix running problems in Python 3
2019-09-10 19:30:09 +01:00
Petr Dlouhý
d20ac0e108
Python3: fix strformat strline()
2019-09-09 19:51:30 +01:00
Petr Dlouhý
8b9f29ae52
Python3: fix unichr() in htmlparser
2019-09-09 19:51:30 +01:00
Petr Dlouhý
129a68da38
fixes for Python 3: fix test_misc
2019-09-09 19:51:30 +01:00
Petr Dlouhý
57f7ba0979
fixes for Python 3 + Travis test: fix threads
2019-09-09 19:51:30 +01:00
Marius Gedminas
60f9f80b9f
Fix test_console.py on Python 3
...
This is an alternative fix I suggested in the comments on PR #273.
2019-09-09 18:52:29 +03:00
anarcat
4e6c806bff
Merge pull request #274 from cjmayo/python3_24
...
{python3_24} Python3: fix logger
2019-09-09 11:50:04 -04:00
Marius Gedminas
bb573e5eb1
Merge pull request #272 from cjmayo/python3_22
...
{python3_22} Python3: fix decode_parts function
2019-09-09 18:37:49 +03:00
anarcat
5c9376cfe2
Merge pull request #276 from cjmayo/python3_26
...
{python3_26} Python3: fix fileutil
2019-09-09 09:40:18 -04:00
Petr Dlouhý
0d7a2cac72
Python3: fix decode_parts function
2019-09-06 19:45:20 +01:00
Petr Dlouhý
9156576778
Python3: fix logger
2019-09-06 19:41:37 +01:00
Petr Dlouhý
ffb0a68ff7
Python3: fix fileurl
2019-09-05 19:41:53 +01:00
anarcat
59ab0644fd
Merge pull request #230 from cjmayo/python3_20
...
{python3_20} Python3: decode parts before submitting them to urllib.quote()
2019-09-04 09:48:19 -04:00
Petr Dlouhý
b5111453d8
change test_parse encoding to UTF-8
2019-07-22 19:59:37 +01:00
Petr Dlouhý
d6d48b4814
html parser: use name instead of peeking
2019-07-22 19:59:37 +01:00
Petr Dlouhý
51a06d8a1e
Remove home-cooked htmlparser and use BeautifulSoup
2019-07-22 19:59:37 +01:00
Nick Muerdter
fb3f65cdcc
Fix CSV output containing increasing number of null byte characters.
...
The CSV buffer is being truncated on each new row, but since the
stream's pointer isn't also being reset, each new row starts at the same
position as the previous row, but with null bytes up until that point.
This leads to increasing growth in the length of each CSV row, since
each line will be padded with null bytes equivalent to the previous
row's length.
2019-05-31 18:52:57 -06:00
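The null-byte growth described in fb3f65cdcc can be reproduced with Python 3's io.StringIO, which is an assumption here since the commit does not name the buffer type. Truncating without seeking leaves the stream position where it was, so the next write is padded up to that position. The two helper functions are hypothetical.

```python
import io


def rows_without_seek(rows):
    """Truncate only: the position is not reset, so later rows get padded."""
    buf = io.StringIO()
    out = []
    for row in rows:
        buf.write(row)
        out.append(buf.getvalue())
        buf.truncate(0)  # contents cleared, position unchanged
    return out


def rows_with_seek(rows):
    """Seek back to the start before truncating, so each row starts clean."""
    buf = io.StringIO()
    out = []
    for row in rows:
        buf.write(row)
        out.append(buf.getvalue())
        buf.seek(0)
        buf.truncate(0)
    return out


print(rows_without_seek(["ab", "cde"]))
print(rows_with_seek(["ab", "cde"]))
```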
Petr Dlouhý
a6643034fb
Python3: decode parts before submitting them to urllib.quote()
2019-05-10 20:06:01 +01:00
Chris Mayo
1c2e6c465e
squash! Python3: fix strformat ascii_safe() and unicode_safe()
2019-05-10 08:58:52 -04:00
Petr Dlouhý
ac14585a78
Python3: fix strformat for test_file
2019-05-10 08:58:52 -04:00
Petr Dlouhý
acaf8e671e
Python3: fix strformat unicode_safe()
2019-05-10 08:58:52 -04:00
Petr Dlouhý
e11ba8e427
squash! Python3: fix strformat ascii_safe() and unicode_safe()
...
From:
fixes for Python 3: fix running problems in Python 3
2019-05-10 08:58:52 -04:00
Petr Dlouhý
a1c6c4935e
Python3: fix strformat ascii_safe() and unicode_safe()
2019-05-10 08:58:52 -04:00
anarcat
9c9706a07a
Merge pull request #256 from cjmayo/parse_qs
...
Replace deprecated cgi.parse_qs
2019-04-27 13:27:19 -04:00
Chris Mayo
a355476b82
Replace deprecated regexp flags not at start
...
DeprecationWarning: Flags not at the start of the expression
2019-04-26 19:25:59 +01:00
Chris Mayo
5ae40c1ae2
Replace deprecated cgi.parse_qs
2019-04-26 19:23:45 +01:00
anarcat
59fe9ed876
Merge pull request #228 from cjmayo/python3_18
...
{python3_18} Python3: fix unicode in urlbase
2019-04-25 16:17:00 -04:00
anarcat
70f0bbf225
Merge pull request #250 from cjmayo/ftpserver
...
Get FtpServerTest working by updating to current pyftpdlib API
2019-04-25 16:16:33 -04:00
Petr Dlouhý
e92b0a9f7b
Python3: fix unicode in urlbase
2019-04-25 19:57:45 +01:00
Petr Dlouhý
b3881ce3b5
Python3: fix urlbase, strformat and others
2019-04-25 19:57:45 +01:00
anarcat
056ba1d717
Merge pull request #248 from cjmayo/donateurl
...
Remove configuration.DonateUrl
2019-04-24 10:59:50 -04:00
anarcat
b656346352
Merge pull request #246 from cjmayo/locale_format
...
Replace deprecated locale.format()
2019-04-24 10:59:17 -04:00
anarcat
a42bc14fc2
Merge pull request #243 from cjmayo/warning
...
Replace deprecated log.warn
2019-04-24 10:58:31 -04:00
anarcat
bb0a1e1992
Merge pull request #242 from cjmayo/wummel
...
Update references to GitHub project from wummel to linkchecker
2019-04-24 10:58:15 -04:00
anarcat
ee8667e1ca
Merge pull request #229 from cjmayo/python3_19
...
{python3_19} Python3: fix unicode in fileurl
2019-04-24 10:57:45 -04:00
anarcat
492da5aee0
Merge pull request #227 from cjmayo/python3_17
...
{python3_17} Python3: fix unicode in url.py
2019-04-24 10:57:09 -04:00
Chris Mayo
f60810b050
Fix Python 3 "TypeError: decoding str is not supported" in FtpUrl.cwd
2019-04-22 19:34:46 +01:00
Chris Mayo
20e11f1b1f
Remove configuration.DonateUrl
2019-04-21 19:44:18 +01:00
Chris Mayo
ce1dd55d7a
Replace deprecated locale.format()
...
locale.format_string() was introduced in Python 2.5.
2019-04-21 19:28:54 +01:00
Petr Dlouhý
b40f4722c7
Python3: fix unicode in fileurl
2019-04-19 20:42:38 +01:00
Petr Dlouhý
f4b73c6d42
Python3: fix unicode in url.py
2019-04-19 19:57:25 +01:00
Chris Mayo
46179f681c
Replace deprecated log.warn
...
warning() has been the documented method since logging was introduced in
Python 2.3.
2019-04-18 20:10:03 +01:00
EsuS
004632a99b
Update references to GitHub project from wummel to linkchecker
...
Remove all mention of donations.
2019-04-18 19:59:52 +01:00
Petr Dlouhý
bc99dc51de
Python3: fix HtmlParser
2019-04-18 19:35:16 +01:00
Petr Dlouhý
2c6411d68e
Python3: fix regexp format
2019-04-17 19:50:06 +01:00
Petr Dlouhý
8f4acc3168
Python3: use str and basestring from builtins
2019-04-16 20:08:29 +01:00
anarcat
e93d18d6e9
Merge pull request #232 from cjmayo/gzip2
...
Remove leftovers from introduction of requests
2019-04-15 10:31:06 -04:00
Petr Dlouhý
2985e9ae65
Use Python 3 compatible octal masks
2019-04-13 20:37:39 +01:00
Chris Mayo
ff4a2e496e
Remove unused copy of gzip2
...
Not used since requests introduced in 7b34be590b.
2019-04-13 20:35:37 +01:00
anarcat
75626d456a
Merge pull request #217 from cjmayo/python3_07
...
{python3_07} Python3: use BytesIO instead of StringIO
2019-04-11 11:48:45 -04:00
anarcat
8223acd44e
Merge pull request #226 from cjmayo/python3_16
...
{python3_16} Python3: fix parsepdf
2019-04-11 11:47:57 -04:00
anarcat
2bdd155d56
Merge pull request #231 from cjmayo/python3_21
...
{python3_21} fix urllib imports
2019-04-11 11:47:50 -04:00
anarcat
ce76b7c82d
Merge pull request #222 from cjmayo/python3_12
...
{python3_12} Python3: fix bytes mark in parser/__init__.py
2019-04-11 11:46:41 -04:00
Petr Dlouhý
106d58c2da
Python3: use BytesIO instead of StringIO
2019-04-09 20:09:35 +01:00
Petr Dlouhý
79e05d1511
Python3: fix parsepdf
2019-04-09 20:09:35 +01:00
Petr Dlouhý
4acabf5cb5
fix urllib imports
2019-04-09 20:09:35 +01:00
Petr Dlouhý
aec8243348
Python3: fix bytes mark in parser/__init__.py
2019-04-09 20:09:35 +01:00
Petr Dlouhý
033f9fbdb3
Python3: mark bytes explicitly
2019-04-09 20:09:35 +01:00
Yaroslav Halchenko
7ed7919692
RF: place parser.flush() under mutex as well
...
Just a safety measure, not yet proven to be required but overall
makes sense
2018-11-06 10:58:10 -05:00
Yaroslav Halchenko
ee27e178ec
BF: place a mutex around apparently thread-unsafe parser.feed invocation
...
This fixes the anchor analysis and probably other issues, such as
the number of URLs found varying between runs.
2018-11-01 11:10:01 -04:00
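The locking pattern described in commit ee27e178ec can be sketched as below. This is an illustration under stated assumptions: the standard library `HTMLParser` stands in for the project's own parser, and `LinkCollector`, `_parser_lock` and `parse_links` are hypothetical names.

```python
import threading
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (stand-in for the real parser)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# One module-level lock shared by every checker thread (assumed layout).
_parser_lock = threading.Lock()


def parse_links(content):
    """Parse content while holding the lock, so that only one thread
    runs the parser at any time."""
    parser = LinkCollector()
    with _parser_lock:
        parser.feed(content)
        # Finishing the parse inside the same critical section mirrors
        # the later commit that moved flush() under the mutex.
        parser.close()
    return parser.links
```

Serializing the parse costs some parallelism, but it keeps results deterministic when the underlying parser holds shared state.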
Yaroslav Halchenko
b78c2d200e
DOC: minor typo fix
2018-11-01 11:08:09 -04:00
gerdneuman
de6a82b378
Added whatsapp:// to ignored protocols
...
Fixes https://github.com/wummel/linkchecker/issues/595
2018-08-09 13:49:15 +02:00
regexaurus
50a9ff65b8
Updated support (issues) URL
2018-08-03 00:53:47 -04:00
Marius Gedminas
6f55f446ae
Load cookies from the --cookiefile correctly
...
requests.cookies.merge_cookies() requires a dict or a CookieJar as the second argument.
We've been passing lists of Cookie objects instead.
Fixes #62, harder this time.
2018-03-16 13:23:26 +02:00
Marius Gedminas
6becc08284
Fix internal error when using cookies
...
There was some kind of confusion between a module and a function argument,
introduced in commit 90257a1b5e.
Fixes #62.
2018-03-15 23:30:41 +02:00
Petr Dlouhý
e615480850
Python3: fix reading Safari bookmarks
2018-01-19 09:52:43 +01:00
Petr Dlouhý
256202a20b
fixes for Python 3: fix proxysupport
2018-01-19 09:52:43 +01:00
Petr Dlouhý
f128c9c168
Python3: fix gzip2 format
2018-01-19 09:52:43 +01:00
Petr Dlouhý
a1b300c892
Python3: fix imports
2018-01-19 09:52:43 +01:00
Petr Dlouhý
0a13fae3b4
remove third party packages and use them as dependency
2018-01-09 23:25:27 +01:00
Petr Dlouhý
2daf685633
Python3: fix few htmllib problems
2018-01-05 22:48:46 +01:00
Petr Dlouhý
fb39a4116f
Python3: fix fileutil
2018-01-05 20:31:21 +01:00
Reinhold Füreder
e864bbdabf
Use os.makedirs(...) instead of os.mkdir(...)
2018-01-03 11:33:53 +01:00
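A short sketch of why the switch in commit e864bbdabf matters: `os.mkdir()` fails when a parent directory is missing, while `os.makedirs()` creates the whole chain. The helper name `ensure_directory` is hypothetical, and the `exist_ok=True` argument is an assumption that requires Python 3; the original commit may have handled existing directories differently.

```python
import os


def ensure_directory(path):
    """Create path and any missing parent directories (hypothetical helper)."""
    # exist_ok=True makes the call a no-op when the directory is already there.
    os.makedirs(path, exist_ok=True)
    return path
```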
Philipp Hahn
1368643a50
Fix fragment identifier quoting
...
According to <https://tools.ietf.org/html/rfc3986>:
fragment = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
Fixes #96
2017-11-10 08:03:03 -05:00
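The grammar quoted in commit 1368643a50 translates into a set of characters that may stay unescaped in a fragment. The sketch below shows one way to apply it with the standard library. It is not the project's implementation: `FRAGMENT_SAFE` and `quote_fragment` are hypothetical names, and the input is assumed to be not yet percent-encoded.

```python
from urllib.parse import quote

# Characters allowed unescaped in a fragment besides ALPHA / DIGIT and
# the unreserved marks, which quote() leaves alone on its own
# ("~" only since Python 3.7):
# sub-delims, then ":" and "@" from pchar, then "/" and "?".
FRAGMENT_SAFE = "!$&'()*+,;=" + ":@" + "/?"


def quote_fragment(fragment):
    """Percent-encode everything the fragment rule does not permit.

    Assumes the input is not already percent-encoded; an existing "%"
    would be escaped again as "%25".
    """
    return quote(fragment, safe=FRAGMENT_SAFE)
```

For example, `quote_fragment("section 1/a?b=c")` escapes only the space, because `/`, `?` and `=` are all permitted by the fragment rule.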
Antoine Beaupré
71be9b941b
fix incorrect call to the logging module (Closes: #847208)
2017-11-03 09:47:01 -04:00
Félix Sipma
c8d9038ae8
improve get_plugin_folders() docstring
2017-10-18 15:58:18 +02:00
Félix Sipma
deca8c667e
introduce linkcheck.configuration.get_user_data()
2017-10-18 15:55:55 +02:00
Félix Sipma
a03e2e4ada
use xdg dirs for config & data
...
~/.linkchecker is used instead of the xdg equivalents if the directory
exists (backward compatibility).
2017-10-17 18:48:07 +02:00
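The fallback rule stated in commit a03e2e4ada can be sketched like this. Only the behavior (prefer an existing ~/.linkchecker, otherwise use the XDG location) comes from the commit message. The function name `get_config_dir`, its parameters, and the `linkchecker` subdirectory name are assumptions made for illustration.

```python
import os


def get_config_dir(home=None, environ=None):
    """Return the configuration directory (hypothetical helper).

    An existing legacy ~/.linkchecker directory wins for backward
    compatibility; otherwise the XDG config location is used.
    """
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home

    legacy = os.path.join(home, ".linkchecker")
    if os.path.isdir(legacy):
        return legacy

    # The XDG Base Directory spec defaults XDG_CONFIG_HOME to ~/.config
    # when the variable is unset or empty.
    base = environ.get("XDG_CONFIG_HOME") or os.path.join(home, ".config")
    return os.path.join(base, "linkchecker")
```

Passing `home` and `environ` explicitly keeps the function easy to test without touching the real home directory.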
Antoine Beaupré
9b12b5d66f
workaround new limitation in requests
...
Newer versions of requests do not expose the internal SSL socket object, so we
cannot verify certificates. There was work to allow custom
verification routines, which we could use, but it was never finished:
https://github.com/shazow/urllib3/pull/257
So right now, just treat missing socket information as if the cert was
missing.
Closes: #76
2017-10-02 20:19:25 -04:00
Marius Gedminas
4a092c218c
Whitespace bigotry
2017-03-14 17:18:27 +02:00
anarcat
5471b63ceb
Merge pull request #39 from PetrDlouhy/fix/cache
...
Fix cache: Don't check one url multiple times
2017-03-14 09:26:07 -04:00
Marius Gedminas
fb1debaa68
Fix incompatible pointer type warnings
...
The warnings looked like this:
htmlparse.c: In function ‘yyparse’:
htmlparse.c:1810:18: warning: passing argument 1 of ‘yyerror’ from incompatible pointer type [-Wincompatible-pointer-types]
htmlparse.y:40:13: note: expected ‘PyObject ** {aka struct _object **}’ but argument is of type ‘PyObject * {aka struct _object *}’
htmlparse.c:1927:12: warning: passing argument 1 of ‘yyerror’ from incompatible pointer type [-Wincompatible-pointer-types]
htmlparse.y:40:13: note: expected ‘PyObject ** {aka struct _object **}’ but argument is of type ‘PyObject * {aka struct _object *}’
The argument is not used, so it doesn't really matter what pointer type
it is.
2017-02-24 15:04:09 +02:00
Petr Dlouhý
eaa538c814
don't check one url multiple times
2017-02-14 10:23:25 +01:00
Marius Gedminas
03dfe3d3a1
Fix "operation on ... may be undefined" [-Wsequence-point] warnings
...
Fixes a bunch of warnings like
htmlparse.y:509:25: warning: operation on ‘self->userData->buf’ may be undefined [-Wsequence-point]
htmlparse.y:518:29: warning: operation on ‘self->userData->tmp_buf’ may be undefined [-Wsequence-point]
which were a result of (macro-expanded) code like this (simplified):
if ((tmp = (tmp = PyMem_Realloc(...))) == NULL) return NULL;
The PyMem_Resize(p, ...) macro assigns the new value to p before
returning it, so there's no need to assign it again.
See http://bugs.python.org/issue1668036 for evidence (from 2007) that
this is indeed a documented side-effect of the macro API.
2017-02-13 15:20:33 +02:00
Graham Seaman
233e7dcf68
Allow wayback-format urls without affecting atom 'feed' urls
2017-02-09 11:43:45 +00:00
Marius Gedminas
743a5f31cb
Crawl HTML attributes in deterministic order
...
Fixes #17.
2017-02-01 19:19:53 +02:00
Graham Seaman
2e32780dc7
Force header names to lower to allow for CaseInsensitiveDict variability
2017-02-01 16:28:07 +00:00
Marius Gedminas
3c99b6aa30
Fix TypeError: hasattr(): attribute name must be string
...
The one test failure in Travis happens in
TestConsole.test_internal_error, but only if you have the argcomplete
package installed.
This was a real bug in error reporting code.
2017-02-01 16:02:35 +02:00