linkchecker

mirror of https://github.com/Hopiu/linkchecker.git synced 2026-05-15 18:13:09 +00:00

Author	SHA1	Message	Date
Chris Mayo	03b1c4919d	Record encoding in debug log messages	2020-05-23 20:01:24 +01:00
Chris Mayo	f7337f55e8	Fix error due to an empty html file accessed over http Use the already fixed [1] UrlBase.get_content() in HttpUrl. [1] `5bd1fb4` ("Fix internal error on empty HTML files", 2020-05-21)	2020-05-23 20:01:24 +01:00
Marius Gedminas	f268a90cfb	Merge branch 'master' into HandleRateLimiting	2020-05-23 14:15:52 +03:00
Marius Gedminas	6dffacf17f	Merge pull request #409 from linkchecker/fix-login-timeouts Make sure login form fetching uses a timeout and sends User-Agent	2020-05-22 21:40:48 +03:00
Marius Gedminas	b0435b3d47	Make sure login form fetching uses a timeout Also resolve an XXX comment about the User-Agent header (which is configured in new_request_session), but add a couple of XXX comments about using proxy and possibly disabling TLS certificate checking.	2020-05-22 11:19:51 +03:00
Marius Gedminas	4f3fe5e1c3	Make sure fetching robots.txt uses the configured timeout Closes #396.	2020-05-22 10:53:33 +03:00
Marius Gedminas	c60d7c66e4	Clarify the decision to fall back to Latin-1	2020-05-21 19:35:39 +03:00
Marius Gedminas	5bd1fb4e36	Fix internal error on empty HTML files When BeautifulSoup finds an empty file on disk, it sets original_encoding to None. It doesn't matter what encoding we pick for empty files, so let's just pick one. I don't know if there are any circumstances where BeautifulSoup might set the encoding to None for a non-empty file. Closes #392.	2020-05-21 19:01:33 +03:00
Chris Mayo	6cfc8eeb49	Replace threading.Thread.setName() with setting the name property As recommended in: https://docs.python.org/3.5/library/threading.html#threading.Thread.setName	2020-05-20 19:58:44 +01:00
Chris Mayo	42eba19a7d	No need to encode url in Checker.check_url_data() Was causing b'' in log messages e.g. CheckThread-b'http:...	2020-05-20 19:58:44 +01:00
Chris Mayo	28f4587dfa	Remove str_text from fileutil.py, strformat.py and url.py	2020-05-19 19:56:42 +01:00
Chris Mayo	ebcc3c4961	Remove str_text from plugins/	2020-05-19 19:56:42 +01:00
Chris Mayo	1c14583535	Remove str_text from logger/	2020-05-19 19:56:42 +01:00
Chris Mayo	6bddd4ac60	Remove str_text from checker/	2020-05-19 19:56:42 +01:00
Chris Mayo	a127902607	Replace str_text in asserts	2020-05-19 19:56:42 +01:00
Chris Mayo	7490804e2c	Merge pull request #395 from cjmayo/tidyten11 Remove unused code from linkcheck/fileutil.py	2020-05-19 19:45:08 +01:00
Marius Gedminas	e6e969f975	Merge pull request #391 from linkchecker/dev-version Bump version in git to 10.0.0.dev0	2020-05-19 18:49:34 +03:00
Chris Mayo	690605c519	Remove unused code from linkcheck/fileutil.py	2020-05-18 19:29:55 +01:00
Marius Gedminas	5317347e54	Avoid distutils.version.StrictVersion distutils.version is old code that predates PEP 440. We could add a dependency on https://packaging.pypa.io/en/latest/version/, but meh.	2020-05-17 21:12:43 +03:00
Marius Gedminas	bb53aaa621	Fix viruscheck plugin The clamav interface needs bytes, not unicode. It would be nice if we had tests for this code.	2020-05-17 17:50:11 +01:00
Chris Mayo	a15a2833ca	Remove spaces after names in class method definitions And also nested functions. This is a PEP 8 convention, E211.	2020-05-16 20:19:42 +01:00
Chris Mayo	1663e10fe7	Remove spaces after names in function definitions This is a PEP 8 convention, E211.	2020-05-16 20:19:42 +01:00
Chris Mayo	fc11d08968	Remove spaces after names in class definitions	2020-05-16 20:19:42 +01:00
Chris Mayo	1416a08119	On Python 3 no need to convert os.linesep to a string	2020-05-16 17:02:01 +01:00
Chris Mayo	0752408a44	Remove Python 2 use of sys.stdout in i18n.get_encoded_writer()	2020-05-16 17:02:00 +01:00
Chris Mayo	2c2e7e55ac	Remove CSVLogger.encode_row_s() Introduced during Python 3 conversion to maintain Python 2 support: `55a7973b` ("Python3: fix csvlog", 2016-12-04)	2020-05-16 17:02:00 +01:00
Chris Mayo	ed13a926d3	Remove setting Python 2 xmlparser.returns_unicode	2020-05-16 17:02:00 +01:00
Chris Mayo	025637b08d	Remove Python 2 cookielib import	2020-05-16 16:26:38 +01:00
Chris Mayo	1e277444f4	Remove Python 2 thread import	2020-05-16 16:26:34 +01:00
Chris Mayo	dcbddfe045	Remove Python 2 ConfigParser import	2020-05-15 19:37:04 +01:00
Chris Mayo	f8c9faec1b	Remove Python 2 cStringIO imports	2020-05-15 19:37:04 +01:00
Chris Mayo	bda9612273	Make html.escape Python 3 only	2020-05-14 20:15:28 +01:00
Chris Mayo	42de609f8e	Make urllib imports Python 3 only	2020-05-14 20:15:28 +01:00
Chris Mayo	3c661a83d0	Replace parse_host_port() in checker.proxysupport with url.splitport()	2020-05-14 20:15:28 +01:00
Chris Mayo	c80002437e	Update run-time version check	2020-05-13 19:50:19 +01:00
Chris Mayo	08ddf658bc	Merge pull request #366 from cjmayo/userorpwd Support login forms with user and/or password	2020-05-13 19:37:44 +01:00
Chris Mayo	736c893707	Merge pull request #377 from cjmayo/tidyten3 Remove u string prefixes	2020-05-13 19:36:54 +01:00
Chris Mayo	3ace021264	Support login forms with user and/or password	2020-05-13 19:32:25 +01:00
Chris Mayo	44e81d27dd	Remove inheriting object All Python 3 classes are new-style.	2020-05-08 10:45:31 +01:00
Chris Mayo	b0ea72e8c1	Remove # -*- coding: lines Except for tests that include non-unicode characters: tests/test_po.py tests/test_strformat.py tests/test_url.py tests/checker/test_error.py tests/checker/test_news.py	2020-05-08 10:45:31 +01:00
Marius Gedminas	22b0165b72	Make _Logger an abstract base class The __metaclass__ syntax is a Python-2-ism. It was replaced with class _Logger (object, metaclass=abc.ABCMeta): in Python 3. And then Python 3.4 introduced abc.ABC which is an empty class that has ABCMeta as the metaclass, making it simpler to define abstract base classes.	2020-04-30 23:09:42 +03:00
Chris Mayo	4d3e5abcfa	Remove u string prefixes	2020-04-30 20:11:59 +01:00
anarcat	ab476fa4bf	Merge pull request #364 from cjmayo/parser5 Stop using HTML handlers and improve login form error handling	2020-04-30 09:28:48 -04:00
Chris Mayo	12a948894b	Fix space style in linkcheck/htmlutil/linkparse.py	2020-04-29 20:07:00 +01:00
Chris Mayo	9eed070a73	Stop using HTML handlers LinkFinder is the only remaining HTML handler therefore no need for htmlsoup.process_soup() as an independent function or TagFinder as a base class.	2020-04-29 20:07:00 +01:00
Chris Mayo	4ffdbf2406	Replace MetaRobotsFinder using BeautifulSoup.find()	2020-04-29 20:07:00 +01:00
Chris Mayo	a51f02cf66	Improve error handling and debugging for login form	2020-04-27 18:06:29 +01:00
Chris Mayo	9a33c2a659	Make requesting login form password work on Python 3	2020-04-27 18:06:29 +01:00
Chris Mayo	8fc0dcc055	Make matching login form credentials case-sensitive The keys of the form.data dictionary are case-sensitive and therefore a KeyError was possible if the configured values are not identical to the input element name attributes.	2020-04-27 18:06:29 +01:00
Chris Mayo	7a6ef938cc	Rename htmlutil.formsearch to htmlutil.loginformsearch Make it clear that this module has only one specific use.	2020-04-27 18:06:29 +01:00
anarcat	350f8bfef9	Merge pull request #373 from linkchecker/fix-swf-parsing SWF files are binary data	2020-04-27 09:39:52 -04:00
Marius Gedminas	680783b1ff	SWF files are binary data Should fix #372.	2020-04-27 11:25:37 +03:00
anarcat	183d483074	Merge pull request #365 from cjmayo/tidyten1 Remove use of the future package	2020-04-26 12:02:30 -04:00
Chris Mayo	d189445a8e	LinkFinder does not raise StopParse	2020-04-18 20:30:46 +01:00
Chris Mayo	ee6628a831	Move HtmlParser/htmlsax.py to htmlutil/htmlsoup.py Remove one subpackage and some import lines where htmlutil.linkparse is also being used.	2020-04-18 20:30:45 +01:00
Chris Mayo	384e1e196d	Remove Python 2 gettext builtin installation	2020-04-15 19:49:16 +01:00
Chris Mayo	a83fbb56c0	Remove from __future__ imports	2020-04-15 19:49:16 +01:00
Chris Mayo	f5e7f3a382	Remove use of the future package It was providing Python 2 compatibility.	2020-04-15 19:49:16 +01:00
Chris Mayo	0795e3c1b4	Replace Parser class using BeautifulSoup.find_all()	2020-04-10 13:51:09 +01:00
Chris Mayo	eb3cf28baa	Remove support for start_end_element() callback The LinkFinder handler start_end_element() callback does nothing apart from call start_element().	2020-04-10 13:51:09 +01:00
Chris Mayo	c9f17e92b9	Remove support for end_element() callback	2020-04-10 13:51:09 +01:00
Chris Mayo	48b590cf8b	Replace FormFinder using BeautifulSoup.find_all() FormFinder was the only handler that used an end_element() callback and was therefore a blocker to moving the Parser class to use BeautifulSoup.find_all() FormFinder was a specialised handler used to parse a login form at the start of a session if the user had configured authentication credentials.	2020-04-10 13:51:05 +01:00
Chris Mayo	974915cc4f	Remove encoding from Parser Only used by the test and an attribute of the soup object.	2020-04-08 20:03:35 +01:00
Chris Mayo	02e1c389b2	Remove parser flush() and reset() Remnants of the feed() interface.	2020-04-08 20:03:35 +01:00
Chris Mayo	3771dd9136	Use parser.feed_soup() instead of parser.feed() Markup is not being passed in pieces to the parser, so simplify the interface and reduce the state further.	2020-04-08 20:03:35 +01:00
Chris Mayo	40f43ae41c	Create one function to make soup objects	2020-04-08 20:03:35 +01:00
Chris Mayo	9d8d251d06	Replace Parser lineno() and column() methods Stop storing this data in Parser object state.	2020-04-08 20:03:35 +01:00
Chris Mayo	16e6fb2919	Fix incorrect character in FormFinder log message	2020-04-07 19:24:34 +01:00
Chris Mayo	00f940d979	Fix FormFinder callbacks for missing element_text element_text added in: `51a06d8a` ("Remove home-cooked htmlparser and use BeautifulSoup", 2019-07-22)	2020-04-07 19:24:34 +01:00
Chris Mayo	fe024fb0c8	Remove unused Parser.debug() method	2020-04-03 19:24:08 +01:00
Chris Mayo	0c5e3bb403	Remove old HtmlParser .gitignore htmlparse.output was a product of the built-in parser.	2020-04-03 19:24:08 +01:00
Chris Mayo	036b900ffc	Remove unused linkcheck.containers classes	2020-04-03 19:24:08 +01:00
Chris Mayo	3ff3d72492	Use BeautifulSoup element attrs directly	2020-04-03 19:24:08 +01:00
Chris Mayo	a7e1e20172	Remove last line and column from Parser Only used for debug log message and not very useful.	2020-04-03 19:24:08 +01:00
Chris Mayo	28701e291a	Remove use of Python 2 unicode() and related u prefixes Several instances for MS Windows left unchanged.	2020-04-01 19:39:50 +01:00
anarcat	cf4e6bb235	Merge pull request #351 from cjmayo/tagsonly Remove support for non-Tag elements from Parser	2020-04-01 12:17:18 -04:00
Chris Mayo	ffa6ac457f	Remove support for non-Tag elements from Parser This change is made because the linkchecker handlers only process Tags. The test HtmlPrettyPrinter handler is updated to output element text because its support for non-Tag elements has been removed. This results in a number of the existing tests still passing.	2020-03-31 20:10:35 +01:00
Chris Mayo	e7c5f353cd	Remove unused function linkcheck.fileutil.write_file() Doesn't appear to have ever been used. Causes flake8 error: linkcheck/fileutil.py:45:9: F821 undefined name 'file'	2020-03-31 19:46:31 +01:00
Chris Mayo	504004d4f0	Use ipaddress in network.iputil.is_valid_ip() ipaddress was introduced in Python 3.3.	2020-03-31 19:46:31 +01:00
Chris Mayo	2eb1424703	Replace deprecated plistlib.readPlistFromBytes() in bookmarks.safari Remove Python 2 code. plistlib.loads() was added in Python 3.4.	2020-03-31 19:46:31 +01:00
Chris Mayo	0ee4414a60	Replace memoized with functools.lru_cache	2020-03-31 19:46:31 +01:00
Chris Mayo	1255119ca8	Move HtmlPrinter and HtmlPrettyPrinter into tests	2020-03-30 19:32:30 +01:00
Chris Mayo	ce1d669329	Remove unused functions from linkcheck.httputil http_persistent() unused since: `4b818cb4` ("Detect more cases to close the connection, and close response objects", 2006-09-15) http_keepalive(), get_content_encoding() unused since: `7b34be59` ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)	2020-03-30 19:32:30 +01:00
Chris Mayo	5b66964afa	Remove unused .charset from checker classes Unused since: `4f8c2954` ("Don't set parser.encoding", 2019-10-05)	2020-03-30 19:32:30 +01:00
Chris Mayo	f743be57e8	Remove unused functions from linkcheck.HtmlParser resolve_entities() unused since: `2c000683` ("Remove unused linkcheck.htmlutil.linkname module", 2020-03-30) set_doctype(), set_encoding() unused since: `51a06d8a` ("Remove home-cooked htmlparser and use BeautifulSoup", 2019-07-22)	2020-03-30 19:32:18 +01:00
Chris Mayo	2c000683e1	Remove unused linkcheck.htmlutil.linkname module Unused since: `d6d48b48` ("html parser: use name instead of peeking", 2019-07-22)	2020-03-30 19:31:11 +01:00
Marius Gedminas	af0f50efa8	Restore support for older BeautifulSoup4 versions	2020-03-30 14:49:56 +03:00
Wes Haggard	dcdc64e878	Turn status code 429 into warning instead of failure	2020-03-25 16:36:08 -07:00
Marius Gedminas	a311ebb97e	Fix doctype tests I don't think linkchecker actually cares about the document type, so I'm not sure why we're even testing this...	2020-03-23 10:56:57 +02:00
Chris Mayo	5eaad24641	Use HTTP header encoding for decoding	2020-03-22 19:54:37 +00:00
Chris Mayo	f5ae90e824	Parser threading lock no longer required with Beautiful Soup	2020-03-22 19:54:37 +00:00
Chris Mayo	d3d6638973	Actually fix TypeError when checking https link The test was added but not the fix in: `ecd06776` ("Fix TypeError when checking https link and test", 2019-11-11) Which is caught by the new test when run on Python 3: ___________________ TestHttps.test_x509_to_dict__________________ [gw14] linux -- Python 3.6.9 /usr/bin/python3.6 tests/checker/test_https.py:72: in test_x509_to_dict self.assertEqual(httputil.x509_to_dict(cert)["notAfter"], linkcheck/httputil.py:47: in x509_to_dict parsedtime = asn1_generaltime_to_seconds(notAfter) linkcheck/httputil.py:68: in asn1_generaltime_to_seconds res = datetime.strptime(timestr, timeformat + 'Z') E TypeError: strptime() argument 1 must be str, not bytes	2019-11-19 20:06:10 +00:00
Chris Mayo	ec8b6e09f0	Fix XmlTagUrlParser and make Python 3 compatible URLs within a sitemap file were not being captured.	2019-10-28 19:20:05 +00:00
Marius Gedminas	8bdd402aed	Merge pull request #333 from linkchecker/fix-clamav-on-py3 Fix test_clamav.py on Python 3	2019-10-25 16:16:23 +03:00
Marius Gedminas	5b2b3613ec	Merge pull request #330 from linkchecker/fix-sitemap Fix sitemap parser	2019-10-25 16:15:55 +03:00
Marius Gedminas	f9766a2049	Python 3: fix bytes vs strings in viruscheck plugin Socket communication deals with bytes. There are probably remaining issues with the viruscheck plugin on Python 3, we just can't see them because the code is not fully covered with tests.	2019-10-25 14:24:07 +03:00
Chris Mayo	b2e63663f8	Make PdfParser Python 3 compatible basestring is not available in Python 3. Ensure all URLs are Unicode. url_data.get_raw_content() is returning bytes.	2019-10-24 19:57:27 +01:00
Marius Gedminas	a1af1e9717	Fix sitemap parser PyExpat wants bytes on Python 2. See #323.	2019-10-23 17:23:23 +03:00
Marius Gedminas	938467c3ae	Merge pull request #324 from cjmayo/pdfminer Add pdfminer to tox.ini and dev-requirements.txt to enable pdf test	2019-10-23 09:47:01 +03:00
Marius Gedminas	db3e25e934	Merge pull request #326 from linkchecker/fix-word-maybe Fix MS Word parser, hopefully	2019-10-22 18:08:46 +03:00
Marius Gedminas	c6de64978c	Merge pull request #325 from linkchecker/type-error-in-robot-parser Fix TypeError: string arg required in content_allows_robots()	2019-10-22 18:07:31 +03:00
Marius Gedminas	fa32a89d6b	Fix MS Word parser, hopefully MS Word files are binary data, and get_temp_filename() will write them to disk using open(..., 'wb'), so we want to pass bytes in there, not Unicode. See #323.	2019-10-22 16:39:57 +03:00
Marius Gedminas	58b0d5aaae	Fix TypeError: string arg required in content_allows_robots() See #323 and #317.	2019-10-22 14:13:45 +03:00
Chris Mayo	949f84d329	PdfParser requires bytes	2019-10-21 20:12:33 +01:00
Chris Mayo	7da64b16f0	Don't add linkcheck_dns directory to sys.path This code was added in: `efbbb656` ("Remove python-dns conflict by moving the dns module into a custom subdirectory.", 2012-12-07) Installation of linkcheck_dns stopped with: `0a13fae3` ("remove third party packages and use them as dependency", 2018-01-06)	2019-10-21 19:52:58 +01:00
Marius Gedminas	e274d74be2	Wait for threads to exit after stopping them This fixes a race condition where the main thread would check if any internal errors happened and get back a 0 while a worker thread was still busy printing the internal error message before incrementing the counter. Fixes #320. My experiments show that this adds no perceptible delay to the script runtime (on Linux). More specifically, there already is an annoying perceptible delay of about 1 second, but it's not caused by this change.	2019-10-21 18:23:58 +03:00
Marius Gedminas	84dbb5d603	Fix TypeError: string arg required in find_links() Fixes #317.	2019-10-21 17:47:46 +03:00
Chris Mayo	c7a32d67fe	Remove unused code from network subpackage	2019-10-19 10:27:34 +01:00
anarcat	f73ba54a2a	Merge pull request #308 from cjmayo/decode Decode content when retrieved	2019-10-10 09:46:32 -04:00
anarcat	7cfb1136e9	Merge pull request #313 from cjmayo/titlefinder Remove unused linkparse.TitleFinder	2019-10-07 11:30:10 -04:00
Chris Mayo	127c2272c4	Remove unused linkparse.TitleFinder Stopped being used with removal of UrlBase.set_title_from_content() in: `7b34be59` ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)	2019-10-05 19:43:33 +01:00
Chris Mayo	b7ec71d8cc	Always use utf-8 encoding when quoting	2019-10-05 19:38:57 +01:00
Chris Mayo	a9f147c347	Update fileutil.pathencode() because paths are now strings	2019-10-05 19:38:57 +01:00
Chris Mayo	5bb4524a63	Update strformat.ascii_safe() because paths are now strings	2019-10-05 19:38:57 +01:00
Chris Mayo	646e138166	Pass encoding when unquoting Else non-UTF-8 codes are misinterpreted: >>> from urllib import parse >>> parse.unquote("%FF") '�' >>> parse.unquote("%FF", "latin1") 'ÿ'	2019-10-05 19:38:57 +01:00
Chris Mayo	153e53ba03	Reuse soup object used for detecting encoding in the HTML parser	2019-10-05 19:38:57 +01:00
Chris Mayo	978042a54e	Hide Beautiful Soup soupsieve warning Shown every time linkchecker is run: /usr/lib/python3.7/site-packages/bs4/element.py:16: UserWarning: The soupsieve package is not installed. CSS selectors cannot be used. 'The soupsieve package is not installed. CSS selectors cannot be used.'	2019-10-05 19:38:57 +01:00
Chris Mayo	30df69c158	Improve pretty printed comments	2019-10-05 19:38:57 +01:00
Chris Mayo	607328d5c5	Support Beautiful Soup line numbers	2019-10-05 19:38:57 +01:00
Chris Mayo	4f8c2954cf	Don't set parser.encoding Read-only property with new Beautiful Soup parser.	2019-10-05 19:38:57 +01:00
Chris Mayo	5732606c58	Remove urlutil.decode_for_unquote() Not needed since all content is now being decoded on retrieval. Added by: `a6643034` ("Python3: decode parts before submitting them to urllib.quote()", 2018-01-05)	2019-10-04 19:37:09 +01:00
Chris Mayo	2776eb5f52	Revert "Python3: fix opening file URLs" This reverts commit `4c9ec511b5`.	2019-10-04 19:37:09 +01:00
Chris Mayo	c6a06d99ac	Remove unnecessary unicode() from StatusLogger.writeln()	2019-09-30 20:06:48 +01:00
Petr Dlouhý	6e8da10942	fixes for Python 3: fix markdowncheck The translate() method of string objects (and Python 2 Unicode objects) only accepts a single, table argument.	2019-09-30 19:46:24 +01:00
Chris Mayo	e01ea0d9f0	Safari bookmark parser requires bytes	2019-09-30 19:46:24 +01:00
Chris Mayo	ad33d359c1	Adapt Opera bookmark parser to work with decoded data	2019-09-30 19:46:24 +01:00
Chris Mayo	9460064084	Use requests to decode the content of login form	2019-09-30 19:46:24 +01:00
Chris Mayo	5fc01455b7	Decode content when retrieved, use bs4 to detect encoding if non-Unicode UrlBase has been modified as follows: - the "data" variable now holds bytes - decoded content is stored in a new variable "text" - functionality from get_content() has been split out into get_raw_content() which returns "data" and download_content() which calls read_content() and sets the download related variables. This allows for subclasses to do their own decoding and parsers to use bytes.	2019-09-30 19:46:24 +01:00
Chris Mayo	0c90c718bf	Revert "Python3: fix bytes mark in parser/__init__.py" This reverts commit `aec8243348`.	2019-09-30 19:46:24 +01:00
Chris Mayo	53cd9475b5	Replace deprecated cgi.escape html provided for Python 2 by future https://python-future.org/compatible_idioms.html#html-escaping-and-entities	2019-09-17 20:25:05 +01:00
anarcat	1590408a65	Merge pull request #306 from cjmayo/python3_49 {python3_49} enable and fix remaining bookmark tests	2019-09-16 15:18:26 -04:00
Petr Dlouhý	eaa7131523	enable and fix remaining bookmark tests biplist module preferred for reading Safari bookmarks in bookmarks/safari.py so install it for tox testing.	2019-09-16 20:08:01 +01:00
anarcat	4ccf0fb2d0	Merge pull request #305 from cjmayo/python3_48 {python3_48} Python3: fix displaying help	2019-09-16 10:10:36 -04:00
anarcat	2c7573b3b8	Merge pull request #300 from cjmayo/python3_43 {python3_43} Python3: fix for test_telnet in urlbase.py	2019-09-16 10:08:18 -04:00
anarcat	bec68f237b	Merge pull request #299 from cjmayo/python3_42 {python3_42} fixes for Python 3: fix telneturl	2019-09-16 10:07:55 -04:00
anarcat	27d672c78b	Merge pull request #297 from cjmayo/python3_40 {python3_40} Python3: fixes for checker/__init__.py	2019-09-16 10:06:05 -04:00
anarcat	5a0a02ae74	Merge pull request #294 from cjmayo/python3_39_alt {python3_39_alt} Python3: fix TypeError in HttpUrl.read_content()	2019-09-16 10:04:23 -04:00
Petr Dlouhý	14e19efe07	Python3: fix displaying help	2019-09-15 19:50:05 +01:00
Petr Dlouhý	c2af88ad2e	Python3: fix for test_telnet in urlbase.py	2019-09-15 19:49:26 +01:00
Petr Dlouhý	a2e67af7b4	fixes for Python 3: fix telneturl	2019-09-15 19:49:18 +01:00
Petr Dlouhý	bb542b00e9	Python3: fixes for checker/__init__.py	2019-09-15 19:49:00 +01:00
Chris Mayo	06fdd78f91	Python3: fix TypeError in HttpUrl.read_content() From test_http_redirect: File "linkchecker/linkcheck/checker/httpurl.py", line 323, in read_content line: buf.write(data) locals: buf = <local> <_io.StringIO object at 0x7f8fe2f45e10> buf.write = <local> <built-in method write of _io.StringIO object at 0x7f8fe2f45e10> data = <local> b'<a href="newurl.html">Recursive Redirect</a>\n' TypeError: string argument expected, got 'bytes'	2019-09-15 19:42:29 +01:00
anarcat	736d2a786d	Merge pull request #293 from cjmayo/python3_37_alt {python3_37_alt} Python3: fix TypeError when parsing cookie data	2019-09-14 11:51:26 -04:00
anarcat	fe39db4fbf	Merge pull request #287 from cjmayo/python3_36 {python3_36} fixes for Python 3 + Travis test: fix cgi	2019-09-14 11:50:53 -04:00
Chris Mayo	a7b7e31917	Python3: fix TypeError when parsing cookie data > fp = BytesIO(strheader) E TypeError: a bytes-like object is required, not 'str' linkcheck/cookies.py:61: TypeError The email package provides the message_from_string() convenience function which avoids the need to create a file-like object. Indeed http.client.HTTPMessage is implemented using email.message.Message.	2019-09-13 20:10:25 +01:00
Petr Dlouhý	36465112d0	fixes for Python 3 + Travis test: fix cgi	2019-09-13 19:46:13 +01:00
anarcat	aaa8cb675e	Merge pull request #291 from cjmayo/python3_33_alt {python3_33_alt} Python3: fix opening file URLs	2019-09-13 10:31:20 -04:00
anarcat	80b62a3e21	Merge pull request #292 from cjmayo/lc_cgi_error Fix errors caused by logging LCFormError exceptions	2019-09-13 09:12:05 -04:00
anarcat	b0b392f7cc	Merge pull request #282 from cjmayo/python3_31 {python3_31} Python3: fix strformat strline()	2019-09-13 09:11:33 -04:00
Chris Mayo	6dc25547d5	Fix errors caused by logging LCFormError exceptions	2019-09-12 20:13:08 +01:00
Chris Mayo	4c9ec511b5	Python3: fix opening file URLs urllib.request.urlopen() expects a string or Request object.	2019-09-12 19:58:27 +01:00
anarcat	eb2e3271a2	Merge pull request #279 from cjmayo/python3_28 {python3_28} Python3: fix robotparser	2019-09-12 08:40:18 -04:00
anarcat	8c072fa757	Merge pull request #289 from cjmayo/python3_38 {python3_38} Python3: fix linkname.py	2019-09-12 08:39:29 -04:00
Petr Dlouhý	538c4cfeb9	Python3: fix linkname.py	2019-09-11 20:32:33 +01:00
Petr Dlouhý	8a294be95f	Python3: fix robotparser	2019-09-11 20:04:26 +01:00
anarcat	44944754d5	Merge pull request #286 from cjmayo/python3_35 {python3_35} Python3: fix unichr() in htmlparser	2019-09-11 09:48:35 -04:00
anarcat	2239458966	Merge pull request #285 from cjmayo/python3_34 {python3_34} fixes for Python 3: fix test_misc	2019-09-11 09:48:14 -04:00
anarcat	dbbb64cd90	Merge pull request #283 from cjmayo/python3_32 {python3_32} fixes for Python 3 + Travis test: fix threads	2019-09-11 09:47:44 -04:00
anarcat	492058a360	Merge pull request #281 from cjmayo/python3_30 {python3_30} Python3: fix decoding strings	2019-09-11 09:47:10 -04:00
anarcat	8eadc5f8a1	Merge pull request #280 from cjmayo/python3_29 {python3_29} fixes for Python 3: fix running problems in Python 3	2019-09-11 09:46:48 -04:00
Petr Dlouhý	f272206110	Python3: fix decoding strings	2019-09-10 19:52:23 +01:00
Petr Dlouhý	55a7973b93	Python3: fix csvlog	2019-09-10 19:42:26 +01:00
Petr Dlouhý	e10f25b968	fixes for Python 3: fix running problems in Python 3	2019-09-10 19:30:09 +01:00
Petr Dlouhý	d20ac0e108	Python3: fix strformat strline()	2019-09-09 19:51:30 +01:00
Petr Dlouhý	8b9f29ae52	Python3: fix unichr() in htmlparser	2019-09-09 19:51:30 +01:00
Petr Dlouhý	129a68da38	fixes for Python 3: fix test_misc	2019-09-09 19:51:30 +01:00
Petr Dlouhý	57f7ba0979	fixes for Python 3 + Travis test: fix threads	2019-09-09 19:51:30 +01:00
Marius Gedminas	60f9f80b9f	Fix test_console.py on Python 3 This is an alternative fix I suggested in the comments on PR #273.	2019-09-09 18:52:29 +03:00
anarcat	4e6c806bff	Merge pull request #274 from cjmayo/python3_24 {python3_24} Python3: fix logger	2019-09-09 11:50:04 -04:00
Marius Gedminas	bb573e5eb1	Merge pull request #272 from cjmayo/python3_22 {python3_22} Python3: fix decode_parts function	2019-09-09 18:37:49 +03:00
anarcat	5c9376cfe2	Merge pull request #276 from cjmayo/python3_26 {python3_26} Python3: fix fileutil	2019-09-09 09:40:18 -04:00
Petr Dlouhý	0d7a2cac72	Python3: fix decode_parts function	2019-09-06 19:45:20 +01:00
Petr Dlouhý	9156576778	Python3: fix logger	2019-09-06 19:41:37 +01:00
Petr Dlouhý	ffb0a68ff7	Python3: fix fileurl	2019-09-05 19:41:53 +01:00
anarcat	59ab0644fd	Merge pull request #230 from cjmayo/python3_20 {python3_20} Python3: decode parts before submitting them to urllib.quote()	2019-09-04 09:48:19 -04:00
Petr Dlouhý	b5111453d8	change test_parse encoding to UTF-8	2019-07-22 19:59:37 +01:00
Petr Dlouhý	d6d48b4814	html parser: use name instead of peeking	2019-07-22 19:59:37 +01:00
Petr Dlouhý	51a06d8a1e	Remove home-cooked htmlparser and use BeautifulSoup	2019-07-22 19:59:37 +01:00
Nick Muerdter	fb3f65cdcc	Fix CSV output containing increasing number of null byte characters. The CSV buffer is being truncated on each new row, but since the stream's pointer isn't also being reset, each new row starts at the same position as the previous row, but with null bytes up until that point. This leads to increasing growth in the length of each CSV row, since each line will be padded with null bytes equivalent to the previous row's length.	2019-05-31 18:52:57 -06:00
Petr Dlouhý	a6643034fb	Python3: decode parts before submitting them to urllib.quote()	2019-05-10 20:06:01 +01:00
Chris Mayo	1c2e6c465e	squash! Python3: fix strformat ascii_safe() and unicode_safe()	2019-05-10 08:58:52 -04:00
Petr Dlouhý	ac14585a78	Python3: fix strformat for test_file	2019-05-10 08:58:52 -04:00
Petr Dlouhý	acaf8e671e	Python3: fix strformat unicode_safe()	2019-05-10 08:58:52 -04:00
Petr Dlouhý	e11ba8e427	squash! Python3: fix strformat ascii_safe() and unicode_safe() From: fixes for Python 3: fix running problems in Python 3	2019-05-10 08:58:52 -04:00
Petr Dlouhý	a1c6c4935e	Python3: fix strformat ascii_safe() and unicode_safe()	2019-05-10 08:58:52 -04:00
anarcat	9c9706a07a	Merge pull request #256 from cjmayo/parse_qs Replace deprecated cgi.parse_qs	2019-04-27 13:27:19 -04:00
Chris Mayo	a355476b82	Replace deprecated regexp flags not at start DeprecationWarning: Flags not at the start of the expression	2019-04-26 19:25:59 +01:00
Chris Mayo	5ae40c1ae2	Replace deprecated cgi.parse_qs	2019-04-26 19:23:45 +01:00
anarcat	59fe9ed876	Merge pull request #228 from cjmayo/python3_18 {python3_18} Python3: fix unicode in urlbase	2019-04-25 16:17:00 -04:00
anarcat	70f0bbf225	Merge pull request #250 from cjmayo/ftpserver Get FtpServerTest working by updating to current pyftpdlib API	2019-04-25 16:16:33 -04:00
Petr Dlouhý	e92b0a9f7b	Python3: fix unicode in urlbase	2019-04-25 19:57:45 +01:00
Petr Dlouhý	b3881ce3b5	Python3: fix urlbase, strformat and others	2019-04-25 19:57:45 +01:00
anarcat	056ba1d717	Merge pull request #248 from cjmayo/donateurl Remove configuration.DonateUrl	2019-04-24 10:59:50 -04:00
anarcat	b656346352	Merge pull request #246 from cjmayo/locale_format Replace deprecated locale.format()	2019-04-24 10:59:17 -04:00
anarcat	a42bc14fc2	Merge pull request #243 from cjmayo/warning Replace deprecated log.warn	2019-04-24 10:58:31 -04:00
anarcat	bb0a1e1992	Merge pull request #242 from cjmayo/wummel Update references to GitHub project from wummel to linkchecker	2019-04-24 10:58:15 -04:00
anarcat	ee8667e1ca	Merge pull request #229 from cjmayo/python3_19 {python3_19} Python3: fix unicode in fileurl	2019-04-24 10:57:45 -04:00
anarcat	492da5aee0	Merge pull request #227 from cjmayo/python3_17 {python3_17} Python3: fix unicode in url.py	2019-04-24 10:57:09 -04:00
Chris Mayo	f60810b050	Fix Python 3 "TypeError: decoding str is not supported" in FtpUrl.cwd	2019-04-22 19:34:46 +01:00
Chris Mayo	20e11f1b1f	Remove configuration.DonateUrl	2019-04-21 19:44:18 +01:00
Chris Mayo	ce1dd55d7a	Replace deprecated locale.format() locale.format_string() was introduced in Python 2.5.	2019-04-21 19:28:54 +01:00
Petr Dlouhý	b40f4722c7	Python3: fix unicode in fileurl	2019-04-19 20:42:38 +01:00
Petr Dlouhý	f4b73c6d42	Python3: fix unicode in url.py	2019-04-19 19:57:25 +01:00
Chris Mayo	46179f681c	Replace deprecated log.warn warning() has been the documented method since logging was introduced in Python 2.3.	2019-04-18 20:10:03 +01:00
EsuS	004632a99b	Update references to GitHub project from wummel to linkchecker Remove all mention of donations.	2019-04-18 19:59:52 +01:00
Petr Dlouhý	bc99dc51de	Python3: fix HtmlParser	2019-04-18 19:35:16 +01:00
Petr Dlouhý	2c6411d68e	Python3: fix regexp format	2019-04-17 19:50:06 +01:00
Petr Dlouhý	8f4acc3168	Python3: use str and basestring from builtins	2019-04-16 20:08:29 +01:00
anarcat	e93d18d6e9	Merge pull request #232 from cjmayo/gzip2 Remove leftovers from introduction of requests	2019-04-15 10:31:06 -04:00
Petr Dlouhý	2985e9ae65	Use Python 3 compatible octal masks	2019-04-13 20:37:39 +01:00
Chris Mayo	ff4a2e496e	Remove unused copy of gzip2 Not used since requests introduced in `7b34be590b`.	2019-04-13 20:35:37 +01:00
anarcat	75626d456a	Merge pull request #217 from cjmayo/python3_07 {python3_07} Python3: use BytesIO instead of StringIO	2019-04-11 11:48:45 -04:00
anarcat	8223acd44e	Merge pull request #226 from cjmayo/python3_16 {python3_16} Python3: fix parsepdf	2019-04-11 11:47:57 -04:00
anarcat	2bdd155d56	Merge pull request #231 from cjmayo/python3_21 {python3_21} fix urllib imports	2019-04-11 11:47:50 -04:00
anarcat	ce76b7c82d	Merge pull request #222 from cjmayo/python3_12 {python3_12} Python3: fix bytes mark in parser/__init__.py	2019-04-11 11:46:41 -04:00
Petr Dlouhý	106d58c2da	Python3: use BytesIO instead of StringIO	2019-04-09 20:09:35 +01:00
Petr Dlouhý	79e05d1511	Python3: fix parsepdf	2019-04-09 20:09:35 +01:00
Petr Dlouhý	4acabf5cb5	fix urllib imports	2019-04-09 20:09:35 +01:00
Petr Dlouhý	aec8243348	Python3: fix bytes mark in parser/__init__.py	2019-04-09 20:09:35 +01:00
Petr Dlouhý	033f9fbdb3	Python3: mark bytes explicitly	2019-04-09 20:09:35 +01:00
Yaroslav Halchenko	7ed7919692	RF: place parser.flush() under mutex as well Just a safety measure, not yet proven to be required but overall makes sense	2018-11-06 10:58:10 -05:00
Yaroslav Halchenko	ee27e178ec	BF: place a mutex around apparently thread-unsafe parser.feed invocation That leads to fix up of anchors analysis and probably other issues such as floating number of found urls etc	2018-11-01 11:10:01 -04:00
Yaroslav Halchenko	b78c2d200e	DOC: minor typo fix	2018-11-01 11:08:09 -04:00
gerdneuman	de6a82b378	Added whatsapp:// to ignored protocols Fixes https://github.com/wummel/linkchecker/issues/595	2018-08-09 13:49:15 +02:00
regexaurus	50a9ff65b8	Updated support (issues) URL	2018-08-03 00:53:47 -04:00
Marius Gedminas	6f55f446ae	Load cookies from the --cookiefile correctly requests.cookies.merge_cookies() requires a dict or a CookieJar as the second argument. We've been passing lists of Cookie objects instead. Fixes #62, harder this time.	2018-03-16 13:23:26 +02:00
Marius Gedminas	6becc08284	Fix internal error when using cookies There was some kind of confusion between a module and a function argument, introduced in commit `90257a1b5e`. Fixes #62.	2018-03-15 23:30:41 +02:00
Petr Dlouhý	e615480850	Python3: fix reading Safari bookmarks	2018-01-19 09:52:43 +01:00
Petr Dlouhý	256202a20b	fixes for Python 3: fix proxysupport	2018-01-19 09:52:43 +01:00
Petr Dlouhý	f128c9c168	Python3: fix gzip2 format	2018-01-19 09:52:43 +01:00
Petr Dlouhý	a1b300c892	Python3: fix imports	2018-01-19 09:52:43 +01:00
Petr Dlouhý	0a13fae3b4	remove third party packages and use them as dependency	2018-01-09 23:25:27 +01:00
Petr Dlouhý	2daf685633	Python3: fix few htmllib problems	2018-01-05 22:48:46 +01:00
Petr Dlouhý	fb39a4116f	Python3: fix fileutil	2018-01-05 20:31:21 +01:00
Reinhold Füreder	e864bbdabf	Use os.makedirs(...) instead of os.mkdir(...)	2018-01-03 11:33:53 +01:00
Philipp Hahn	1368643a50	Fix fragment identifier quoting According to <https://tools.ietf.org/html/rfc3986>: fragment = *( pchar / "/" / "?" ) pchar = unreserved / pct-encoded / sub-delims / ":" / "@" unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" pct-encoded = "%" HEXDIG HEXDIG sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=" Fixes #96	2017-11-10 08:03:03 -05:00
Antoine Beaupré	71be9b941b	fix incorrect call to the logging module (Closes: #847208 )	2017-11-03 09:47:01 -04:00
Félix Sipma	c8d9038ae8	improve get_plugin_folders() docstring	2017-10-18 15:58:18 +02:00
Félix Sipma	deca8c667e	introduce linkcheck.configuration.get_user_data()	2017-10-18 15:55:55 +02:00
Félix Sipma	a03e2e4ada	use xdg dirs for config & data ~/.linkchecker is used instead of the xdg equivalents if the directory exists (backward compatibility).	2017-10-17 18:48:07 +02:00
Antoine Beaupré	9b12b5d66f	workaround new limitation in requests newer requests do not expose the internal SSL socket object so we cannot verify certificates. there was work to allow custom verification routines which we could use, but this never finished: https://github.com/shazow/urllib3/pull/257 so right now, just treat missing socket information as if the cert was missing. Closes: #76	2017-10-02 20:19:25 -04:00
Marius Gedminas	4a092c218c	Whitespace bigotry	2017-03-14 17:18:27 +02:00
anarcat	5471b63ceb	Merge pull request #39 from PetrDlouhy/fix/cache Fix cache: Don't check one url multiple times	2017-03-14 09:26:07 -04:00
Marius Gedminas	fb1debaa68	Fix incompatible pointer type warnings The warnings looked like this: htmlparse.c: In function ‘yyparse’: htmlparse.c:1810:18: warning: passing argument 1 of ‘yyerror’ from incompatible pointer type [-Wincompatible-pointer-types] htmlparse.y:40:13: note: expected ‘PyObject {aka struct _object }’ but argument is of type ‘PyObject * {aka struct _object }’ htmlparse.c:1927:12: warning: passing argument 1 of ‘yyerror’ from incompatible pointer type [-Wincompatible-pointer-types] htmlparse.y:40:13: note: expected ‘PyObject * {aka struct _object *}’ but argument is of type ‘PyObject {aka struct _object *}’ The argument is not used, so it doesn't really matter what pointer type it is.	2017-02-24 15:04:09 +02:00
Petr Dlouhý	eaa538c814	don't check one url multiple times	2017-02-14 10:23:25 +01:00
Marius Gedminas	03dfe3d3a1	Fix "operation on ... may be undefined" [-Wsequence-point] warnings Fixes a bunch of warnings like htmlparse.y:509:25: warning: operation on ‘self->userData->buf’ may be undefined [-Wsequence-point] htmlparse.y:518:29: warning: operation on ‘self->userData->tmp_buf’ may be undefined [-Wsequence-point] which were a result of (macro-expanded) code like this (simplified): if ((tmp = (tmp = PyMem_Realloc(...))) == NULL) return NULL; The PyMem_Resize(p, ...) macro assigns the new value to p before returning it, so there's no need to assign it again. See http://bugs.python.org/issue1668036 for evidence (from 2007) that this is indeed a documented side-effect of the macro API.	2017-02-13 15:20:33 +02:00
Graham Seaman	233e7dcf68	Allow wayback-format urls without affecting atom 'feed' urls	2017-02-09 11:43:45 +00:00
Marius Gedminas	743a5f31cb	Crawl HTML attributes in deterministic order Fixes #17.	2017-02-01 19:19:53 +02:00
Graham Seaman	2e32780dc7	Force header names to lowercase to allow for CaseInsensitiveDict variability	2017-02-01 16:28:07 +00:00
Marius Gedminas	3c99b6aa30	Fix TypeError: hasattr(): attribute name must be string The one test failure in Travis happens in TestConsole.test_internal_error, but only if you have the argcomplete package installed. This was a real bug in error reporting code.	2017-02-01 16:02:35 +02:00

3273 commits