52 commits

Author SHA1 Message Date
Chris Mayo
1663e10fe7 Remove spaces after names in function definitions
This is a PEP 8 convention, E211.
2020-05-16 20:19:42 +01:00
Chris Mayo
736c893707 Merge pull request #377 from cjmayo/tidyten3
Remove u string prefixes
2020-05-13 19:36:54 +01:00
Chris Mayo
b0ea72e8c1 Remove # -*- coding: lines
Except for tests that include non-unicode characters:

tests/test_po.py
tests/test_strformat.py
tests/test_url.py
tests/checker/test_error.py
tests/checker/test_news.py
2020-05-08 10:45:31 +01:00
Chris Mayo
4d3e5abcfa Remove u string prefixes 2020-04-30 20:11:59 +01:00
Chris Mayo
12a948894b Fix space style in linkcheck/htmlutil/linkparse.py 2020-04-29 20:07:00 +01:00
Chris Mayo
9eed070a73 Stop using HTML handlers
LinkFinder is the only remaining HTML handler therefore no need for
htmlsoup.process_soup() as an independent function or TagFinder as a
base class.
2020-04-29 20:07:00 +01:00
Chris Mayo
4ffdbf2406 Replace MetaRobotsFinder using BeautifulSoup.find() 2020-04-29 20:07:00 +01:00
Marius Gedminas
680783b1ff SWF files are binary data
Should fix #372.
2020-04-27 11:25:37 +03:00
Chris Mayo
ee6628a831 Move HtmlParser/htmlsax.py to htmlutil/htmlsoup.py
Remove one subpackage and some import lines where htmlutil.linkparse is
also being used.
2020-04-18 20:30:45 +01:00
Chris Mayo
eb3cf28baa Remove support for start_end_element() callback
The LinkFinder handler start_end_element() callback does nothing apart
from call start_element().
2020-04-10 13:51:09 +01:00
Chris Mayo
9d8d251d06 Replace Parser lineno() and column() methods
Stop storing this data in Parser object state.
2020-04-08 20:03:35 +01:00
Chris Mayo
3ff3d72492 Use BeautifulSoup element attrs directly 2020-04-03 19:24:08 +01:00
Chris Mayo
a7e1e20172 Remove last line and column from Parser
Only used for debug log message and not very useful.
2020-04-03 19:24:08 +01:00
Chris Mayo
2c000683e1 Remove unused linkcheck.htmlutil.linkname module
Unused since:
d6d48b48 ("html parser: use name instead of peeking", 2019-07-22)
2020-03-30 19:31:11 +01:00
Chris Mayo
607328d5c5 Support Beautiful Soup line numbers 2019-10-05 19:38:57 +01:00
Petr Dlouhý
d6d48b4814 html parser: use name instead of peeking 2019-07-22 19:59:37 +01:00
Petr Dlouhý
51a06d8a1e Remove home-cooked htmlparser and use BeautifulSoup 2019-07-22 19:59:37 +01:00
anarcat
7cfb1136e9 Merge pull request #313 from cjmayo/titlefinder
Remove unused linkparse.TitleFinder
2019-10-07 11:30:10 -04:00
Chris Mayo
127c2272c4 Remove unused linkparse.TitleFinder
Stopped being used with removal of UrlBase.set_title_from_content() in:

7b34be59 ("Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.", 2014-03-01)
2019-10-05 19:43:33 +01:00
Chris Mayo
5732606c58 Remove urlutil.decode_for_unquote()
Not needed since all content is now being decoded on retrieval.

Added by:
a6643034 ("Python3: decode parts before submitting them to urllib.quote()", 2018-01-05)
2019-10-04 19:37:09 +01:00
Petr Dlouhý
e10f25b968 fixes for Python 3: fix running problems in Python 3 2019-09-10 19:30:09 +01:00
Petr Dlouhý
2c6411d68e Python3: fix regexp format 2019-04-17 19:50:06 +01:00
Marius Gedminas
743a5f31cb Crawl HTML attributes in deterministic order
Fixes #17.
2017-02-01 19:19:53 +02:00
Bastian Kleineidam
35eb30432e Added some Python3 fixes. 2014-09-12 19:36:30 +02:00
Bastian Kleineidam
176b95a30e Do not strip quotes from resolved URLs. 2014-07-11 00:43:46 +02:00
Bastian Kleineidam
82dd76b0d7 Add PDF link parsing. 2014-04-28 18:13:45 +02:00
Bastian Kleineidam
981079c041 Support itemtype attribute parsing. 2014-04-23 22:03:20 +02:00
Bastian Kleineidam
4232b69633 Support <img> srcset attribute parsing. 2014-04-10 17:51:59 +02:00
Bastian Kleineidam
9c5693ad41 Add doc and copyright. 2014-03-30 19:23:42 +02:00
Bastian Kleineidam
b6b5c7a12e Simpler link parsing routine. 2014-03-27 19:49:17 +01:00
Bastian Kleineidam
81da2eb48f Code cleanup 2014-03-27 17:19:52 +01:00
Bastian Kleineidam
7b34be590b Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. 2014-03-01 00:12:34 +01:00
Bastian Kleineidam
c806be5c15 Updated copyright 2014-01-08 22:33:04 +01:00
Bastian Kleineidam
78ed1e9e52 Do not GET on POST forms. 2013-12-10 23:42:43 +01:00
Bastian Kleineidam
9b8cb67d78 Updated copyright. 2013-01-17 20:41:47 +01:00
Bastian Kleineidam
4dad2aa33c Support dns-prefetch URLs. 2013-01-17 20:41:09 +01:00
Bastian Kleineidam
ecef16b2c9 Support WML sites. 2012-08-22 22:43:14 +02:00
Bastian Kleineidam
b550a9dcb5 Updated copyright. 2012-06-23 14:31:11 +02:00
Bastian Kleineidam
363ccc0121 Check <object codebase=...> as normal URL. 2012-06-23 14:28:32 +02:00
Bastian Kleineidam
cdf6b91b39 Don't use <object codebase=...> attribute as parent url. 2012-06-23 13:32:08 +02:00
Bastian Kleineidam
fb979b4f3c Add test for archive attribute support. 2011-12-30 12:36:22 +01:00
Bastian Kleineidam
d06c43d470 Split comma-separated archive attribute values. 2011-12-30 08:58:45 +01:00
Bastian Kleineidam
4a4985a960 Add HTML5 link elements and attributes. 2011-12-30 08:55:38 +01:00
Bastian Kleineidam
a1f0867c74 Updated copyright 2011-05-06 20:27:36 +02:00
Bastian Kleineidam
dacc7e7ae4 Consolidate the stop messages. 2011-04-29 19:49:24 +02:00
Bastian Kleineidam
76f7f6b6a3 Prefer anchor element content as name instead of title attribute. 2010-07-30 21:03:04 +02:00
Bastian Kleineidam
c4c098bd83 pep8-ify the source a little more 2010-03-13 08:47:12 +01:00
Bastian Kleineidam
57397e938b Improved linkname parsing by adding a new peek() HTML parser function. 2010-03-09 11:31:12 +01:00
Bastian Kleineidam
51a0ef0ad4 Speed up HTML parsing by stopping early and adding callbacks. 2010-03-08 09:04:33 +01:00
Bastian Kleineidam
5e06b6b8d4 Updated FSF address in GPL blurb 2009-07-24 23:58:20 +02:00