Chris Mayo
79eafee826
Add a test for VirusCheck
2020-05-17 19:04:49 +01:00
Chris Mayo
a15a2833ca
Remove spaces after names in class method definitions
...
And also nested functions.
This is a PEP 8 convention, E211.
2020-05-16 20:19:42 +01:00
Chris Mayo
1663e10fe7
Remove spaces after names in function definitions
...
This is a PEP 8 convention, E211.
2020-05-16 20:19:42 +01:00
Chris Mayo
fc11d08968
Remove spaces after names in class definitions
2020-05-16 20:19:42 +01:00
Chris Mayo
1416a08119
On Python 3 no need to convert os.linesep to a string
2020-05-16 17:02:01 +01:00
Chris Mayo
10552a79c7
Remove LinkCheckTest.fail_unicode()
...
No need to encode Python 3 strings before output.
2020-05-16 17:02:00 +01:00
Chris Mayo
9f95d06a39
Remove Python 2 test.test_support import
2020-05-16 16:26:38 +01:00
Chris Mayo
f8c9faec1b
Remove Python 2 cStringIO imports
2020-05-15 19:37:04 +01:00
Chris Mayo
bda9612273
Make html.escape Python 3 only
2020-05-14 20:15:28 +01:00
Chris Mayo
42de609f8e
Make urllib imports Python 3 only
2020-05-14 20:15:28 +01:00
Chris Mayo
08ddf658bc
Merge pull request #366 from cjmayo/userorpwd
...
Support login forms with user and/or password
2020-05-13 19:37:44 +01:00
Chris Mayo
736c893707
Merge pull request #377 from cjmayo/tidyten3
...
Remove u string prefixes
2020-05-13 19:36:54 +01:00
Chris Mayo
00c4a30386
Add user and password only loginurl tests
2020-05-13 19:32:29 +01:00
Chris Mayo
31a9f68c46
Merge pull request #367 from cjmayo/loginurl
...
Add test for loginurl
2020-05-12 20:08:57 +01:00
Chris Mayo
44e81d27dd
Remove inheriting object
...
All Python 3 classes are new-style.
2020-05-08 10:45:31 +01:00
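A minimal sketch of why the explicit base class is redundant on Python 3.
The class names are hypothetical and are not taken from the project.

```python
# Hypothetical classes for illustration only.
class WithObject(object):
    pass


class WithoutObject:
    pass


# On Python 3 every class inherits from object, whether or not it says so.
bases_match = WithObject.__bases__ == WithoutObject.__bases__ == (object,)
print(bases_match)
```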
Chris Mayo
b0ea72e8c1
Remove # -*- coding: lines
...
Except for tests that include non-unicode characters:
tests/test_po.py
tests/test_strformat.py
tests/test_url.py
tests/checker/test_error.py
tests/checker/test_news.py
2020-05-08 10:45:31 +01:00
Chris Mayo
4d3e5abcfa
Remove u string prefixes
2020-04-30 20:11:59 +01:00
anarcat
ab476fa4bf
Merge pull request #364 from cjmayo/parser5
...
Stop using HTML handlers and improve login form error handling
2020-04-30 09:28:48 -04:00
Chris Mayo
1d1d9c3bde
Add testing for variants of the robots meta directive
2020-04-29 20:14:10 +01:00
Chris Mayo
9eed070a73
Stop using HTML handlers
...
LinkFinder is the only remaining HTML handler, so there is no need for
htmlsoup.process_soup() as an independent function or for TagFinder as
a base class.
2020-04-29 20:07:00 +01:00
Chris Mayo
a1433767e5
Replace HtmlPrettyPrinter with pretty_print_html()
2020-04-29 20:07:00 +01:00
Chris Mayo
0361d9e0e8
Remove encoding and default fd from HtmlPrettyPrinter
...
Neither is used.
2020-04-29 20:07:00 +01:00
Chris Mayo
4ffdbf2406
Replace MetaRobotsFinder using BeautifulSoup.find()
2020-04-29 20:07:00 +01:00
Chris Mayo
8fc0dcc055
Make matching login form credentials case-sensitive
...
The keys of the form.data dictionary are case-sensitive and therefore a
KeyError was possible if the configured values were not identical to
the input element name attributes.
2020-04-27 18:06:29 +01:00
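A rough sketch of the failure mode described in the commit above. It
assumes the earlier matching compared names case-insensitively. The
dictionary contents and variable names are hypothetical, not the project's
actual code.

```python
# Hypothetical form data; keys mimic input element name attributes.
form_data = {"Login": "", "Password": ""}
configured_name = "login"  # differs from the key only in case

# A case-insensitive comparison reports a match...
loose_match = any(key.lower() == configured_name.lower() for key in form_data)

# ...but the dictionary lookup is case-sensitive and fails.
try:
    form_data[configured_name]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Matching case-sensitively keeps the check and the lookup consistent.
exact_match = configured_name in form_data
```

With exact matching, a mismatched name is simply not found, so the later
lookup is never attempted with a key that does not exist.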
Chris Mayo
7a6ef938cc
Rename htmlutil.formsearch to htmlutil.loginformsearch
...
Make it clear that this module has only one specific use.
2020-04-27 18:06:29 +01:00
anarcat
183d483074
Merge pull request #365 from cjmayo/tidyten1
...
Remove use of the future package
2020-04-26 12:02:30 -04:00
Chris Mayo
3b8af403be
Add test for loginurl
...
A new cgi-bin directory is created to identify the scripts to be run by
http.server.CGIHTTPRequestHandler.
2020-04-19 19:05:55 +01:00
Chris Mayo
56b8c9f7ab
Add tests for <meta name="robots" content="nofollow">
...
norobots.html was used for testing <meta name="robots"
content="nofollow"> in local files until [1]. This commit reinstates
local file testing and adds an http test.
Checking is reported by checker.httpurl.HttpUrl.content_allows_robots().
[1] ce733ae7 ("Don't check for robots.txt directives in local html
files.", 2014-03-19)
2020-04-18 20:30:46 +01:00
Chris Mayo
d189445a8e
LinkFinder does not raise StopParse
2020-04-18 20:30:46 +01:00
Chris Mayo
ee6628a831
Move HtmlParser/htmlsax.py to htmlutil/htmlsoup.py
...
Remove one subpackage and some import lines where htmlutil.linkparse is
also being used.
2020-04-18 20:30:45 +01:00
Chris Mayo
a83fbb56c0
Remove from __future__ imports
2020-04-15 19:49:16 +01:00
Chris Mayo
f5e7f3a382
Remove use of the future package
...
It was providing Python 2 compatibility.
2020-04-15 19:49:16 +01:00
Chris Mayo
0795e3c1b4
Replace Parser class using BeautifulSoup.find_all()
2020-04-10 13:51:09 +01:00
Chris Mayo
eb3cf28baa
Remove support for start_end_element() callback
...
The LinkFinder handler start_end_element() callback does nothing apart
from calling start_element().
2020-04-10 13:51:09 +01:00
Chris Mayo
c9f17e92b9
Remove support for end_element() callback
2020-04-10 13:51:09 +01:00
Chris Mayo
48b590cf8b
Replace FormFinder using BeautifulSoup.find_all()
...
FormFinder was the only handler that used an end_element() callback and
was therefore a blocker to moving the Parser class to use
BeautifulSoup.find_all().
FormFinder was a specialised handler used to parse a login form at
the start of a session if the user had configured authentication
credentials.
2020-04-10 13:51:05 +01:00
Chris Mayo
974915cc4f
Remove encoding from Parser
...
It is only used by the test and is already an attribute of the soup
object.
2020-04-08 20:03:35 +01:00
Chris Mayo
02e1c389b2
Remove parser flush() and reset()
...
Remnants of the feed() interface.
2020-04-08 20:03:35 +01:00
Chris Mayo
3771dd9136
Use parser.feed_soup() instead of parser.feed()
...
Markup is not being passed in pieces to the parser, so simplify the
interface and reduce the state further.
2020-04-08 20:03:35 +01:00
Chris Mayo
9d8d251d06
Replace Parser lineno() and column() methods
...
Stop storing this data in Parser object state.
2020-04-08 20:03:35 +01:00
Chris Mayo
514210199d
Add tests for search_form
2020-04-07 19:24:34 +01:00
Chris Mayo
036b900ffc
Remove unused linkcheck.containers classes
2020-04-03 19:24:08 +01:00
Chris Mayo
3ff3d72492
Use BeautifulSoup element attrs directly
2020-04-03 19:24:08 +01:00
Chris Mayo
28701e291a
Remove use of Python 2 unicode() and related u prefixes
...
Several instances for MS Windows left unchanged.
2020-04-01 19:39:50 +01:00
anarcat
cf4e6bb235
Merge pull request #351 from cjmayo/tagsonly
...
Remove support for non-Tag elements from Parser
2020-04-01 12:17:18 -04:00
Chris Mayo
9fc651e82b
Remove Python 2 compatibility from parser tests
2020-03-31 20:10:35 +01:00
Chris Mayo
ffa6ac457f
Remove support for non-Tag elements from Parser
...
This change is made because the linkchecker handlers only process
Tags.
The test HtmlPrettyPrinter handler is updated to output element text
because its support for non-Tag elements has been removed. This results
in a number of the existing tests still passing.
2020-03-31 20:10:35 +01:00
Chris Mayo
0ee4414a60
Replace memoized with functools.lru_cache
2020-03-31 19:46:31 +01:00
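A minimal sketch of memoization with functools.lru_cache, the standard
library decorator named in the commit above. The decorated function and
the maxsize=None argument are assumptions for illustration; the log does
not show how the project applies the decorator.

```python
import functools

calls = []  # records each real invocation, to show when the cache is used


@functools.lru_cache(maxsize=None)  # unbounded cache (an assumption)
def normalise(name):
    """Hypothetical helper: strip and lowercase a name."""
    calls.append(name)
    return name.strip().lower()


first = normalise(" Example ")
second = normalise(" Example ")  # served from the cache, body not re-run
```

Arguments must be hashable, since lru_cache uses them as dictionary keys.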
Chris Mayo
1255119ca8
Move HtmlPrinter and HtmlPrettyPrinter into tests
2020-03-30 19:32:30 +01:00
Chris Mayo
f743be57e8
Remove unused functions from linkcheck.HtmlParser
...
resolve_entities() unused since:
2c000683 ("Remove unused linkcheck.htmlutil.linkname module",
2020-03-30)
set_doctype(), set_encoding() unused since:
51a06d8a ("Remove home-cooked htmlparser and use BeautifulSoup",
2019-07-22)
2020-03-30 19:32:18 +01:00