Commit graph

6283 commits

Author SHA1 Message Date
Chris Mayo
dcbddfe045 Remove Python 2 ConfigParser import 2020-05-15 19:37:04 +01:00
Chris Mayo
f8c9faec1b Remove Python 2 cStringIO imports 2020-05-15 19:37:04 +01:00
Chris Mayo
f3eb787014
Merge pull request #382 from cjmayo/tidyten5
Make urllib imports and html.escape Python 3 only
2020-05-15 19:15:47 +01:00
Chris Mayo
bda9612273 Make html.escape Python 3 only 2020-05-14 20:15:28 +01:00
Chris Mayo
42de609f8e Make urllib imports Python 3 only 2020-05-14 20:15:28 +01:00
Chris Mayo
3c661a83d0 Replace parse_host_port() in checker.proxysupport with url.splitport() 2020-05-14 20:15:28 +01:00
Chris Mayo
91a069ac90
Merge pull request #380 from cjmayo/tidyten4
linkchecker and setup.py tidy up
2020-05-14 19:48:02 +01:00
Chris Mayo
40f7c44703 Ensure the Python 3 interpreter is used 2020-05-14 19:39:45 +01:00
Chris Mayo
c80002437e Update run-time version check 2020-05-13 19:50:19 +01:00
Chris Mayo
5300702991 Get yappi working with Python 3 2020-05-13 19:50:19 +01:00
Chris Mayo
adcc3e5690 Remove -R from linkchecker shebang
This turned on hash randomization in Python 2 (>=2.6.8), in Python 3
this is enabled by default, unless PYTHONHASHSEED is 0.
2020-05-13 19:50:19 +01:00
Chris Mayo
77de1545ef Use setuptools.find_packages() 2020-05-13 19:50:19 +01:00
Chris Mayo
08ddf658bc
Merge pull request #366 from cjmayo/userorpwd
Support login forms with user and/or password
2020-05-13 19:37:44 +01:00
Chris Mayo
736c893707
Merge pull request #377 from cjmayo/tidyten3
Remove u string prefixes
2020-05-13 19:36:54 +01:00
Chris Mayo
00c4a30386 Add user and password only loginurl tests 2020-05-13 19:32:29 +01:00
Chris Mayo
3ace021264 Support login forms with user and/or password 2020-05-13 19:32:25 +01:00
Chris Mayo
31a9f68c46
Merge pull request #367 from cjmayo/loginurl
Add test for loginurl
2020-05-12 20:08:57 +01:00
anarcat
8d41f4a86b
Merge pull request #376 from cjmayo/tidyten2
Remove # -*- coding: lines and inheriting object
2020-05-11 13:29:04 -04:00
Marius Gedminas
f82e10cd39
Merge pull request #379 from dcycle/python3
Update README and Dockerfile with Python 3
2020-05-10 12:15:57 +03:00
alberto56
1ebc6a1431 use python3 in readme and dockerfile 2020-05-09 08:03:23 -04:00
Chris Mayo
44e81d27dd Remove inheriting object
All Python 3 classes are new-style.
2020-05-08 10:45:31 +01:00
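As a minimal sketch of why the explicit base is redundant in Python 3, the two class statements below are equivalent. The class names are hypothetical and are not taken from the LinkChecker source.

```python
# Hypothetical classes, for illustration only.
class Implicit:
    """No base listed: Python 3 still derives it from object."""


class Explicit(object):
    """The Python 2 compatible spelling with an explicit object base."""


# Both classes have object as their only base class.
same_bases = Implicit.__bases__ == Explicit.__bases__ == (object,)
```

Dropping the explicit base therefore changes nothing at run time; it only removes a Python 2 leftover.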
Chris Mayo
b0ea72e8c1 Remove # -*- coding: lines
Except for tests that include non-unicode characters:

tests/test_po.py
tests/test_strformat.py
tests/test_url.py
tests/checker/test_error.py
tests/checker/test_news.py
2020-05-08 10:45:31 +01:00
Marius Gedminas
fdbb3a3b76
Merge pull request #378 from linkchecker/fix-metaclass
Make _Logger an abstract base class
2020-05-02 17:34:36 +03:00
Marius Gedminas
22b0165b72 Make _Logger an abstract base class
The __metaclass__ syntax is a Python-2-ism.  It was replaced with

    class _Logger (object, metaclass=abc.ABCMeta):

in Python 3.  And then Python 3.4 introduced abc.ABC which is an empty
class that has ABCMeta as the metaclass, making it simpler to define
abstract base classes.
2020-04-30 23:09:42 +03:00
Chris Mayo
4d3e5abcfa Remove u string prefixes 2020-04-30 20:11:59 +01:00
anarcat
ab476fa4bf
Merge pull request #364 from cjmayo/parser5
Stop using HTML handlers and improve login form error handling
2020-04-30 09:28:48 -04:00
anarcat
19d683bca5
Merge pull request #375 from cjmayo/parser5a
Improve login form handling
2020-04-30 09:26:21 -04:00
Chris Mayo
1d1d9c3bde Add testing for variants of the robots meta directive 2020-04-29 20:14:10 +01:00
Chris Mayo
12a948894b Fix space style in linkcheck/htmlutil/linkparse.py 2020-04-29 20:07:00 +01:00
Chris Mayo
9eed070a73 Stop using HTML handlers
LinkFinder is the only remaining HTML handler therefore no need for
htmlsoup.process_soup() as an independent function or TagFinder as a
base class.
2020-04-29 20:07:00 +01:00
Chris Mayo
a1433767e5 Replace HtmlPrettyPrinter with pretty_print_html() 2020-04-29 20:07:00 +01:00
Chris Mayo
0361d9e0e8 Remove encoding and default fd from HtmlPrettyPrinter
Neither is used.
2020-04-29 20:07:00 +01:00
Chris Mayo
4ffdbf2406 Replace MetaRobotsFinder using BeautifulSoup.find() 2020-04-29 20:07:00 +01:00
Chris Mayo
a51f02cf66 Improve error handling and debugging for login form 2020-04-27 18:06:29 +01:00
Chris Mayo
9a33c2a659 Make requesting login form password work on Python 3 2020-04-27 18:06:29 +01:00
Chris Mayo
8fc0dcc055 Make matching login form credentials case-sensitive
The keys of the form.data dictionary are case-sensitive and therefore a
KeyError was possible if the configured values are not identical to
the input element name attributes.
2020-04-27 18:06:29 +01:00
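The failure mode described above can be sketched as follows. This is an assumption about the general shape of the problem, not the actual LinkChecker code; form_data and the helper names are hypothetical.

```python
# Hypothetical form data keyed by the input elements' name attributes.
form_data = {"Login": "", "Password": ""}


def has_field_insensitive(data, name):
    """Case-insensitive match: accepts 'login' for a field named 'Login'."""
    return name.lower() in {key.lower() for key in data}


def has_field_sensitive(data, name):
    """Case-sensitive match: agrees with how dict keys are looked up."""
    return name in data


def read_field(data, name):
    """Plain dict lookup; raises KeyError if name is not an exact key."""
    return data[name]
```

A configured name of "login" passes the case-insensitive check, yet read_field(form_data, "login") raises KeyError. Matching case-sensitively rejects the form up front, so the later lookup cannot fail.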
Chris Mayo
7a6ef938cc Rename htmlutil.formsearch to htmlutil.loginformsearch
Make it clear that this module has only one specific use.
2020-04-27 18:06:29 +01:00
anarcat
350f8bfef9
Merge pull request #373 from linkchecker/fix-swf-parsing
SWF files are binary data
2020-04-27 09:39:52 -04:00
Marius Gedminas
680783b1ff SWF files are binary data
Should fix #372.
2020-04-27 11:25:37 +03:00
anarcat
183d483074
Merge pull request #365 from cjmayo/tidyten1
Remove use of the future package
2020-04-26 12:02:30 -04:00
anarcat
125146fb2c
Merge pull request #361 from cjmayo/parser4
Rename htmlsax.py to htmlsoup.py and add test_content_allows_robots
2020-04-25 17:56:29 -04:00
anarcat
87079312db
Merge pull request #371 from cjmayo/manhtml
Switch to mandoc for generating html man pages
2020-04-24 18:59:10 -04:00
Chris Mayo
b7c8ad9be7 Fix typo for -Dplugin in man page 2020-04-24 19:46:30 +01:00
Chris Mayo
5dd448cf05 Add link to unknownurl.py in man page 2020-04-24 19:46:30 +01:00
Chris Mayo
a506800c07 Replace `` in man page with bold formatting 2020-04-24 19:46:30 +01:00
Chris Mayo
e3b77f810e Update external links in man pages to https 2020-04-24 19:46:30 +01:00
Chris Mayo
a205a3722b Update man pages to optimise for both html and man
- Use "LinkChecker User Manual" as the source for both pages.
- .UR/.UE for external links to allow mandoc to create links in html.
- Use Linux man-pages format for cross references e.g.
  .BR linkcheckerrc (5) which are replaced in the html by the Makefile.
2020-04-24 19:46:30 +01:00
Chris Mayo
441cda5e15 Switch to mandoc for generating html man pages
Removes the need for diff files and is a currently maintained project.

Cross references are only supported for mdoc macros but because we only
have two pages this can be achieved with sed.

A clean target is added to the Makefile to make development easier.
2020-04-24 19:46:30 +01:00
Chris Mayo
3b8af403be Add test for loginurl
A new cgi-bin directory is created to identify the scripts to be run by
http.server.CGIHTTPRequestHandler.
2020-04-19 19:05:55 +01:00
Chris Mayo
56b8c9f7ab Add tests for <meta name="robots" content="nofollow">
norobots.html was used for testing <meta name="robots"
content="nofollow"> in local files until [1]. This commit reinstates
local file testing and adds an http test.

Checking is reported by checker.httpurl.HttpUrl.content_allows_robots().

[1] ce733ae7 ("Don't check for robots.txt directives in local html
files.", 2014-03-19)
2020-04-18 20:30:46 +01:00
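For illustration, a check for the robots meta directive could be sketched with the standard library's html.parser. This is a swapped-in technique chosen to keep the example self-contained, since the project itself uses BeautifulSoup. RobotsMetaParser and allows_follow are hypothetical names and do not reproduce content_allows_robots().

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def allows_follow(html):
    """Return False if the document carries a robots nofollow directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "nofollow" not in parser.directives
```

Comparing names and directives in lower case lets variants such as NAME="ROBOTS" or content="NoIndex, NoFollow" be treated the same way as the plain form.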