Chris Mayo
3c7fb5b571
Fix checking directory containing Unicode filenames
...
Non-Unicode filenames are not supported.
sys.platform has not returned "linux2" since Python 3.3.
2022-09-05 19:28:40 +01:00
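A minimal sketch of the kind of platform check the note about "linux2" points to, assuming the code only needs to know whether it is running on Linux. The helper name `is_linux` is hypothetical and is not taken from the project.

```python
import sys


def is_linux(platform=None):
    """Return True for any Linux platform string.

    Hypothetical helper for illustration only. Python 3.3 and later
    report "linux"; older interpreters reported "linux2", so a prefix
    match covers both spellings.
    """
    if platform is None:
        platform = sys.platform
    return platform.startswith("linux")
```

A prefix match avoids comparing against an exact value that newer interpreters never return.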
Chris Mayo
d6936ceb91
Add warning url-content-type-unparseable
2022-09-02 19:29:11 +01:00
Chris Mayo
7c2036b68c
Drop support for Beautiful Soup < 4.8.1
...
The minimum version supported was already 4.8.0 because of the use
of multi_valued_attributes [1].
Test support for < 4.8.1 is the only code that needs removing [2].
[1] 3ff3d724 ("Use BeautifulSoup element attrs directly", 2020-04-03)
[2] 607328d5 ("Support Beautiful Soup line numbers", 2019-10-05)
2021-01-28 19:20:24 +00:00
Chris Mayo
314ec085a3
Merge pull request #462 from cjmayo/anchor
...
Fix anchor checking
2020-09-01 19:39:29 +01:00
Chris Mayo
737c61cd67
Merge pull request #484 from cjmayo/issuetests
...
Tests of img srcset and invalid host name
2020-08-22 16:32:03 +01:00
Chris Mayo
24c2f4ac39
Add test for invalid host name in content
...
Tests code added in:
d5690203 ("Fix critical exception when parsing a URL with a ]", 2020-08-08)
2020-08-15 17:04:41 +01:00
Chris Mayo
8c804c35a5
Detect sitemaps that do not start with an XML declaration
2020-08-11 19:35:56 +01:00
Chris Mayo
a7eacd6200
Add a test for a page with links to anchors
...
Query and fragment URL parts are ignored for filesystem URLs, so the
test runs over HTTP.
2020-07-27 19:22:32 +01:00
Chris Mayo
6f126a54d2
Add coverage for parser.sitemap.parse_sitemapindex()
2020-05-27 20:02:03 +01:00
Chris Mayo
d611564cb0
Add a test for an empty html file accessed over http
2020-05-23 20:01:24 +01:00
Marius Gedminas
5bd1fb4e36
Fix internal error on empty HTML files
...
When BeautifulSoup finds an empty file on disk, it sets
original_encoding to None. It doesn't matter what encoding we pick for
empty files, so let's just pick one.
I don't know if there are any circumstances where BeautifulSoup might
set the encoding to None for a non-empty file.
Closes #392.
2020-05-21 19:01:33 +03:00
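The fallback described in the commit above can be sketched as follows. This is an illustration under assumptions, not the project's code: the function name `effective_encoding` is hypothetical, and the default of "utf-8" is an assumption, since the message does not say which encoding was picked.

```python
def effective_encoding(original_encoding, default="utf-8"):
    """Return a usable encoding name even when detection yields None.

    Hypothetical helper: `original_encoding` stands for the value the
    parser reports, which the commit says is None for an empty file.
    The default is an assumed choice; for an empty file any encoding
    gives the same (empty) result.
    """
    return original_encoding or default
```

For example, `effective_encoding(None)` falls back to the default, while a detected encoding is passed through unchanged.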
Chris Mayo
00c4a30386
Add user and password only loginurl tests
2020-05-13 19:32:29 +01:00
Chris Mayo
31a9f68c46
Merge pull request #367 from cjmayo/loginurl
...
Add test for loginurl
2020-05-12 20:08:57 +01:00
Chris Mayo
4ffdbf2406
Replace MetaRobotsFinder using BeautifulSoup.find()
2020-04-29 20:07:00 +01:00
Chris Mayo
3b8af403be
Add test for loginurl
...
A new cgi-bin directory is created to identify the scripts to be run by
http.server.CGIHTTPRequestHandler.
2020-04-19 19:05:55 +01:00
Chris Mayo
56b8c9f7ab
Add tests for <meta name="robots" content="nofollow">
...
norobots.html was used for testing <meta name="robots"
content="nofollow"> in local files until [1]. This commit reinstates
local file testing and adds an http test.
Checking is reported by checker.httpurl.HttpUrl.content_allows_robots().
[1] ce733ae7 ("Don't check for robots.txt directives in local html
files.", 2014-03-19)
2020-04-18 20:30:46 +01:00
Chris Mayo
74d5c68094
Add new tests for URL quoting
2019-10-05 19:38:57 +01:00
Chris Mayo
607328d5c5
Support Beautiful Soup line numbers
2019-10-05 19:38:57 +01:00
Petr Dlouhý
2c3c794e52
fix http test after parser change
2019-07-22 19:59:37 +01:00
Petr Dlouhý
d1844a526e
add charset tests
2019-07-22 19:59:37 +01:00
Chris Mayo
ec8b6e09f0
Fix XmlTagUrlParser and make Python 3 compatible
...
URLs within a sitemap file were not being captured.
2019-10-28 19:20:05 +00:00
Marius Gedminas
87b504785c
Add a regression test for the sitemap parser
2019-10-23 17:30:10 +03:00
Marius Gedminas
58b0d5aaae
Fix TypeError: string arg required in content_allows_robots()
...
See #323 and #317.
2019-10-22 14:13:45 +03:00
Marius Gedminas
84dbb5d603
Fix TypeError: string arg required in find_links()
...
Fixes #317.
2019-10-21 17:47:46 +03:00
Marius Gedminas
a4967fe92c
Add a regression test for issue #317
...
The important bit was making the `file_test` helper not ignore internal
errors.
2019-10-21 17:45:18 +03:00
Petr Dlouhý
c1ab81627e
test of correct logging of all parts in url_data
2018-01-14 17:17:07 +01:00
Philipp Hahn
1368643a50
Fix fragment identifier quoting
...
According to <https://tools.ietf.org/html/rfc3986>:
fragment = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
Fixes #96
2017-11-10 08:03:03 -05:00
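The fragment grammar quoted in the commit above can be turned into a quoting rule roughly as follows. This is a sketch using the standard library's `urllib.parse.quote`, not the project's implementation; the names `FRAGMENT_SAFE` and `quote_fragment` are hypothetical. It assumes the input is a raw, not yet percent-encoded fragment, so a literal "%" is itself encoded.

```python
from urllib.parse import quote

# Characters the RFC 3986 fragment rule allows besides the unreserved
# set: sub-delims, plus ":" and "@" from pchar, plus "/" and "?".
# quote() already leaves letters, digits and "-._~" unquoted
# ("~" since Python 3.7).
FRAGMENT_SAFE = "!$&'()*+,;=" + ":@" + "/?"


def quote_fragment(fragment):
    """Percent-encode every character not allowed in a URL fragment.

    Hypothetical helper; assumes `fragment` is raw text. Input that is
    already percent-encoded would be double-encoded.
    """
    return quote(fragment, safe=FRAGMENT_SAFE)
```

With this rule, a space becomes `%20` and a literal `#` becomes `%23`, while characters such as `/`, `?`, `=` and `&` are left as they are.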
Petr Dlouhý
f5100138ff
fix tests that fail because of changed linkchecker output
2017-02-14 10:59:38 +01:00
Marius Gedminas
f4ec7531c1
Fix TestHttp.test_html
...
The HTML tag has two attributes with URLs:
<applet archive="file.html" src="file.css">
It would appear that the order in which these attributes are crawled
does not match the order in the result file.
Possibly the crawling order is non-deterministic, although I cannot
reproduce that. If that's the case, the fix would be to sort the
attributes in the crawler before following them, which means we want the
expected results sorted as well (and since 'archive' comes before 'src',
file.html should come before file.css).
2017-02-01 18:41:47 +02:00
Bastian Kleineidam
914995b5fc
Use example.com for tests.
2016-01-19 12:17:08 +01:00
Vadim Khohlov
d4352fc828
Added plugin for parsing and checking links in Markdown files
2014-11-11 15:35:18 +02:00
Bastian Kleineidam
0fa7ed2699
Fix empty URL handling.
2014-07-03 23:34:40 +02:00
Bastian Kleineidam
b152ce7a6e
Add PDF test and fix page number.
2014-04-29 18:53:24 +02:00
Bastian Kleineidam
bca226c293
Fix assertion checking external links; fix tests
2014-03-10 18:23:44 +01:00
Bastian Kleineidam
6b334dc79b
Fix URL result caching.
2014-03-08 19:35:10 +01:00
Bastian Kleineidam
ef13a3fce1
Implement sitemap and sitemap index parsing.
2014-03-05 19:26:37 +01:00
Bastian Kleineidam
82f81241fd
Check all links and add better caching.
2014-03-03 23:29:45 +01:00
Bastian Kleineidam
7b34be590b
Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.
2014-03-01 00:12:34 +01:00
Bastian Kleineidam
b363945052
Adjust example.com/org tests. This seems to change every now and then.
2013-12-04 19:13:18 +01:00
Bastian Kleineidam
023da7c993
Remove the duplicate URL content check.
2013-12-04 19:12:40 +01:00
Bastian Kleineidam
a86e36e5d3
Fix test cases for example.com redirection.
2013-01-23 19:42:29 +01:00
Bastian Kleineidam
e6ad32c028
Catch UnicodeError for invalid host names.
2013-01-23 19:42:29 +01:00
Bastian Kleineidam
4dad2aa33c
Support dns-prefetch URLs.
2013-01-17 20:41:09 +01:00
Bastian Kleineidam
03f2e19cfd
Fix html tests.
2013-01-17 20:40:51 +01:00
Bastian Kleineidam
aaf35c0f4a
Added Word test.
2013-01-09 23:02:47 +01:00
Bastian Kleineidam
f9a7f5ef96
Restrict local file checking.
2012-11-07 18:07:00 +01:00
Bastian Kleineidam
eabaa41bd2
Do not check duplicate URLs.
2012-11-06 21:34:22 +01:00
Bastian Kleineidam
a77a5dddfd
Fix sporadic test failures with a dummy directory listing.
2012-10-15 14:36:27 +02:00
Bastian Kleineidam
aa2960e889
Fix content check.
2012-10-10 12:26:33 +02:00
Bastian Kleineidam
1b3b040be5
Fix check result order.
2012-10-01 10:28:42 +02:00