Graham Seaman
233e7dcf68
Allow wayback-format urls without affecting atom 'feed' urls
2017-02-09 11:43:45 +00:00
Marius Gedminas
743a5f31cb
Crawl HTML attributes in deterministic order
...
Fixes #17.
2017-02-01 19:19:53 +02:00
Marius Gedminas
a825b9d901
Mark the non-deterministic test as xfail
2017-02-01 18:57:40 +02:00
Marius Gedminas
02869ea076
Mark TestFile.test_directory_listing as known to fail
...
The test unzips a zip file with a weird-looking non-ASCII filename in it.
I don't think zip files specify the encoding for filenames. Different
unzip utilities may interpret the filename differently. Plus, the byte
representation of the unzipped filename may be different depending on
the filesystem charset.
To me it looks as if the filename is garbage encoded as valid UTF-8, and
the test expectation is to get it in latin-1 or something.
2017-02-01 18:45:05 +02:00
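The encoding concern in the message above can be illustrated with a minimal sketch. The bytes used here are an assumption chosen for illustration (the UTF-8 encoding of "é"), not the filename from the test's zip file, and the three codecs are only examples of what an unzip utility or filesystem might apply.

```python
# Illustrative bytes only: the UTF-8 encoding of U+00E9 ("e" with acute accent).
# This is NOT the filename stored in the test's zip file.
raw_name = "\u00e9".encode("utf-8")

# Decode the same bytes under three codecs a tool might plausibly assume.
decoded = {enc: raw_name.decode(enc) for enc in ("utf-8", "latin-1", "cp437")}

for enc, name in decoded.items():
    # Each codec yields a different text filename for identical bytes.
    print(f"{enc}: {name!r}")
```

A test that expects one of these renderings would fail on a system whose tooling picks another, which is consistent with marking the test as a known failure.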
Marius Gedminas
cffea5fcbd
Mark TestHttps.test_https as known to fail
...
This test depends on the way http://amazon.com/ works. I don't think
that's a good idea.
2017-02-01 18:44:21 +02:00
Marius Gedminas
f4ec7531c1
Fix TestHttp.test_html
...
The HTML tag has two attributes with URLs:
<applet archive="file.html" src="file.css">
It would appear that the order in which these attributes are crawled
does not match the order in the result file.
Possibly the crawling order is non-deterministic, although I cannot
reproduce that. If that's the case, the fix would be to sort the
attributes in the crawler before following them, which means we want the
expected results sorted as well (since 'archive' comes before 'src',
file.html should come before file.css).
2017-02-01 18:41:47 +02:00
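The sorting idea proposed in the message above can be sketched as follows. This is a minimal illustration, not the project's actual crawler code: `URL_ATTRS` and `urls_in_order` are hypothetical names, and the attribute dictionary stands in for whatever the HTML parser hands over.

```python
# Hypothetical set of attribute names that may carry URLs;
# an assumption for this sketch, not the project's real lookup table.
URL_ATTRS = {"archive", "src", "href"}


def urls_in_order(attrs):
    """Return (attribute, url) pairs sorted by attribute name."""
    return [(name, attrs[name]) for name in sorted(attrs) if name in URL_ATTRS]


# Attributes of <applet archive="file.html" src="file.css">,
# deliberately given in the opposite order.
applet = {"src": "file.css", "archive": "file.html"}
result = urls_in_order(applet)
print(result)
```

Because the names are sorted before the URLs are followed, the crawl order no longer depends on how the parser happens to store the attributes, so the expected results can list file.html before file.css.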
Bastian Kleineidam
0ef00eea56
Move GUI files to separate project
2016-01-23 13:28:15 +01:00
Bastian Kleineidam
e410169fd0
Remove unused test module
2016-01-20 20:18:09 +01:00
Bastian Kleineidam
88c060699d
Fix tests
2016-01-19 22:05:15 +01:00
Bastian Kleineidam
914995b5fc
Use example.com for tests.
2016-01-19 12:17:08 +01:00
Vadim Khohlov
d4352fc828
Added plugin for parsing and checking links in Markdown files
2014-11-11 15:35:18 +02:00
Bastian Kleineidam
7239cd1b76
Add test for itms-services URL.
2014-09-05 21:37:33 +02:00
Bastian Kleineidam
4e1e756ca4
Updated copyright.
2014-07-16 07:35:32 +02:00
Bastian Kleineidam
e24ba214da
Fix config test.
2014-07-15 22:31:19 +02:00
Bastian Kleineidam
032c4091c3
Some easy python3 compatibility changes.
2014-07-15 18:40:47 +02:00
Bastian Kleineidam
176b95a30e
Do not strip quotes from resolved URLs.
2014-07-11 00:43:46 +02:00
Bastian Kleineidam
0fa7ed2699
Fix empty URL handling.
2014-07-03 23:34:40 +02:00
Bastian Kleineidam
cde261c009
Parse Refresh: and Content-Location: header values for URLs.
2014-07-01 20:16:43 +02:00
Bastian Kleineidam
d1ef9f7683
Improve output if update test fails.
2014-07-01 20:16:02 +02:00
Bastian Kleineidam
7e19740264
Remove unused variables.
2014-05-10 21:22:29 +02:00
Bastian Kleineidam
4b28e6e860
Move mime stuff into own submodule.
2014-05-10 21:22:10 +02:00
Bastian Kleineidam
b152ce7a6e
Add PDF test and fix page number.
2014-04-29 18:53:24 +02:00
Bastian Kleineidam
82dd76b0d7
Add PDF link parsing.
2014-04-28 18:13:45 +02:00
Bastian Kleineidam
981079c041
Support itemtype attribute parsing.
2014-04-23 22:03:20 +02:00
Bastian Kleineidam
7baa2f0b1b
Fix http_link check and add a basic auth check.
2014-04-10 18:06:15 +02:00
Bastian Kleineidam
4232b69633
Support <img> srcset attribute parsing.
2014-04-10 17:51:59 +02:00
Bastian Kleineidam
6caf654031
Parse Link: headers.
2014-04-10 17:50:55 +02:00
Bastian Kleineidam
b6b5c7a12e
Simpler link parsing routine.
2014-03-27 19:49:17 +01:00
Bastian Kleineidam
a8623bc0bc
Display SSL info on redirects.
2014-03-26 07:16:03 +01:00
Bastian Kleineidam
9cd67dfcb2
More SSL message work.
2014-03-20 20:24:57 +01:00
Bastian Kleineidam
9a7ad3a84f
Print SSL cipher info for https URLs.
2014-03-19 17:02:34 +01:00
Bastian Kleineidam
ce733ae76b
Don't check for robots.txt directives in local html files.
2014-03-19 16:33:22 +01:00
Bastian Kleineidam
9be667b52a
Do not warn about missing addresses on mailto links that have subjects.
2014-03-18 23:27:59 +01:00
Bastian Kleineidam
fc73c6ca6e
Log number of checked unique URLs.
2014-03-14 23:46:17 +01:00
Bastian Kleineidam
34bdf5c75a
Updated copyright and docs.
2014-03-14 22:09:05 +01:00
Bastian Kleineidam
c51caf1133
Assertions should be earlier.
2014-03-14 20:26:11 +01:00
Bastian Kleineidam
2d2e010940
Move some scripts into the new script dir.
2014-03-12 19:29:11 +01:00
Bastian Kleineidam
306979abca
Add HttpHeaderInfo plugin
2014-03-12 19:28:37 +01:00
Bastian Kleineidam
1733c6a6f2
Fix Travis CI build.
2014-03-11 19:56:36 +01:00
Bastian Kleineidam
bca226c293
Fix assertion checking external links; fix tests
2014-03-10 18:23:44 +01:00
Bastian Kleineidam
6b334dc79b
Fix URL result caching.
2014-03-08 19:35:10 +01:00
Bastian Kleineidam
fab2c2da98
Improve content type setting.
2014-03-05 20:12:19 +01:00
Bastian Kleineidam
ef13a3fce1
Implement sitemap and sitemap index parsing.
2014-03-05 19:26:37 +01:00
Bastian Kleineidam
b17211f162
Set for release.
2014-03-04 21:36:24 +01:00
Bastian Kleineidam
978b24f2d7
Merge branch 'caching'
2014-03-04 07:21:42 +01:00
Bastian Kleineidam
f1076c8813
Increase url-too-long warning.
2014-03-03 23:31:04 +01:00
Bastian Kleineidam
82f81241fd
Check all links and add better caching.
2014-03-03 23:29:45 +01:00
Bastian Kleineidam
cc21f8f3d2
Add missing import.
2014-03-02 20:01:55 +01:00
Bastian Kleineidam
b8175e2357
Disable news test.
2014-03-02 20:01:36 +01:00
Bastian Kleineidam
924c6285d2
Fix some tests
2014-03-02 07:45:04 +01:00