507 commits

Author SHA1 Message Date
Graham Seaman
233e7dcf68 Allow wayback-format urls without affecting atom 'feed' urls 2017-02-09 11:43:45 +00:00
Marius Gedminas
743a5f31cb Crawl HTML attributes in deterministic order
Fixes #17.
2017-02-01 19:19:53 +02:00
Marius Gedminas
a825b9d901 Mark the non-deterministic test as xfail 2017-02-01 18:57:40 +02:00
Marius Gedminas
02869ea076 Mark TestFile.test_directory_listing as known to fail
The test unzips a zip file with a weird-looking non-ASCII filename in it.
I don't think zip files specify the encoding for filenames.  Different
unzip utilities may interpret the filename differently.  Plus, the byte
representation of the unzipped filename may be different depending on
the filesystem charset.

To me it looks as if the filename is garbage encoded as valid UTF-8, and
the test expectation is to get it in latin-1 or something.
2017-02-01 18:45:05 +02:00
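
The encoding concern in the message above can be illustrated with a small sketch. It does not reproduce the project's test; the filename is a hypothetical stand-in, and the three charsets are only examples of what an unzip tool or filesystem might assume.

```python
# The same filename bytes, decoded under three candidate charsets.
name = "é.txt"  # hypothetical filename, chosen only for illustration
raw = name.encode("utf-8")  # bytes as they might be stored in the archive

# Each tool or filesystem may pick a different charset for these bytes.
decoded = {enc: raw.decode(enc) for enc in ("utf-8", "latin-1", "cp437")}

for enc, text in decoded.items():
    print(f"{enc:8} -> {text!r}")
```

Only the UTF-8 decoding returns the original name; the other two yield different strings, which is why a test expectation tied to one interpretation can fail on another system.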
Marius Gedminas
cffea5fcbd Mark TestHttps.test_https as known to fail
This test depends on the way http://amazon.com/ works.  I don't think
that's a good idea.
2017-02-01 18:44:21 +02:00
Marius Gedminas
f4ec7531c1 Fix TestHttp.test_html
The HTML tag has two attributes with URLs:

  <applet archive="file.html" src="file.css">

It would appear that the order in which these attributes are crawled
does not match the order in the result file.

Possibly the crawling order is non-deterministic, although I cannot
reproduce that.  If that's the case, the fix would be to sort the
attributes in the crawler before following them, which means we want the
expected results sorted as well (since 'archive' comes before 'src',
file.html should come before file.css).
2017-02-01 18:41:47 +02:00
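
The fix proposed in the message above (sort the attributes before following them) can be sketched as follows. This is a minimal illustration built on the standard library's html.parser, not the project's actual crawler; URL_ATTRS, SortedAttrCollector and collect_urls are hypothetical names introduced here.

```python
from html.parser import HTMLParser

# Hypothetical set of attributes treated as URL-bearing, for illustration only.
URL_ATTRS = {"archive", "codebase", "href", "src"}


class SortedAttrCollector(HTMLParser):
    """Collect URL attributes of each start tag in sorted attribute order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Sort by attribute name so the crawl order does not depend on
        # the order in which the attributes happen to be delivered.
        for name, value in sorted(attrs, key=lambda pair: pair[0]):
            if name in URL_ATTRS and value:
                self.urls.append((tag, name, value))


def collect_urls(html):
    parser = SortedAttrCollector()
    parser.feed(html)
    parser.close()
    return parser.urls


# 'archive' sorts before 'src', so file.html is reported before file.css
# even though 'src' is written first in the markup.
print(collect_urls('<applet src="file.css" archive="file.html">'))
```

Sorting by attribute name makes the output independent of source order or container iteration order, at the cost of no longer reflecting the order in which the attributes appear in the document.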
Bastian Kleineidam
0ef00eea56 Move GUI files to separate project 2016-01-23 13:28:15 +01:00
Bastian Kleineidam
e410169fd0 Remove unused test module 2016-01-20 20:18:09 +01:00
Bastian Kleineidam
88c060699d Fix tests 2016-01-19 22:05:15 +01:00
Bastian Kleineidam
914995b5fc Use example.com for tests. 2016-01-19 12:17:08 +01:00
Vadim Khohlov
d4352fc828 Added plugin for parsing and checking links in Markdown files 2014-11-11 15:35:18 +02:00
Bastian Kleineidam
7239cd1b76 Add test for itms-services URL. 2014-09-05 21:37:33 +02:00
Bastian Kleineidam
4e1e756ca4 Updated copyright. 2014-07-16 07:35:32 +02:00
Bastian Kleineidam
e24ba214da Fix config test. 2014-07-15 22:31:19 +02:00
Bastian Kleineidam
032c4091c3 Some easy python3 compatibility changes. 2014-07-15 18:40:47 +02:00
Bastian Kleineidam
176b95a30e Do not strip quotes from resolved URLs. 2014-07-11 00:43:46 +02:00
Bastian Kleineidam
0fa7ed2699 Fix empty URL handling. 2014-07-03 23:34:40 +02:00
Bastian Kleineidam
cde261c009 Parse Refresh: and Content-Location: header values for URLs. 2014-07-01 20:16:43 +02:00
Bastian Kleineidam
d1ef9f7683 Improve output if update test fails. 2014-07-01 20:16:02 +02:00
Bastian Kleineidam
7e19740264 Remove unused variables. 2014-05-10 21:22:29 +02:00
Bastian Kleineidam
4b28e6e860 Move mime stuff into own submodule. 2014-05-10 21:22:10 +02:00
Bastian Kleineidam
b152ce7a6e Add PDF test and fix page number. 2014-04-29 18:53:24 +02:00
Bastian Kleineidam
82dd76b0d7 Add PDF link parsing. 2014-04-28 18:13:45 +02:00
Bastian Kleineidam
981079c041 Support itemtype attribute parsing. 2014-04-23 22:03:20 +02:00
Bastian Kleineidam
7baa2f0b1b Fix http_link check and add a basic auth check. 2014-04-10 18:06:15 +02:00
Bastian Kleineidam
4232b69633 Support <img> srcset attribute parsing. 2014-04-10 17:51:59 +02:00
Bastian Kleineidam
6caf654031 Parse Link: headers. 2014-04-10 17:50:55 +02:00
Bastian Kleineidam
b6b5c7a12e Simpler link parsing routine. 2014-03-27 19:49:17 +01:00
Bastian Kleineidam
a8623bc0bc Display SSL info on redirects. 2014-03-26 07:16:03 +01:00
Bastian Kleineidam
9cd67dfcb2 More SSL message work. 2014-03-20 20:24:57 +01:00
Bastian Kleineidam
9a7ad3a84f Print SSL cipher info for https URLs. 2014-03-19 17:02:34 +01:00
Bastian Kleineidam
ce733ae76b Don't check for robots.txt directives in local html files. 2014-03-19 16:33:22 +01:00
Bastian Kleineidam
9be667b52a Do not warn about missing addresses on mailto links that have subjects. 2014-03-18 23:27:59 +01:00
Bastian Kleineidam
fc73c6ca6e Log number of checked unique URLs. 2014-03-14 23:46:17 +01:00
Bastian Kleineidam
34bdf5c75a Updated copyright and docs. 2014-03-14 22:09:05 +01:00
Bastian Kleineidam
c51caf1133 Assertions should be earlier. 2014-03-14 20:26:11 +01:00
Bastian Kleineidam
2d2e010940 Move some scripts into the new script dir. 2014-03-12 19:29:11 +01:00
Bastian Kleineidam
306979abca Add HttpHeaderInfo plugin 2014-03-12 19:28:37 +01:00
Bastian Kleineidam
1733c6a6f2 Fix Travis CI build. 2014-03-11 19:56:36 +01:00
Bastian Kleineidam
bca226c293 Fix assertion checking external links; fix tests 2014-03-10 18:23:44 +01:00
Bastian Kleineidam
6b334dc79b Fix URL result caching. 2014-03-08 19:35:10 +01:00
Bastian Kleineidam
fab2c2da98 Improve content type setting. 2014-03-05 20:12:19 +01:00
Bastian Kleineidam
ef13a3fce1 Implement sitemap and sitemap index parsing. 2014-03-05 19:26:37 +01:00
Bastian Kleineidam
b17211f162 Set for release. 2014-03-04 21:36:24 +01:00
Bastian Kleineidam
978b24f2d7 Merge branch 'caching' 2014-03-04 07:21:42 +01:00
Bastian Kleineidam
f1076c8813 Increase url-too-long warning. 2014-03-03 23:31:04 +01:00
Bastian Kleineidam
82f81241fd Check all links and add better caching. 2014-03-03 23:29:45 +01:00
Bastian Kleineidam
cc21f8f3d2 Add missing import. 2014-03-02 20:01:55 +01:00
Bastian Kleineidam
b8175e2357 Disable news test. 2014-03-02 20:01:36 +01:00
Bastian Kleineidam
924c6285d2 Fix some tests 2014-03-02 07:45:04 +01:00
Bastian Kleineidam
98c8163179 Remove old test 2014-03-01 21:16:38 +01:00
Bastian Kleineidam
6f205a2574 Support checking Sitemap: URLs in robots.txt files. 2014-03-01 20:25:19 +01:00
Bastian Kleineidam
0e4d6f6e1a Parse sitemap urls in robots.txt files. 2014-03-01 19:57:57 +01:00
Bastian Kleineidam
7b34be590b Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. 2014-03-01 00:12:34 +01:00
Bastian Kleineidam
c806be5c15 Updated copyright 2014-01-08 22:33:04 +01:00
Bastian Kleineidam
e0a2558b2b Updated copyright. 2013-12-24 07:13:16 +01:00
Bastian Kleineidam
103e00b4d1 Allow disabling of ssl certificate checks. 2013-12-12 22:17:57 +01:00
Bastian Kleineidam
5736987b60 Refactor output loggers. 2013-12-11 18:41:55 +01:00
Bastian Kleineidam
78ed1e9e52 Do not GET on POST forms. 2013-12-10 23:42:43 +01:00
Bastian Kleineidam
b567f766ba Fix strtime test. 2013-12-06 07:13:44 +01:00
Bastian Kleineidam
b363945052 Adjust example.com/org tests. This seems to change every now and then. 2013-12-04 19:13:18 +01:00
Bastian Kleineidam
023da7c993 Remove the duplicate URL content check. 2013-12-04 19:12:40 +01:00
Bastian Kleineidam
84dac60f57 Fix network test 2013-12-04 19:05:20 +01:00
Bastian Kleineidam
36badddfac Update cookie code from Python module. 2013-12-04 19:05:08 +01:00
Bastian Kleineidam
c966fe6b24 Remove the http-wrong-redirect warning 2013-04-11 18:33:19 +02:00
Bastian Kleineidam
b7c82d1e75 Fix strformat.strsize() test. 2013-02-27 19:36:03 +01:00
Bastian Kleineidam
35bc79dd90 Updated copyright. 2013-01-25 21:14:27 +01:00
Bastian Kleineidam
a86e36e5d3 Fix test cases for example.com redirection. 2013-01-23 19:42:29 +01:00
Bastian Kleineidam
e6ad32c028 Catch UnicodeError for invalid host names. 2013-01-23 19:42:29 +01:00
Bastian Kleineidam
4dad2aa33c Support dns-prefetch URLs. 2013-01-17 20:41:09 +01:00
Bastian Kleineidam
03f2e19cfd Fix html tests. 2013-01-17 20:40:51 +01:00
Bastian Kleineidam
7fe72745ae Updated copyright. 2013-01-09 23:03:12 +01:00
Bastian Kleineidam
aaf35c0f4a Added Word test. 2013-01-09 23:02:47 +01:00
Bastian Kleineidam
e91c2edf7e Test all http response codes. 2012-11-13 18:11:25 +01:00
Bastian Kleineidam
cd4abb1f12 Improve repr() of url data, and remove alexa test script. 2012-11-09 19:09:38 +01:00
Bastian Kleineidam
7bd58af106 Updated copyright. 2012-11-07 18:07:00 +01:00
Bastian Kleineidam
faa052fa99 Fix test case. 2012-11-07 18:07:00 +01:00
Bastian Kleineidam
f9a7f5ef96 Restrict local file checking. 2012-11-07 18:07:00 +01:00
Bastian Kleineidam
eabaa41bd2 Do not check duplicate URLs. 2012-11-06 21:34:22 +01:00
Bastian Kleineidam
bc6cf5de34 Start local telnet server for tests. 2012-10-30 17:44:00 +01:00
Bastian Kleineidam
e594ca3c39 Improved documentation. 2012-10-30 17:44:00 +01:00
Bastian Kleineidam
5f2e6730a9 Allow python interpreter specification for test run. 2012-10-26 18:05:00 +02:00
Bastian Kleineidam
a77a5dddfd Fix sporadic test failures with a dummy directory listing. 2012-10-15 14:36:27 +02:00
Bastian Kleineidam
7929a48d78 Fix url split with invalid port names. 2012-10-13 12:03:09 +02:00
Bastian Kleineidam
57c2fd7b22 Add alexa run debug flag. 2012-10-10 21:05:07 +02:00
Bastian Kleineidam
c4e15c7b88 Improved duplication url check. 2012-10-10 21:04:48 +02:00
Bastian Kleineidam
aa2960e889 Fix content check. 2012-10-10 12:26:33 +02:00
Bastian Kleineidam
973fb6f565 Strip whitespace. 2012-10-10 12:26:19 +02:00
Bastian Kleineidam
e1e80b7dd5 Remove addrinfo cache. 2012-10-10 10:54:58 +02:00
Bastian Kleineidam
20be0f2519 Strip control chars from logger output. 2012-10-10 10:54:30 +02:00
Bastian Kleineidam
871508ef5d Add docs and updated copyright. 2012-10-10 06:53:16 +02:00
Bastian Kleineidam
63cf8adf54 Catch ValueError on invalid cookie expiration dates. 2012-10-10 06:44:38 +02:00
Bastian Kleineidam
a86a7332f3 Add debug output for alexa test run. 2012-10-09 19:47:46 +02:00
Bastian Kleineidam
81ca9a08d4 Fix typos. 2012-10-04 19:49:54 +02:00
Bastian Kleineidam
6f6608525e Split mail tests. 2012-10-01 20:11:59 +02:00
Bastian Kleineidam
6f5e55fd3b Code cleanup. 2012-10-01 10:43:20 +02:00
Bastian Kleineidam
1b3b040be5 Fix check result order. 2012-10-01 10:28:42 +02:00
Bastian Kleineidam
5a12ccf8d0 Fix anchor test result ordering. 2012-09-30 22:02:29 +02:00
Bastian Kleineidam
3c44056fde Code cleanup 2012-09-30 14:01:09 +02:00
Bastian Kleineidam
169bdecb69 Fix clamav test. 2012-09-30 12:00:44 +02:00
Bastian Kleineidam
39204ea0fe Use py.test skip function instead of nose. 2012-09-29 20:28:16 +02:00
Bastian Kleineidam
2479c53e6c Use a free port number in ftp tests for local server. 2012-09-29 19:22:12 +02:00
Bastian Kleineidam
a022c836bc Randomize site test. 2012-09-23 16:19:56 +02:00
Bastian Kleineidam
cff97b9718 Updated copyright. 2012-09-21 21:13:00 +02:00
Bastian Kleineidam
f6b007f757 Fix useragent matching in robots.txt parser. 2012-09-21 21:12:13 +02:00
Bastian Kleineidam
1c2a66ffaf Refactor http tests into multiple files. 2012-09-21 20:34:05 +02:00
Bastian Kleineidam
498567eb21 Remove alexa log files on distclean. 2012-09-21 16:04:46 +02:00
Bastian Kleineidam
32357c9683 Improve alexa test script. 2012-09-21 15:53:16 +02:00
Bastian Kleineidam
718c033989 Add alexa test run script. 2012-09-21 14:50:33 +02:00
Bastian Kleineidam
a03090c20f Optimize intern/extern pattern parsing. 2012-09-20 20:19:13 +02:00
Bastian Kleineidam
9d1c90f96c Write extra script to analyse a memory dump. 2012-09-18 16:08:31 +02:00
Bastian Kleineidam
aee515d406 Fix tests. 2012-09-18 09:17:08 +02:00
Bastian Kleineidam
58cbe4b152 Updated copyright. 2012-09-17 21:03:52 +02:00
Bastian Kleineidam
4e59056ee7 Warn about duplicate URL contents. 2012-09-17 19:49:50 +02:00
Bastian Kleineidam
99bf8aa940 Updated copyright. 2012-09-17 16:09:55 +02:00
Bastian Kleineidam
c3a6603987 Replace URL with example.org. 2012-09-17 16:07:06 +02:00
Bastian Kleineidam
cb71f483a5 Warn about too long URLs. 2012-09-17 16:00:23 +02:00
Bastian Kleineidam
93f3683ac1 Fix tests. 2012-09-02 23:22:01 +02:00
Bastian Kleineidam
07f7be5cf3 Remove slashdot from tests. 2012-08-23 23:56:50 +02:00
Bastian Kleineidam
14326710bc Updated copyright. 2012-08-23 16:46:06 +02:00
Bastian Kleineidam
e252bbf623 Remove Amazon quirk because the default behaviour handles this now. 2012-08-23 05:36:51 +02:00
Bastian Kleineidam
ecef16b2c9 Support WML sites. 2012-08-22 22:43:14 +02:00
Bastian Kleineidam
76f57dc4ad Updated copyright. 2012-08-14 20:37:24 +02:00
Bastian Kleineidam
564ae6479f Fix tests. 2012-08-13 18:01:59 +02:00
Bastian Kleineidam
fe82c380c3 Updated test data. 2012-08-12 11:15:36 +02:00
Bastian Kleineidam
1d7e93fe62 Updated http check data. 2012-08-12 10:55:25 +02:00
Bastian Kleineidam
b0e5c7fc59 Ignore feed: URLs. 2012-06-27 21:32:03 +02:00
Bastian Kleineidam
0fd1a78378 Always compare encoded anchor names. 2012-06-27 20:59:53 +02:00
Bastian Kleineidam
3934e63994 Fix url tests. 2012-06-24 21:48:16 +02:00
Bastian Kleineidam
b550a9dcb5 Updated copyright. 2012-06-23 14:31:11 +02:00
Bastian Kleineidam
31519f6a01 Fix handling of UNC pathnames. 2012-06-23 14:30:58 +02:00
Bastian Kleineidam
e7dd1d421b Fix urljoin test case. 2012-06-23 14:29:26 +02:00
Bastian Kleineidam
363ccc0121 Check <object codebase=...> as normal URL. 2012-06-23 14:28:32 +02:00
Bastian Kleineidam
0c71061f7d Add windows skip helper function. 2012-06-23 13:32:38 +02:00
Bastian Kleineidam
3dd35c57a8 Rename wrong module name. 2012-06-20 21:43:25 +02:00
Bastian Kleineidam
dcf886860f Fix binary safari bookmark test. 2012-06-20 21:40:14 +02:00
Bastian Kleineidam
4cce99a77d Test SSL certificate expiration. 2012-06-20 20:10:40 +02:00
Bastian Kleineidam
f107092a8a Fix handling of user/password info in URLs. 2012-06-10 22:07:42 +02:00
Bastian Kleineidam
838095cbd5 Updated copyright. 2012-06-10 14:58:38 +02:00
Bastian Kleineidam
d323971c5f Add memory debug configuration test. 2012-06-10 14:31:51 +02:00
Bastian Kleineidam
db95fce77e Ignore PHP processing instructions in local files. 2012-06-10 14:02:01 +02:00
Bastian Kleineidam
c3ffb0530f Fix robots.txt tests. 2012-06-10 13:19:30 +02:00
Bastian Kleineidam
7bb5dac321 Updated copyright 2012-04-23 21:33:59 +02:00
Bastian Kleineidam
67db3b5cd6 Improved check for test functions. 2012-04-23 20:58:55 +02:00
Bastian Kleineidam
cd6ee8a1bc Fix checker test cases for non-english locales. 2012-04-23 20:56:33 +02:00
Bastian Kleineidam
4cc19f4e9c Fix strformat tests on non-english locales. 2012-04-23 20:35:32 +02:00
Bastian Kleineidam
1cc3ac5f80 Updated copyright. 2012-04-22 20:45:24 +02:00
Bastian Kleineidam
0bd3402471 Fix CGI tests for changed function args. 2012-04-22 17:45:47 +02:00
Bastian Kleineidam
c2221e1a18 Ensure proper encoding in WSGI script. 2012-04-22 12:48:21 +02:00
Bastian Kleineidam
1ef9a022ca Make WSGI script more responsive by using threads. 2012-04-18 21:52:36 +02:00
Bastian Kleineidam
3d831c1adb Updated copyright. 2012-04-11 22:23:43 +02:00
Bastian Kleineidam
1a28c2e334 Detect invalid empty cookies. 2012-04-03 08:03:54 +02:00
Bastian Kleineidam
6d5e5f9efb Updated copyright. 2012-03-30 22:24:10 +02:00
Bastian Kleineidam
9ee9abcf0f Parse invalid comments <! bla > 2012-03-23 07:41:03 +01:00
Bastian Kleineidam
d6d82b96f1 Add cookiefile config option. 2012-03-22 22:29:56 +01:00
Bastian Kleineidam
4c9fd8d488 Cache real url. 2012-03-14 21:12:13 +01:00
Bastian Kleineidam
187a94312b Updated copyright. 2012-03-09 11:16:18 +01:00
Bastian Kleineidam
626bd3e249 Fix obfuscated IP address check. 2012-03-09 10:08:04 +01:00
Bastian Kleineidam
71f5ee42c8 Updated copyright. 2012-01-29 17:18:28 +01:00
Bastian Kleineidam
a0581cc2a1 Ignore steam:// URIs. 2012-01-10 19:37:19 +01:00
Bastian Kleineidam
f1eb51d885 Updated copyright 2012-01-06 09:21:30 +01:00
Bastian Kleineidam
6409651f55 Remove unused function. 2012-01-04 20:04:14 +01:00
Bastian Kleineidam
3c5b21c0c4 Do not use XXX tag here. 2012-01-04 18:57:56 +01:00
Bastian Kleineidam
fb979b4f3c Add test for archive attribute support. 2011-12-30 12:36:22 +01:00
Bastian Kleineidam
4a04ff3224 Add html5 tests. 2011-12-30 12:30:07 +01:00
Bastian Kleineidam
8848e34d20 Add missing copyright. 2011-12-25 11:17:48 +01:00
Bastian Kleineidam
dff425710d More Freshmeat/Freecode replacements. 2011-12-25 09:06:18 +01:00
Bastian Kleineidam
1c26c14b64 Set copyright year and add missing docstrings. 2011-12-25 08:45:27 +01:00
Bastian Kleineidam
9c40078ab9 Run obfuscated IP and unicode directory listing test only on linux. 2011-12-18 08:12:23 +01:00
Bastian Kleineidam
0e29ebde2a Run obfuscated IP test only on linux. 2011-12-18 08:10:33 +01:00
Bastian Kleineidam
2aea1ef1e5 Fix localhost tests. 2011-12-17 20:31:47 +01:00
Bastian Kleineidam
ef8d9b629f Use dynamic port number in results. 2011-12-17 19:13:14 +01:00
Bastian Kleineidam
f8ef9cca6a Test cleanup. 2011-12-17 16:39:21 +01:00
Bastian Kleineidam
3d9958dfbb Parse Safari bookmark files. 2011-12-17 16:38:25 +01:00
Bastian Kleineidam
1b5cad3b3e Only remove whitespace at start and end of URL. 2011-12-10 11:49:44 +01:00
Bastian Kleineidam
5576dfe10d Fix LFUCache: delete 5% of least frequently used entries on shrink, not 95%. 2011-12-08 22:10:53 +01:00
Bastian Kleineidam
9956f3712e Properly detect too-long Unicode hostnames. 2011-12-05 20:51:42 +01:00
Bastian Kleineidam
74ea444a9a Parse logger and logging part names case-insensitively. 2011-10-10 20:32:58 +02:00
Bastian Kleineidam
d2ae6bf71c Properly detect HTML character encoding. 2011-08-14 12:49:31 +02:00
Bastian Kleineidam
09d9264470 Updated copyright. 2011-08-04 20:40:49 +02:00
Bastian Kleineidam
cdf91a0321 Improve cookie info message and fix cookie test cases. 2011-08-04 18:34:56 +02:00
Bastian Kleineidam
977d9e9ae6 Update cookie values instead of adding duplicate entries. 2011-08-01 20:26:31 +02:00
Bastian Kleineidam
210b45d1e4 Removed FastCGI script. 2011-07-25 21:37:35 +02:00
Bastian Kleineidam
51bcccfdfe Added new option --user-agent to set the User-Agent header. 2011-07-25 21:09:49 +02:00
Bastian Kleineidam
2550e16040 Remove query part from file links. 2011-05-29 17:49:01 +02:00
Bastian Kleineidam
430e1db78d Do not use fixed port for HTTP server. 2011-05-28 19:24:38 +02:00
Bastian Kleineidam
c9707ee735 Handle stray < before end tags. 2011-05-28 13:39:04 +02:00
Bastian Kleineidam
0f70438a87 Updated copyright. 2011-05-28 08:44:21 +02:00
Bastian Kleineidam
e1f724908d Move dnspython module into third_party directory. 2011-05-24 20:18:58 +02:00
Bastian Kleineidam
ce88b073d7 Added debug script for HTML parsing. 2011-05-20 19:22:42 +02:00
Bastian Kleineidam
7d04c3ee81 Handle stray < characters in HTML. 2011-05-20 06:50:08 +02:00
Bastian Kleineidam
2c8c59ddb0 Remove or rename unused variables. 2011-05-18 21:02:59 +02:00
Bastian Kleineidam
74c132c90b Updated copyright. 2011-04-26 14:57:57 +02:00
Bastian Kleineidam
498e5cc786 Test obfuscated IPs only on Linux. 2011-04-26 11:54:04 +02:00
Bastian Kleineidam
6a544f2d69 Only allow redirections to FTP, HTTP and HTTPS URLs. 2011-04-19 07:01:55 +02:00
Bastian Kleineidam
7365170564 Updated copyright. 2011-04-12 09:13:39 +02:00
Bastian Kleineidam
1a31d59df9 Added url.get_content() test. 2011-04-10 10:57:07 +02:00
Bastian Kleineidam
40cad4e468 Add detailed locale info to internal error output. 2011-04-06 11:56:55 +02:00
Bastian Kleineidam
82e5ba8ce6 Add warning tag attribute in XML loggers. 2011-03-15 13:42:21 +01:00
Bastian Kleineidam
f4f921384e Updated copyright 2011-03-13 07:52:18 +01:00