2754 commits

Author SHA1 Message Date
Bastian Kleineidam
c966fe6b24 Remove the http-wrong-redirect warning 2013-04-11 18:33:19 +02:00
Bastian Kleineidam
134db22830 Updated homepage URL. 2013-04-09 20:11:04 +02:00
Bastian Kleineidam
21678c661d Updated gzip and httplib copies. 2013-03-11 20:21:58 +01:00
Bastian Kleineidam
6b05f1d290 Paginate help output again. 2013-02-28 21:21:00 +01:00
Bastian Kleineidam
123578a4cd Make per-host connection limits configurable. 2013-02-27 19:37:28 +01:00
Bastian Kleineidam
b7c82d1e75 Fix strformat.strsize() test. 2013-02-27 19:36:03 +01:00
Bastian Kleineidam
b38317d57b Replace optparse with argparse. 2013-02-27 19:35:44 +01:00
Bastian Kleineidam
64d95e45e0 Remove local HTML and CSS syntax check. 2013-02-08 21:36:02 +01:00
Bastian Kleineidam
b104482174 Add missing docstring. 2013-01-25 21:15:12 +01:00
Bastian Kleineidam
35bc79dd90 Updated copyright. 2013-01-25 21:14:27 +01:00
Bastian Kleineidam
707b7b7db1 Close HTTP connections without body content. Github issue #376 2013-01-23 19:42:29 +01:00
Bastian Kleineidam
e6ad32c028 Catch UnicodeError for invalid host names. 2013-01-23 19:42:29 +01:00
Bastian Kleineidam
c0a0efbd1d Do not handle non-existing SIGUSR1 signal. 2013-01-22 21:23:46 +01:00
Bastian Kleineidam
47451d7def Fix GUI drag and drop. 2013-01-22 19:06:10 +01:00
Bastian Kleineidam
faa743e876 Increase per-host connection limits. 2013-01-22 18:18:48 +01:00
Bastian Kleineidam
fa402c0d70 Allow drag-and-drop of all local files. 2013-01-22 18:17:07 +01:00
Bastian Kleineidam
7134c0bb05 Print thread stack traces on SIGUSR1 2013-01-22 18:16:53 +01:00
Bastian Kleineidam
9b8cb67d78 Updated copyright. 2013-01-17 20:41:47 +01:00
Bastian Kleineidam
4dad2aa33c Support dns-prefetch URLs. 2013-01-17 20:41:09 +01:00
Bastian Kleineidam
7fe72745ae Updated copyright. 2013-01-09 23:03:12 +01:00
Bastian Kleineidam
fe7e9a5c6c Improve Word document opening: open read-only and invisible, avoiding unnecessary dialogs. 2013-01-07 22:18:39 +01:00
Bastian Kleineidam
a5b6136e70 Check word document validity before closing. 2013-01-07 21:58:02 +01:00
Bastian Kleineidam
0e50834f9a Rename external module to exclude it from some style checks. 2013-01-06 18:17:29 +01:00
Bastian Kleineidam
65a0031c10 Updated copyright. 2013-01-06 18:12:44 +01:00
Bastian Kleineidam
16b84be490 Updated all links. 2013-01-06 18:10:13 +01:00
Bastian Kleineidam
0283362ce6 Updated copyright. 2012-12-23 21:32:16 +01:00
Bastian Kleineidam
a7b83e6200 Fix GUI startup for Windows. 2012-12-19 21:12:02 +01:00
Bastian Kleineidam
9820530313 Use better_exchook to print more internal error info. 2012-12-18 23:06:48 +01:00
Bastian Kleineidam
f568a04a7c Fix ignore option storing in GUI. 2012-12-13 17:06:06 +01:00
Bastian Kleineidam
27df4e20da Add error handling for screen console function. 2012-12-07 22:31:48 +01:00
Bastian Kleineidam
efbbb656a1 Remove python-dns conflict by moving the dns module into a custom subdirectory. 2012-12-07 22:19:32 +01:00
Bastian Kleineidam
45a4bbdaa9 Use locale.format() and os.path.getsize() 2012-12-01 00:05:14 +01:00
Bastian Kleineidam
42a17cbb98 Prepare py3 port and display sys.argv on internal errors. 2012-11-26 18:49:07 +01:00
Bastian Kleineidam
ec03d56b62 Remove pysqlite dependency. 2012-11-14 20:23:56 +01:00
Bastian Kleineidam
7ae1eadadb Improve http status 305 code message. 2012-11-13 18:13:36 +01:00
Bastian Kleineidam
cd4abb1f12 Improve repr() of url data, and remove alexa test script. 2012-11-09 19:09:38 +01:00
Bastian Kleineidam
f3e52f1176 loginpasswordfield is not a password 2012-11-08 22:11:35 +01:00
Bastian Kleineidam
e5735e2a5d Fix URL queue handling. 2012-11-08 12:48:21 +01:00
Bastian Kleineidam
96c6a7f378 Display portable flag in about dialog. 2012-11-08 11:59:20 +01:00
Bastian Kleineidam
bc683577de Remove URLs from the in_progress cache. 2012-11-08 11:03:16 +01:00
Bastian Kleineidam
810a62e093 Fix file url checking. 2012-11-07 19:37:16 +01:00
Bastian Kleineidam
2d6cfb238f Add trailing dot when creating user configuration directory on Windows. 2012-11-07 18:22:07 +01:00
Bastian Kleineidam
b0c2a90b94 Updated copyright. 2012-11-07 18:08:44 +01:00
Bastian Kleineidam
f9a7f5ef96 Restrict local file checking. 2012-11-07 18:07:00 +01:00
Bastian Kleineidam
02ec94dbfb Improve cancel message. 2012-11-06 21:54:09 +01:00
Bastian Kleineidam
eabaa41bd2 Do not check duplicate URLs. 2012-11-06 21:34:22 +01:00
Bastian Kleineidam
ae5f9e8801 Print active threads in debug level. 2012-11-06 21:33:43 +01:00
Bastian Kleineidam
9745be9d71 Fix cookie path matching with empty paths. 2012-10-30 17:44:00 +01:00
Bastian Kleineidam
e2fd37b886 Encode user and password for telnet connection. 2012-10-30 17:44:00 +01:00
Bastian Kleineidam
c6d8b0050e Improve PHP command check. 2012-10-29 21:05:26 +01:00
Bastian Kleineidam
e8da486d66 Detect redirection errors when getting content. 2012-10-26 18:05:00 +02:00
Bastian Kleineidam
2390827735 Debug cookies. 2012-10-25 17:53:16 +02:00
Bastian Kleineidam
c44aa2db1f Fix anchor checking of cached HTTP URLs by using the cached content type. 2012-10-25 06:37:10 +02:00
Bastian Kleineidam
dca52145d3 Misc stuff. 2012-10-24 22:59:28 +02:00
Bastian Kleineidam
b39158e65c Improve available anchor message. 2012-10-24 22:21:46 +02:00
Bastian Kleineidam
dd2c963fac Fix non-ASCII exception handling. 2012-10-24 22:14:45 +02:00
Bastian Kleineidam
64de760b97 Added debug statements for unparseable content types. 2012-10-24 22:06:42 +02:00
Bastian Kleineidam
3a51ac7662 Warn about accessible passwords in config files. 2012-10-15 14:36:10 +02:00
Bastian Kleineidam
8750d55a73 Add configuration entry for maximum number of URLs. 2012-10-14 11:13:55 +02:00
Bastian Kleineidam
2ebedbaaa6 Fix content reading. 2012-10-13 16:48:29 +02:00
Bastian Kleineidam
0e4e694ad1 Fix connection handling on redirects. 2012-10-13 13:36:43 +02:00
Bastian Kleineidam
3b5877161c Improved debugging. 2012-10-13 13:36:28 +02:00
Bastian Kleineidam
d3b44be2c4 Improved documentation. 2012-10-13 12:03:19 +02:00
Bastian Kleineidam
7929a48d78 Fix url split with invalid port names. 2012-10-13 12:03:09 +02:00
Bastian Kleineidam
aa057bd36f Fix colorama init error. 2012-10-12 20:39:34 +02:00
Bastian Kleineidam
6a204120b6 Handle stale file system links for local file checks. 2012-10-12 17:20:19 +02:00
Bastian Kleineidam
c4e15c7b88 Improved duplication url check. 2012-10-10 21:04:48 +02:00
Bastian Kleineidam
b758fc6f52 Reuse existing response. 2012-10-10 12:27:36 +02:00
Bastian Kleineidam
a0610310b4 Print debug on stderr. 2012-10-10 12:27:25 +02:00
Bastian Kleineidam
0c20ef5de4 Strip console characters only from line text. 2012-10-10 12:27:08 +02:00
Bastian Kleineidam
e1e80b7dd5 Remove addrinfo cache. 2012-10-10 10:54:58 +02:00
Bastian Kleineidam
20be0f2519 Strip control chars from logger output. 2012-10-10 10:54:30 +02:00
Bastian Kleineidam
f484a6776d Use timeout value from configuration. 2012-10-10 10:53:52 +02:00
Bastian Kleineidam
871508ef5d Add docs and updated copyright. 2012-10-10 06:53:16 +02:00
Bastian Kleineidam
63cf8adf54 Catch ValueError on invalid cookie expiration dates. 2012-10-10 06:44:38 +02:00
Bastian Kleineidam
06a25676c5 Only read the maximum data size plus one, not the whole file. 2012-10-10 06:35:33 +02:00
Bastian Kleineidam
3e1d51b8bf Use RLock to simplify internal locking. 2012-10-09 21:11:35 +02:00
Bastian Kleineidam
c4cd66ea1b Simplify decorator duration check logic. 2012-10-09 21:05:24 +02:00
Bastian Kleineidam
03a5d476b3 Use URL name if title is empty. 2012-10-09 21:04:54 +02:00
Bastian Kleineidam
6d47b76509 Limit HTTP and FTP connections. Gets rid of spurious BadStatusLine errors. 2012-10-09 21:04:20 +02:00
Bastian Kleineidam
7d3ece502c Support semaphores. 2012-10-09 19:46:06 +02:00
Bastian Kleineidam
ad8525c483 Improve BadStatusLine error message. 2012-10-05 08:32:24 +02:00
Bastian Kleineidam
d15fafb1f7 Code cleanup. 2012-10-05 08:10:44 +02:00
Bastian Kleineidam
5ebd754cdb Improved duplicate url check. 2012-10-01 16:11:45 +02:00
Bastian Kleineidam
ed7c60e491 Do not warn about duplicate URLs which can point to the same content. 2012-10-01 13:42:46 +02:00
Bastian Kleineidam
148846be67 Add flag to log lock contentions. 2012-10-01 13:32:30 +02:00
Bastian Kleineidam
b56c054932 Use finer-grained robots.txt locks to improve lock contention. 2012-10-01 13:29:29 +02:00
Bastian Kleineidam
27b61c3bfa Fix gzip handling in http content decoder. 2012-09-30 14:00:49 +02:00
Bastian Kleineidam
cbc3bcb0d3 Sitemap logger fixes. 2012-09-23 23:20:21 +02:00
Bastian Kleineidam
60305d8877 Code cleanup. 2012-09-23 21:20:12 +02:00
Bastian Kleineidam
e21187b275 Put in-progress URLs back near the front of URL queue, not at end. 2012-09-23 21:00:01 +02:00
Bastian Kleineidam
1f3034b5f5 Sitemap logger fixes. 2012-09-23 20:59:38 +02:00
Bastian Kleineidam
38dd63f055 Code cleanup. 2012-09-23 16:19:42 +02:00
Bastian Kleineidam
7f8fd01b22 Add Accept-Encoding and Accept-Charset headers. 2012-09-23 15:06:44 +02:00
Bastian Kleineidam
03ecff22bb Fix endless loop in http authentication. 2012-09-22 22:21:10 +02:00
Bastian Kleineidam
653b5f27dd Updated ignored schemes. 2012-09-22 16:18:37 +02:00
Bastian Kleineidam
1c59cb4d4c Use GET in case a HEAD method does not succeed, even if robots.txt content checks denied the page. This way proper check results are achieved (but the content is still not checked, so it's ok). 2012-09-22 07:53:11 +02:00
Bastian Kleineidam
fba465e8e8 Fix robotstxt cache miss stats. 2012-09-21 21:12:28 +02:00
Bastian Kleineidam
f6b007f757 Fix useragent matching in robots.txt parser. 2012-09-21 21:12:13 +02:00
Bastian Kleineidam
bbf25106fa Fix double result setting on http checks. 2012-09-21 20:33:15 +02:00
Bastian Kleineidam
3e464e509c Do not allow empty configuration string values. 2012-09-21 16:05:34 +02:00
Bastian Kleineidam
ecf8753a19 Improved user-agent string similar to Google and Bing search bots. 2012-09-21 15:46:14 +02:00
Bastian Kleineidam
c274b50c50 Store lowercase URL scheme in checker class. 2012-09-21 14:35:25 +02:00
Bastian Kleineidam
0941f6ff02 Improve exception handling by using unicode. 2012-09-21 14:29:20 +02:00
Bastian Kleineidam
f46889a4af Log timestamps in debug output. 2012-09-21 13:05:36 +02:00
Bastian Kleineidam
049882e4fe Remove accept-encoding since some sites have wrong compression. 2012-09-20 22:39:15 +02:00
Bastian Kleineidam
7c6dce6136 Only warn non-empty site duplicates. 2012-09-20 20:39:36 +02:00
Bastian Kleineidam
a03090c20f Optimize intern/extern pattern parsing. 2012-09-20 20:19:13 +02:00
Bastian Kleineidam
c385c35b1a Fix ansicolor again. 2012-09-20 16:39:40 +02:00
Bastian Kleineidam
b9d234c78a Fix wrong method name in SSL certificate check. 2012-09-20 16:28:01 +02:00
Bastian Kleineidam
bff217c58b Never log ignored warnings. 2012-09-20 12:44:40 +02:00
Bastian Kleineidam
600b7c0e69 Fix duplicate content warning when self.size is not set yet. 2012-09-20 12:44:23 +02:00
Bastian Kleineidam
9cfee5eb5b Improved color detection with curses. 2012-09-20 12:13:15 +02:00
Bastian Kleineidam
bc0a17c1c4 Display last modified date in the GUI. 2012-09-19 21:23:39 +02:00
Bastian Kleineidam
d37347cab0 Remove unused variable. 2012-09-19 11:08:06 +02:00
Bastian Kleineidam
18a200d85f Fix tests. 2012-09-19 11:05:26 +02:00
Bastian Kleineidam
b8f8bdf5fc Fix last modified formatting. 2012-09-19 10:09:19 +02:00
Bastian Kleineidam
f5fbd7666f Remove unused import. 2012-09-19 09:39:32 +02:00
Bastian Kleineidam
75719b34f6 Updated copyright. 2012-09-19 09:17:25 +02:00
Bastian Kleineidam
71fba0f8b7 Log all valid URLs in sitemap loggers. 2012-09-19 09:17:08 +02:00
Bastian Kleineidam
9d1c90f96c Write extra script to analyse a memory dump. 2012-09-18 16:08:31 +02:00
Bastian Kleineidam
3a352631ba Add modified field to loggers. 2012-09-18 12:12:00 +02:00
Bastian Kleineidam
1db63227f6 Memoize file operations to minimize disk I/O. 2012-09-18 09:37:21 +02:00
Bastian Kleineidam
932a07a9cf Added XML sitemap logger. 2012-09-18 09:16:34 +02:00
Bastian Kleineidam
4e59056ee7 Warn about duplicate URL contents. 2012-09-17 19:49:50 +02:00
Bastian Kleineidam
02a09dbb28 Add documentation. 2012-09-17 16:30:32 +02:00
Bastian Kleineidam
99bf8aa940 Updated copyright. 2012-09-17 16:09:55 +02:00
Bastian Kleineidam
cb71f483a5 Warn about too long URLs. 2012-09-17 16:00:23 +02:00
Bastian Kleineidam
03667a4ec9 Print warning tags in text output. 2012-09-17 15:29:04 +02:00
Bastian Kleineidam
1f9ee987f9 Improved terminal color detection with curses. 2012-09-17 15:24:04 +02:00
Bastian Kleineidam
6e1841cf1f Print download and cache statistics. 2012-09-17 15:23:25 +02:00
Bastian Kleineidam
0b5b6ab37b Automatically set --complete for graph output. 2012-09-15 15:06:29 +02:00
Bastian Kleineidam
273230d98b Send HTTP Do-Not-Track header. 2012-09-14 22:41:38 +02:00
Bastian Kleineidam
e98f15933f Stop checking if all output loggers have been deactivated. 2012-09-14 22:36:59 +02:00
Bastian Kleineidam
81d2c4dbd9 Improved documentation. 2012-09-14 22:26:45 +02:00
Bastian Kleineidam
86f1c74006 Close loggers properly on I/O errors. 2012-09-14 22:09:18 +02:00
Bastian Kleineidam
6730fb51ee Allow maximum check time specification. 2012-09-03 20:17:49 +02:00
Bastian Kleineidam
a1dfaf2f91 Add missing docstring. 2012-09-02 23:37:43 +02:00
Bastian Kleineidam
21db38546c Updated copyright. 2012-09-02 23:36:31 +02:00
Bastian Kleineidam
3baaca47a0 Add maximum number of allowed puts on URL queue. 2012-09-02 22:44:29 +02:00
Bastian Kleineidam
d8fce1ceeb Do not sort URL queue anymore. 2012-09-02 22:32:14 +02:00
Bastian Kleineidam
7a6436f08f Increase checked cache in URL queue. 2012-09-02 22:21:49 +02:00
Bastian Kleineidam
4c16d3e702 Make 401 unauthorized GET response a warning. 2012-08-26 11:32:17 +02:00
Bastian Kleineidam
b6d45eabe5 Code cleanup. 2012-08-24 09:46:38 +02:00
Bastian Kleineidam
ac6591a009 Recognize WML files on Windows. 2012-08-24 09:46:26 +02:00
Bastian Kleineidam
7334a9863e Make URL properties in GUI selectable with the mouse. 2012-08-24 00:10:59 +02:00
Bastian Kleineidam
ae15d51b30 Translate more result strings. 2012-08-23 23:59:33 +02:00
Bastian Kleineidam
ce4253263c Do not special case http->ftp redirects. 2012-08-23 23:56:36 +02:00
Bastian Kleineidam
7374068941 Remove unused import. 2012-08-23 16:46:14 +02:00
Bastian Kleineidam
73d64e50ab Fix redirection to new scheme. 2012-08-23 16:45:24 +02:00