Bastian Kleineidam
|
f107092a8a
|
Fix handling of user/password info in URLs.
|
2012-06-10 22:07:42 +02:00 |
|
Bastian Kleineidam
|
98537eea2f
|
Code cleanup: use add_url() function in UrlBase.
|
2012-06-10 14:24:17 +02:00 |
|
Bastian Kleineidam
|
db95fce77e
|
Ignore PHP processing instructions in local files.
|
2012-06-10 14:02:01 +02:00 |
|
Bastian Kleineidam
|
837ab22d01
|
Syntax cleanup.
|
2012-06-10 11:46:05 +02:00 |
|
Bastian Kleineidam
|
77b8ec0fcd
|
Fix writing temporary Word files.
|
2012-06-10 11:07:35 +02:00 |
|
Bastian Kleineidam
|
52dcf101e0
|
Remove rest of deprecated options.
|
2012-04-22 17:55:12 +02:00 |
|
Bastian Kleineidam
|
b9b8e3f5b2
|
Honor the charset encoding of the Content-Type HTTP
header when parsing HTML.
|
2012-03-22 22:45:11 +01:00 |
|
Bastian Kleineidam
|
4c9fd8d488
|
Cache real url.
|
2012-03-14 21:12:13 +01:00 |
|
Bastian Kleineidam
|
042b0569ec
|
Fall back to W3C checkers.
|
2012-01-22 08:13:27 +01:00 |
|
Bastian Kleineidam
|
51cf55b7a6
|
Remove warning: prefix from warning messages.
|
2012-01-21 00:25:02 +01:00 |
|
Bastian Kleineidam
|
f1eb51d885
|
Updated copyright
|
2012-01-06 09:21:30 +01:00 |
|
Bastian Kleineidam
|
033280cfb9
|
Remove workarounds for old Python versions.
|
2012-01-04 20:17:53 +01:00 |
|
Bastian Kleineidam
|
3d9958dfbb
|
Parse Safari bookmark files.
|
2011-12-17 16:38:25 +01:00 |
|
Bastian Kleineidam
|
27b7b1cb49
|
Fix W3C HTML validation.
|
2011-10-09 21:16:45 +02:00 |
|
Bastian Kleineidam
|
89ec0ee6a1
|
Check multiple matches of warning regex.
|
2011-10-09 19:00:35 +02:00 |
|
Bastian Kleineidam
|
72b65d94df
|
Only check anchors in HTML pages.
|
2011-05-22 17:33:16 +02:00 |
|
Bastian Kleineidam
|
e5c2271533
|
Only check warning patterns in parseable contents.
|
2011-05-22 17:32:26 +02:00 |
|
Bastian Kleineidam
|
68ea03ee16
|
Support both Chromium and Google Chrome profile dirs to find bookmark files.
|
2011-05-21 11:47:54 +02:00 |
|
Bastian Kleineidam
|
78790d7c8d
|
Improved anchor warning message display.
|
2011-05-20 06:48:06 +02:00 |
|
Bastian Kleineidam
|
343cf9703d
|
Code cleanup: indentation, unused variables etc.
|
2011-05-15 18:36:30 +02:00 |
|
Bastian Kleineidam
|
10bbb696e8
|
Limit download file size to 5MB.
|
2011-05-05 21:10:55 +02:00 |
|
Bastian Kleineidam
|
719441cca5
|
Make module detection more robust and use it when possible.
|
2011-04-20 09:08:11 +02:00 |
|
Bastian Kleineidam
|
84f6d56a49
|
Print level in loggers xml, csv and sql.
|
2011-04-09 10:51:03 +02:00 |
|
Bastian Kleineidam
|
c0732e3d37
|
Do not print empty country information.
|
2011-04-06 17:22:48 +02:00 |
|
Bastian Kleineidam
|
82e5ba8ce6
|
Add warning tag attribute in XML loggers.
|
2011-03-15 13:42:21 +01:00 |
|
Bastian Kleineidam
|
7b33cfac7b
|
Use stripped URL base when constructing absolute URL.
|
2011-03-11 15:17:36 +01:00 |
|
Bastian Kleineidam
|
420c21c2de
|
Strip leading and trailing whitespace from URLs.
|
2011-03-07 12:33:09 +01:00 |
|
Bastian Kleineidam
|
21e4824f65
|
Fix typo calling get_temp_file() function.
|
2011-03-07 09:57:40 +01:00 |
|
Bastian Kleineidam
|
0d4377d1ba
|
Support Google Chrome Bookmark files.
|
2011-02-15 18:26:00 +01:00 |
|
Bastian Kleineidam
|
25b6dc2e57
|
Refactor bookmark parsing code into own package.
|
2011-02-15 17:31:42 +01:00 |
|
Bastian Kleineidam
|
c5884b8d87
|
Add function documentation.
|
2011-02-14 21:06:34 +01:00 |
|
Bastian Kleineidam
|
4a0c63aa56
|
Fix joining of URLs when parent URL has CGI parameter.
|
2011-02-08 21:25:55 +01:00 |
|
Bastian Kleineidam
|
71b15b70f4
|
Updated copyright
|
2011-01-06 09:59:57 +01:00 |
|
Bastian Kleineidam
|
5f70b7210f
|
Add tempfile utility function.
|
2011-01-06 09:52:11 +01:00 |
|
Bastian Kleineidam
|
d011d1524c
|
Parse PHP files recursively.
|
2010-12-28 17:11:29 +01:00 |
|
Bastian Kleineidam
|
fd3fe8dcaa
|
Fix missing content types for cached URLs.
|
2010-12-23 07:37:36 +01:00 |
|
Bastian Kleineidam
|
6090e1a66c
|
Print anchor in __str__()
|
2010-12-21 20:55:49 +01:00 |
|
Bastian Kleineidam
|
7c08290c44
|
Fix broken anchor checking.
|
2010-12-20 19:55:26 +01:00 |
|
Bastian Kleineidam
|
224061e284
|
Fix to_wire by checking if URL parts have been initialized.
|
2010-12-15 13:24:12 +01:00 |
|
Bastian Kleineidam
|
2b2121b9ed
|
Added content type and domain to URL logging info.
|
2010-12-14 20:30:53 +01:00 |
|
Bastian Kleineidam
|
01184784ef
|
Remove warning about Unicode domains which are more widely supported now.
|
2010-12-11 07:58:15 +01:00 |
|
Bastian Kleineidam
|
f14340a0a8
|
Do not check content of already cached URLs.
|
2010-10-27 19:52:48 +02:00 |
|
Bastian Kleineidam
|
d9e981e497
|
Don't log a warning if commandline URL has been redirected.
|
2010-10-26 16:24:27 +02:00 |
|
Bastian Kleineidam
|
4375d35328
|
Add warning about unsupported HTTP authentication, and revert the realm changes.
|
2010-10-25 22:41:31 +02:00 |
|
Bastian Kleineidam
|
332fa4f8f9
|
Prepare multi-realm auth configuration.
|
2010-10-25 22:07:16 +02:00 |
|
Bastian Kleineidam
|
a8aa3bdb00
|
Another fix to ensure get_content() is only called when allowed.
|
2010-10-13 22:14:43 +02:00 |
|
Bastian Kleineidam
|
61e611e4bf
|
Prevent unallowed content read when checking for robots.txt allowance in HTML files.
|
2010-10-12 00:40:34 +02:00 |
|
Bastian Kleineidam
|
1d0db02192
|
Refactor getting user and password for a URL.
|
2010-10-11 20:11:15 +02:00 |
|
Bastian Kleineidam
|
e494d6bbb6
|
Move MIME type detection into fileutil.py module, and use mimetools for detection.
|
2010-10-03 08:47:48 +02:00 |
|
Bastian Kleineidam
|
840538d12a
|
Remove unneeded check for HTML content.
|
2010-09-29 19:25:14 +02:00 |
|
Bastian Kleineidam
|
279a1eae70
|
Only add geoip info for non-empty hostnames.
|
2010-09-29 15:59:57 +02:00 |
|
Bastian Kleineidam
|
cc848cdb33
|
Fix import for moved geoip module.
|
2010-09-29 15:17:27 +02:00 |
|
Bastian Kleineidam
|
8a1ac26c85
|
Warn about obfuscated IP numbers.
|
2010-09-05 20:11:02 +02:00 |
|
Bastian Kleineidam
|
8a074aeea9
|
Work around Python 2.6+ urljoin bug.
|
2010-08-31 09:16:24 +02:00 |
|
Bastian Kleineidam
|
c3b8ff00b3
|
Check content and recursion in one try/except to avoid multiple errors when getting page content.
|
2010-08-31 06:52:08 +02:00 |
|
Bastian Kleineidam
|
1faedafb33
|
Fix data size for HTTP requests.
|
2010-08-04 00:06:25 +02:00 |
|
Bastian Kleineidam
|
0f92b76290
|
Remove the unnormed URL warning.
|
2010-07-29 20:20:59 +02:00 |
|
Bastian Kleineidam
|
7ad4f7c220
|
Compare size from meta info and content data.
|
2010-07-29 19:53:41 +02:00 |
|
Bastian Kleineidam
|
d9bfd25a68
|
Add warning if content size is zero
|
2010-07-28 08:19:55 +02:00 |
|
Bluebird75
|
28f4514b67
|
Use object with __slots__ for wire-format of UrlBase objects.
Saves memory since UrlBase wire-format objects are used for
logging and thus often created.
Signed-off-by: Bastian Kleineidam <calvin@debian.org>
|
2010-03-27 00:07:19 +01:00 |
|
Bastian Kleineidam
|
3370ea1562
|
Reflect changes in httplib2.py: use buffered read in httplib response object and use bad status line exception attribute.
|
2010-03-26 20:50:38 +01:00 |
|
Bastian Kleineidam
|
37b4e97012
|
Revert "Only parse anchors if both --anchors option is given and the current link has an anchor."
This reverts commit b238527d54.
|
2010-03-10 00:04:02 +01:00 |
|
Bastian Kleineidam
|
b238527d54
|
Only parse anchors if both --anchors option is given and the current link has an anchor.
|
2010-03-09 11:45:50 +01:00 |
|
Bastian Kleineidam
|
57397e938b
|
Improved linkname parsing by adding a new peek() HTML parser function.
|
2010-03-09 11:31:12 +01:00 |
|
Bastian Kleineidam
|
51a0ef0ad4
|
Speed up HTML parsing by stopping early and adding callbacks.
|
2010-03-08 09:04:33 +01:00 |
|
Bastian Kleineidam
|
1e15e55689
|
Fix errors in Word file parsing.
|
2010-03-07 19:43:08 +01:00 |
|
Bastian Kleineidam
|
6a2fcf8ae9
|
Parse links in Word files.
|
2010-03-07 19:20:51 +01:00 |
|
Bastian Kleineidam
|
77daf80e82
|
Add url encoding parameter
|
2009-11-28 11:56:35 +01:00 |
|
Bastian Kleineidam
|
5e06b6b8d4
|
Updated FSF address in GPL blurb
|
2009-07-24 23:58:20 +02:00 |
|
Bastian Kleineidam
|
7f67027abf
|
Ignore the fragment part (i.e. the anchor) of URIs when
getting and caching content
|
2009-06-26 07:22:36 +02:00 |
|
Bastian Kleineidam
|
59ffbd43f0
|
Use AttrDict for transport object in loggers.
|
2009-03-07 09:43:55 +01:00 |
|
Bastian Kleineidam
|
7a59763508
|
Remove unused SetList container
|
2009-03-07 00:42:27 +01:00 |
|
Bastian Kleineidam
|
2351506752
|
Use plain list for info strings.
|
2009-03-07 00:19:19 +01:00 |
|
Bastian Kleineidam
|
88dbcb30cd
|
Remove unused url_data.info tags - the tags were always None
|
2009-03-06 21:20:09 +01:00 |
|
Bastian Kleineidam
|
ba160350dd
|
Introduced transport object API for logging.
|
2009-03-06 19:30:58 +01:00 |
|
Bastian Kleineidam
|
b8944e493a
|
Use new exception log keyword when logging errors
|
2009-03-02 13:18:36 +01:00 |
|
Bastian Kleineidam
|
7862147ca3
|
Fix showing content size.
|
2009-03-01 23:04:48 +01:00 |
|
Bastian Kleineidam
|
8caa601a7e
|
Python 3.0 compatibility: use exc.args[] instead of exc[]
|
2009-02-24 12:41:45 +01:00 |
|
calvin
|
7214943f38
|
Remove wrong function return type documentation
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3959 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2009-02-18 15:34:46 +00:00 |
|
calvin
|
e9805dbd8a
|
Updated copyright year to 2009
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3887 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2009-01-08 14:18:03 +00:00 |
|
calvin
|
42c3e71329
|
Improved and tested Opera bookmark parser
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3863 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-11-20 07:52:02 +00:00 |
|
calvin
|
d26386d03f
|
Catch errors when getting content for title.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3814 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-06-12 15:38:26 +00:00 |
|
calvin
|
290528b84f
|
Added title attribute to URL data.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3790 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-06-07 13:07:56 +00:00 |
|
calvin
|
99269d12cc
|
Add base method for Url.get_title()
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3788 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-06-07 13:07:38 +00:00 |
|
calvin
|
5f4d61e018
|
Use keyword arguments in translation strings.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3780 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-05-27 19:44:40 +00:00 |
|
calvin
|
66ff422f6b
|
Allow overwriting of an old check result.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3776 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-05-27 19:42:38 +00:00 |
|
calvin
|
dbb498a395
|
Add virus checking
New option --scan-virus to check the content of URLs for
viruses with ClamAV.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3753 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-05-20 08:57:37 +00:00 |
|
calvin
|
bacb59597e
|
Use relative imports from Python 2.5
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3750 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-05-09 06:16:03 +00:00 |
|
calvin
|
b96e8120d6
|
Add W3C Validator checks
Add new options --check-html-w3 and --check-css-w3 to allow checking
of HTML and CSS pages with the online W3C validators.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3748 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-05-08 10:36:54 +00:00 |
|
calvin
|
df9f31dcb1
|
Only check HTML/CSS syntax of intern URLs
The HTML and CSS syntax check now only applies to URLs
which match those given on the command line.
This makes checking of personal pages easier.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3743 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-04-29 17:48:47 +00:00 |
|
calvin
|
ac4d09f83d
|
Fix errors in CSS and HTML syntax check
Properly encode the warning messages as Unicode, and prevent
overwriting of the "log" module with a local variable.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3742 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-04-29 17:48:22 +00:00 |
|
calvin
|
5d8bdaaa1f
|
Use generators instead of lists where possible
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3739 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-04-28 00:26:02 +00:00 |
|
calvin
|
3eac1be9ab
|
Require and use Python 2.5
Use Python 2.5 features and get rid of old compat code. Also some
code cleanups have been made.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3737 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-04-27 11:39:21 +00:00 |
|
calvin
|
72db31e546
|
Only check syntax of valid URLs
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3726 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-04-25 07:53:11 +00:00 |
|
calvin
|
973da91f44
|
Source code cleanup: use or remove unused variables
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3724 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-04-25 07:49:52 +00:00 |
|
calvin
|
62efec3b35
|
Added CSS syntax check.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3719 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-04-24 09:44:18 +00:00 |
|
calvin
|
cce6affa17
|
Add --check-html option to check the HTML syntax.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3718 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-04-23 23:04:31 +00:00 |
|
calvin
|
5a2f89fa3d
|
Add redirect warning for commandline URLs
If URLs given on the commandline are redirected, the automatic
intern patterns might not match anymore. A warning makes this
more prominent.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3712 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-04-21 09:18:36 +00:00 |
|
calvin
|
8ae6d94b45
|
Improved error messages for exceptions
Prepend the exception name before the error message of exceptions.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3694 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-04-19 07:47:00 +00:00 |
|
calvin
|
4968f1b3cd
|
Prevent empty exception values.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3690 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2008-04-18 07:42:24 +00:00 |
|