Chris Mayo
eab2fa410e
Log robots.txt as the sitemap parent URL
This is the location the sitemap URL was found in. The line being
reported is the line in robots.txt.
2022-10-17 19:21:03 +01:00
Chris Mayo
7367e6e865
Skip incomplete Sitemap in robots.txt and warn
Sitemap values should be fully qualified URLs; LinkChecker may not
resolve relative paths correctly.
2022-10-17 19:21:03 +01:00
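The check described in the commit above, that a Sitemap value is a fully qualified URL, might look roughly like the following. This is a minimal sketch and not LinkChecker's actual code: `is_fully_qualified` and `usable_sitemaps` are hypothetical names, and "fully qualified" is assumed here to mean that both a scheme and a host are present.

```python
from urllib.parse import urlsplit


def is_fully_qualified(url):
    """Return True if url has both a scheme and a network location.

    Assumption: this is what "fully qualified" means for a Sitemap value.
    """
    parts = urlsplit(url)
    return bool(parts.scheme and parts.netloc)


def usable_sitemaps(sitemap_values):
    """Split Sitemap values into those to check and those to skip with a warning."""
    kept, skipped = [], []
    for value in sitemap_values:
        if is_fully_qualified(value):
            kept.append(value)
        else:
            skipped.append(value)
    return kept, skipped
```

For example, `usable_sitemaps(["https://example.com/sitemap.xml", "/sitemap.xml"])` keeps the first value and returns the second in the skipped list, which a caller could then report in a warning.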
Chris Mayo
fe5a34c68f
Remove linkcheck.checker.proxysupport
Set up the requests.Session() with the complete proxy configuration
to fix a problem with using an HTTP server as an HTTPS proxy and
potential redirection issues.
Requests handles no_proxy.
2021-12-13 19:25:23 +00:00
Chris Mayo
2a77e12618
Replace deprecated Thread.getName() and Condition.notifyAll()
2021-11-16 19:45:38 +00:00
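The replacements named in the commit above follow the standard library's own guidance: `Thread.getName()` and `Condition.notifyAll()` are deprecated since Python 3.10 in favour of the `name` attribute and `notify_all()`. A minimal sketch of the newer spellings follows; the thread name `checker-1` and the `worker` function are made up for illustration and are not taken from LinkChecker.

```python
import threading

results = []
condition = threading.Condition()


def worker():
    # Thread.name replaces the deprecated Thread.getName().
    name = threading.current_thread().name
    with condition:
        results.append(name)
        # Condition.notify_all() replaces the deprecated notifyAll().
        condition.notify_all()


thread = threading.Thread(target=worker, name="checker-1")
with condition:
    thread.start()
    # wait_for() releases the lock while waiting, so the worker can proceed.
    condition.wait_for(lambda: results, timeout=5)
thread.join()
```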
Paul Haerle
f395c74aac
Make ResultCache max_size configurable (#544)
* Make ResultCache max_size configurable
fixes #463
* Add tests and docs.
* fix documentation...
...adapt the source, not the auto-generated man pages themselves as
requested in #544.
* fix typo.
2021-06-21 19:45:19 +01:00
Chris Mayo
500c13e2cb
Log a debug message when a cached URL is skipped
Skipping introduced in:
eaa538c8 ("don't check one url multiple times", 2016-11-09)
2020-07-21 19:54:18 +01:00
Chris Mayo
b974ec3262
Review comments on black linkcheck/
2020-06-01 16:07:21 +01:00
Chris Mayo
a92a684ac4
Run black on linkcheck/
2020-05-30 17:01:36 +01:00
Marius Gedminas
4f3fe5e1c3
Make sure fetching robots.txt uses the configured timeout
Closes #396.
2020-05-22 10:53:33 +03:00
Chris Mayo
a15a2833ca
Remove spaces after names in class method definitions
And also nested functions.
This is a PEP 8 convention, E211.
2020-05-16 20:19:42 +01:00
Chris Mayo
44e81d27dd
Remove inheriting object
All Python 3 classes are new-style.
2020-05-08 10:45:31 +01:00
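As a small illustration of why the explicit base class in the commit above is redundant: in Python 3 both spellings produce a class that derives from `object`. The class names below are hypothetical.

```python
class LegacyStyle(object):
    """Python 2 era spelling with an explicit object base."""


class ModernStyle:
    """Python 3 spelling; object is the implicit base."""


# Both classes have object as their only base in the method resolution order.
legacy_mro = LegacyStyle.__mro__
modern_mro = ModernStyle.__mro__
```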
Chris Mayo
b0ea72e8c1
Remove # -*- coding: lines
Except for tests that include non-unicode characters:
tests/test_po.py
tests/test_strformat.py
tests/test_url.py
tests/checker/test_error.py
tests/checker/test_news.py
2020-05-08 10:45:31 +01:00
Yaroslav Halchenko
b78c2d200e
DOC: minor typo fix
2018-11-01 11:08:09 -04:00
Marius Gedminas
4a092c218c
Whitespace bigotry
2017-03-14 17:18:27 +02:00
Petr Dlouhý
eaa538c814
don't check one url multiple times
2017-02-14 10:23:25 +01:00
Nicolas Bigaouette
4e56eceb35
Detect if "url_data" contains proxy attributes before using them.
Fix proposed by @colwilson in issue #555.
2014-11-12 09:58:30 -05:00
Bastian Kleineidam
35eb30432e
Added some Python3 fixes.
2014-09-12 19:36:30 +02:00
Bastian Kleineidam
06c6b80ed3
Fix proxy support.
2014-09-05 22:48:10 +02:00
Arlo Louis O'Keeffe
52337f82cb
Use correct attribute
2014-09-03 09:36:22 +02:00
Bastian Kleineidam
b646293fd6
Remove unused import.
2014-07-15 22:38:57 +02:00
Bastian Kleineidam
90257a1b5e
Replace twill with custom code.
2014-07-15 18:37:05 +02:00
Bastian Kleineidam
a665d35feb
Use proxies and checker session in robots.txt.
2014-07-14 20:28:28 +02:00
Bastian Kleineidam
6c38b4165a
Use given HTTP auth data for robots.txt fetching.
2014-07-14 19:50:11 +02:00
Bastian Kleineidam
22caa9367a
Refactor recursion checks.
2014-04-10 17:50:55 +02:00
Bastian Kleineidam
08fbd891ef
Do not check external robots.txt sitemaps.
2014-04-09 19:44:29 +02:00
Bastian Kleineidam
c57f607fc3
Use urldata.add_url()
2014-04-07 18:54:33 +02:00
Bastian Kleineidam
fc73c6ca6e
Log number of checked unique URLs.
2014-03-14 23:46:17 +01:00
Bastian Kleineidam
19b8baf08c
Move cached queue items to top once in a while.
2014-03-14 22:08:51 +01:00
Bastian Kleineidam
b18854649d
Count unique URLs for url queue limit.
2014-03-14 20:21:46 +01:00
Bastian Kleineidam
257644e660
Add cache length function to get number of cached elements.
2014-03-14 20:19:34 +01:00
Bastian Kleineidam
6b334dc79b
Fix URL result caching.
2014-03-08 19:35:10 +01:00
Bastian Kleineidam
b17211f162
Set for release.
2014-03-04 21:36:24 +01:00
Bastian Kleineidam
82f81241fd
Check all links and add better caching.
2014-03-03 23:29:45 +01:00
Bastian Kleineidam
6f205a2574
Support checking Sitemap: URLs in robots.txt files.
2014-03-01 20:25:19 +01:00
Bastian Kleineidam
0f0d79c7e0
Remove crawl-delay stuff
2014-03-01 20:01:42 +01:00
Bastian Kleineidam
7b34be590b
Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.
2014-03-01 00:12:34 +01:00
Bastian Kleineidam
c806be5c15
Updated copyright
2014-01-08 22:33:04 +01:00
Bastian Kleineidam
e0a2558b2b
Updated copyright.
2013-12-24 07:13:16 +01:00
Bastian Kleineidam
0ca63797bf
Remove content cache.
2013-12-10 23:41:52 +01:00
Bastian Kleineidam
36badddfac
Update cookie code from Python module.
2013-12-04 19:05:08 +01:00
Bastian Kleineidam
123578a4cd
Make per-host connection limits configurable.
2013-02-27 19:37:28 +01:00
Bastian Kleineidam
35bc79dd90
Updated copyright.
2013-01-25 21:14:27 +01:00
Bastian Kleineidam
faa743e876
Increase per-host connection limits.
2013-01-22 18:18:48 +01:00
Bastian Kleineidam
0283362ce6
Updated copyright.
2012-12-23 21:32:16 +01:00
Bastian Kleineidam
42a17cbb98
Prepare py3 port and display sys.argv on internal errors.
2012-11-26 18:49:07 +01:00
Bastian Kleineidam
e5735e2a5d
Fix URL queue handling.
2012-11-08 12:48:21 +01:00
Bastian Kleineidam
bc683577de
Remove URLs from the in_progress cache.
2012-11-08 11:03:16 +01:00
Bastian Kleineidam
eabaa41bd2
Do not check duplicate URLs.
2012-11-06 21:34:22 +01:00
Bastian Kleineidam
8750d55a73
Add configuration entry for maximum number of URLs.
2012-10-14 11:13:55 +02:00
Bastian Kleineidam
3b5877161c
Improved debugging.
2012-10-13 13:36:28 +02:00