From cca21759f0075e290db8c2432ad7a3612cfccf15 Mon Sep 17 00:00:00 2001
From: calvin
Date: Fri, 22 Apr 2005 13:39:25 +0000
Subject: [PATCH] add robots.txt RFC link

git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2520 e7d03fd6-7b0d-0410-9947-9c21f3af8025
---
 doc/en/documentation.html | 4 ++--
 doc/en/documentation.txt  | 4 +++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/doc/en/documentation.html b/doc/en/documentation.html
index c6e09305..8e3ef274 100644
--- a/doc/en/documentation.html
+++ b/doc/en/documentation.html
@@ -277,7 +277,7 @@ If you really want to disable this, use the

Q: I see LinkChecker gets a /robots.txt file for every site it checks. What is that about?

-

A: LinkChecker follows the robots.txt exclusion standard. To avoid +

A: LinkChecker follows the robots.txt exclusion standard. To avoid misuse of LinkChecker, you cannot turn this feature off. See the Web Robot pages and the Spidering report for more info.

Q: Ctrl-C does not stop LinkChecker immediately. Why is that so?

@@ -297,7 +297,7 @@ and look for missing files.

diff --git a/doc/en/documentation.txt b/doc/en/documentation.txt
index 0ccd53f9..cb52cdf8 100644
--- a/doc/en/documentation.txt
+++ b/doc/en/documentation.txt
@@ -299,10 +299,12 @@ option.
 **Q: I see LinkChecker gets a /robots.txt file for every site it checks.
 What is that about?**
 
-A: LinkChecker follows the robots.txt exclusion standard. To avoid
+A: LinkChecker follows the `robots.txt exclusion standard`_. To avoid
 misuse of LinkChecker, you cannot turn this feature off.
 See the `Web Robot pages`_ and the `Spidering report`_ for more info.
 
+.. _`robots.txt exclusion standard`:
+  http://www.robotstxt.org/wc/norobots-rfc.html
 .. _`Web Robot pages`:
   http://www.robotstxt.org/wc/robots.html
 .. _`Spidering report`: