---
title: "Frequently asked questions"
---

**Q: LinkChecker produced an error, but my web page is ok with
Mozilla/IE/Opera/... Is this a bug in LinkChecker?**

A: Please check your web pages first. Are they really ok?
Often the major browsers are very forgiving and good at handling HTML
or HTTP errors, while LinkChecker complains in most cases of invalid
content.

Use the `--check-html` option, or check if you are using a proxy
which produces the error.

**Q: I still get an error, but the page is definitely ok.**

A: Some servers deny access to automated tools (also called robots)
like LinkChecker. This is not a bug in LinkChecker but rather a
policy set by the webmaster of the website you are checking. Look in
the `/robots.txt` file, which follows the
[robots.txt exclusion standard](http://www.robotstxt.org/robotstxt.html).

If you yourself are the webmaster, consider allowing LinkChecker to
check your web pages by adding the following to your robots.txt file:

    User-Agent: LinkChecker
    Allow: /

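To see what a given robots.txt allows before running LinkChecker, you can evaluate it locally. The following is a minimal sketch using Python's standard `urllib.robotparser` module, not LinkChecker's own robots.txt handling, so edge cases may be parsed differently. The `is_allowed` helper, the `SomeOtherBot` agent name and the rules in `ROBOTS_TXT` are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt (assumed content): block every robot except LinkChecker.
ROBOTS_TXT = """\
User-Agent: LinkChecker
Allow: /

User-Agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt rules."""
    parser = RobotFileParser()
    # Parse the rules from memory instead of downloading them.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

For example, `is_allowed(ROBOTS_TXT, "LinkChecker", "https://www.example.com/")` is true, while any other agent name falls through to the `*` entry and is refused.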
**Q: How can I tell LinkChecker which proxy to use?**

A: LinkChecker works automatically with proxies. In a Unix or Windows
environment, set the `http_proxy`, `https_proxy` and `ftp_proxy` environment
variables to a URL that identifies the proxy server before starting
LinkChecker. For example:

    $ http_proxy="http://www.example.com:3128"
    $ export http_proxy

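If you start LinkChecker from a Python script, the same variables can be passed to the child process. This is a minimal sketch, assuming the `linkchecker` command is installed and on `PATH`; `proxy_environment` and `run_linkchecker` are hypothetical helper names, and the proxy URL is the example one above.

```python
import os
import subprocess

PROXY = "http://www.example.com:3128"  # example proxy URL from the FAQ


def proxy_environment(proxy_url, base=None):
    """Return a copy of `base` (default: os.environ) with the proxy variables set."""
    env = dict(os.environ if base is None else base)
    for var in ("http_proxy", "https_proxy", "ftp_proxy"):
        env[var] = proxy_url
    return env


def run_linkchecker(url, proxy_url=PROXY):
    """Run linkchecker (assumed to be on PATH) with the proxy variables set."""
    return subprocess.run(["linkchecker", url], env=proxy_environment(proxy_url), check=False)
```

Building a copy of the environment leaves the calling process's own settings untouched.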
**Q: The link `mailto:john@company.com?subject=Hello John` is reported
as an error.**

A: You have to percent-encode special characters (e.g. spaces) in the subject field.
The correct link should be `mailto:...?subject=Hello%20John`.
Unfortunately, browsers like IE and Netscape do not enforce this.

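If you generate such links in Python, the standard `urllib.parse.quote` function does the encoding. A minimal sketch; `mailto_link` is a hypothetical helper, and it assumes a plain address that needs no encoding itself.

```python
from urllib.parse import quote


def mailto_link(address: str, subject: str) -> str:
    """Build a mailto: URL with the subject percent-encoded (a space becomes %20)."""
    # safe="" also encodes characters such as "&" and "/" inside the subject.
    return f"mailto:{address}?subject={quote(subject, safe='')}"
```

`mailto_link("john@company.com", "Hello John")` yields `mailto:john@company.com?subject=Hello%20John`.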
**Q: Does LinkChecker support JavaScript?**

A: No, and it never will. If your page only works with JavaScript, it is
better to use a browser testing tool like [Selenium](http://seleniumhq.org/).

**Q: Is LinkChecker's cookie feature insecure?**

A: Potentially, yes. This depends on what information you specify in the
cookie file. The cookie information will be sent to the specified
hosts.

Also, the following restrictions apply to cookies that LinkChecker
receives from the hosts it checks:

- Cookies will only be sent back to the originating server (i.e. no
  third-party cookies are allowed).
- Cookies are only stored in memory. After LinkChecker finishes, they
  are lost.
- The cookie feature is disabled by default.

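The first two restrictions can be illustrated with a toy in-memory store keyed by host name. This is a hypothetical sketch of the policy only, not LinkChecker's cookie code; `HostScopedCookies` is an invented name, and the sketch ignores cookie attributes such as `Domain`, `Path` and expiry that a real implementation must honor.

```python
from collections import defaultdict
from urllib.parse import urlsplit


class HostScopedCookies:
    """Toy cookie store: cookies are returned only to the host that set them."""

    def __init__(self):
        # host name -> {cookie name: value}; kept in memory only,
        # so everything is lost when the process exits.
        self._jar = defaultdict(dict)

    def store(self, response_url, name, value):
        """Remember a cookie received from the host of `response_url`."""
        self._jar[urlsplit(response_url).hostname][name] = value

    def cookies_for(self, request_url):
        """Return cookies for the request's host; other hosts get nothing."""
        return dict(self._jar.get(urlsplit(request_url).hostname, {}))
```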
**Q: LinkChecker retrieves a /robots.txt file for every site it
checks. What is that about?**

A: LinkChecker follows the
[robots.txt exclusion standard](http://www.robotstxt.org/robotstxt.html).
To avoid misuse of LinkChecker, you cannot turn this feature off.
See the [Web Robot pages](http://www.robotstxt.org/robotstxt.html) and the
[Spidering report](http://www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/Spidering.txt)
for more info.

If you yourself are the webmaster, consider allowing LinkChecker to
check your web pages by adding the following to your robots.txt file:

    User-Agent: LinkChecker
    Allow: /

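Retrieving `/robots.txt` once per site and reusing the parsed rules for every URL on that site can be sketched as follows. This is a hypothetical illustration built on Python's `urllib.robotparser`, not LinkChecker's own implementation; the `fetch` callable is an assumption that lets the example run without network access.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the /robots.txt URL for the site (scheme, host, port) of page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


class RobotsCache:
    """Fetch and parse robots.txt at most once per site."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: robots.txt URL -> file content (str)
        self._parsers = {}   # robots.txt URL -> parsed rules

    def allowed(self, agent: str, url: str) -> bool:
        key = robots_url(url)
        if key not in self._parsers:
            parser = RobotFileParser()
            parser.parse(self._fetch(key).splitlines())
            self._parsers[key] = parser
        return self._parsers[key].can_fetch(agent, url)
```

Every URL on `https://www.example.com` then shares a single download of `https://www.example.com/robots.txt`; a new host or port triggers one more.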
**Q: How do I print unreachable/dead documents of my website with
LinkChecker?**

A: That is not possible. LinkChecker only finds documents by following
links, so finding documents that no link points to would require file
system access to your web repository and access to your web server
configuration.

**Q: How do I check HTML/XML/CSS syntax with LinkChecker?**

A: Use the `--check-html` and `--check-css` options.

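A minimal sketch for running these checks from a Python script, assuming the `linkchecker` executable is on `PATH` and that your installed version still accepts the two options named above (run `linkchecker --help` to confirm). `syntax_check_command` and `run_syntax_check` are hypothetical helpers.

```python
import subprocess


def syntax_check_command(url, html=True, css=True):
    """Build a linkchecker command line that also validates HTML and/or CSS."""
    cmd = ["linkchecker"]
    if html:
        cmd.append("--check-html")
    if css:
        cmd.append("--check-css")
    cmd.append(url)
    return cmd


def run_syntax_check(url):
    """Run the check and return linkchecker's exit status (assumes it is installed)."""
    return subprocess.run(syntax_check_command(url), check=False).returncode
```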
**Q: I want to have my own logging class. How can I use it in LinkChecker?**

A: A Python API lets you define new logging classes.
Define your own logging class as a subclass of `_Logger` or any other
logging class in the `linkcheck.logger` module.
Then call the `logger_add` method of a `linkcheck.configuration.Configuration`
instance to register your new logger.
After this, create an instance with `logger_new` and append it to the
configuration's `fileoutput` list.

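```python
import linkcheck.configuration
import linkcheck.logger

class MyLogger(linkcheck.logger._Logger):
    # Name under which the logger is registered and looked up.
    LoggerName = 'mylog'
    # Default arguments; 'fileoutput' is assumed to be a flag that makes
    # the logger write to 'filename' instead of standard output.
    LoggerArgs = {'fileoutput': 1, 'filename': 'foo.txt'}

    # ... override the output methods here (e.g. log_url)

# Register the logger class, then add an instance of it to the
# configured file outputs.
cfg = linkcheck.configuration.Configuration()
cfg.logger_add(MyLogger)
cfg['fileoutput'].append(cfg.logger_new(MyLogger.LoggerName))
```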