Merge pull request #470 from cjmayo/sphinx

Generate html documentation and man pages using Sphinx
Chris Mayo 2020-08-22 16:26:41 +01:00 committed by GitHub
commit b06c6da75d
87 changed files with 26289 additions and 12456 deletions

.gitignore

@ -28,7 +28,7 @@ Changelog.linkchecker*
/doc/html/*.qhc
/doc/html/*.qch
/.achievements
/doc/*.mo
/doc/i18n/locales/*/LC_MESSAGES/*.mo
/LinkChecker-*-portable.zip
/LinkChecker-*.exe
/LinkChecker.egg-info
@ -41,3 +41,6 @@ Changelog.linkchecker*
tests/checker/data/https_cert.pem
tests/checker/data/https_key.pem
coverage.xml
doc/html
doc/src/_build
doc/src/code/linkcheck
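The new ignore patterns can be sanity-checked in a throwaway repository. A minimal sketch, assuming `git` is installed; the paths mirror the entries added above:

```shell
# Quick check of the new .gitignore entries in a scratch repository.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf '%s\n' \
    'doc/i18n/locales/*/LC_MESSAGES/*.mo' \
    'doc/html' \
    'doc/src/_build' \
    'doc/src/code/linkcheck' > .gitignore
mkdir -p doc/i18n/locales/de/LC_MESSAGES
touch doc/i18n/locales/de/LC_MESSAGES/linkchecker.mo
# check-ignore exits 0 when the path is ignored
git check-ignore -q doc/i18n/locales/de/LC_MESSAGES/linkchecker.mo && echo "ignored"
```

Note that `*` in a .gitignore pattern does not cross `/`, which is why the `LC_MESSAGES` pattern spells out each path component.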


@ -1,102 +0,0 @@
# Contributor Covenant Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, gender identity and expression, level of experience,
nationality, personal appearance, race, religion, or sexual identity and
orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting one of the persons listed below. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project maintainers are
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
Project maintainers are encouraged to follow the spirit of the
[Django Code of Conduct Enforcement Manual][enforcement] when
receiving reports.
[enforcement]: https://www.djangoproject.com/conduct/enforcement-manual/
## Contacts
The following people have volunteered to be available to respond to
Code of Conduct reports. They have reviewed existing literature and
agree to follow the aforementioned process in good faith. They also
accept OpenPGP-encrypted email:
* Antoine Beaupré <anarcat@debian.org>
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at [http://contributor-covenant.org/version/1/4][version]
[homepage]: http://contributor-covenant.org
[version]: http://contributor-covenant.org/version/1/4/
Changes
-------
The Code of Conduct was modified to refer to *project maintainers*
instead of *project team*, and a small paragraph was added to refer to
the Django enforcement manual.
> Note: We have so far determined that writing an explicit enforcement
> policy is not necessary, considering the literature
> already available online and the relatively small size of the
> community. This may change in the future if the community grows
> larger.

CODE_OF_CONDUCT.rst (new file)

@ -0,0 +1,109 @@
Contributor Covenant Code of Conduct
====================================
Our Pledge
----------
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our
project and our community a harassment-free experience for everyone,
regardless of age, body size, disability, ethnicity, gender identity and
expression, level of experience, nationality, personal appearance, race,
religion, or sexual identity and orientation.
Our Standards
-------------
Examples of behavior that contributes to creating a positive environment
include:
- Using welcoming and inclusive language
- Being respectful of differing viewpoints and experiences
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
- The use of sexualized language or imagery and unwelcome sexual
attention or advances
- Trolling, insulting/derogatory comments, and personal or political
attacks
- Public or private harassment
- Publishing others' private information, such as a physical or
electronic address, without explicit permission
- Other conduct which could reasonably be considered inappropriate in a
professional setting
Our Responsibilities
--------------------
Project maintainers are responsible for clarifying the standards of
acceptable behavior and are expected to take appropriate and fair
corrective action in response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit,
or reject comments, commits, code, wiki edits, issues, and other
contributions that are not aligned to this Code of Conduct, or to ban
temporarily or permanently any contributor for other behaviors that they
deem inappropriate, threatening, offensive, or harmful.
Scope
-----
This Code of Conduct applies both within project spaces and in public
spaces when an individual is representing the project or its community.
Examples of representing a project or community include using an
official project e-mail address, posting via an official social media
account, or acting as an appointed representative at an online or
offline event. Representation of a project may be further defined and
clarified by project maintainers.
Enforcement
-----------
Instances of abusive, harassing, or otherwise unacceptable behavior may
be reported by contacting one of the persons listed below. All
complaints will be reviewed and investigated and will result in a
response that is deemed necessary and appropriate to the circumstances.
The project maintainers are obligated to maintain confidentiality with
regard to the reporter of an incident. Further details of specific
enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in
good faith may face temporary or permanent repercussions as determined
by other members of the project's leadership.
Project maintainers are encouraged to follow the spirit of the `Django
Code of Conduct Enforcement
Manual <https://www.djangoproject.com/conduct/enforcement-manual/>`__
when receiving reports.
Contacts
--------
The following people have volunteered to be available to respond to Code
of Conduct reports. They have reviewed existing literature and agree to
follow the aforementioned process in good faith. They also accept
OpenPGP-encrypted email:
- Antoine Beaupré <anarcat@debian.org>
Attribution
-----------
This Code of Conduct is adapted from the `Contributor
Covenant <http://contributor-covenant.org>`__, version 1.4, available at
`http://contributor-covenant.org/version/1/4 <http://contributor-covenant.org/version/1/4/>`__
Changes
-------
The Code of Conduct was modified to refer to *project maintainers*
instead of *project team*, and a small paragraph was added to refer to the
Django enforcement manual.
Note: We have so far determined that writing an explicit enforcement
policy is not necessary, considering the literature already
available online and the relatively small size of the community. This
may change in the future if the community grows larger.


@ -1,142 +0,0 @@
# Contribution guide
This document outlines how to contribute to this project. It details
instructions on how to submit issues, bug reports and patches.
Before you participate in the community, you should also agree to
respect the code of conduct, shipped in [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) in the
source code.
[project]: https://github.com/linkchecker/linkchecker/
[issues]: https://github.com/linkchecker/linkchecker/issues
[pull requests]: https://github.com/linkchecker/linkchecker/pulls
# Positive feedback
Even if you have no changes, suggestions, documentation or bug reports
to submit, even just positive feedback like "it works" goes a long
way. It shows the project is being used and gives instant
gratification to contributors. So we welcome emails that tell us of
your positive experiences with the project or just thank you
notes. Contact maintainers directly or submit a closed issue with your
story. You can also send your "thanks" through <https://saythanks.io/>.
# Issues and bug reports
We want you to report issues you find in the software. It is a
recognized and important part of contributing to this project. All
issues will be read and replied to politely and
professionally. Issues and bug reports should be filed on the
[issue tracker][issues].
## Issue triage
Issue triage is a useful contribution as well. You can review the
[issues][] in the [project page][project] and, for each issue:
- try to reproduce the issue, if it is not reproducible, label it with
`help-wanted` and explain the steps taken to reproduce
- if information is missing, label it with `invalid` and request
specific information
- if the feature request is not within the scope of the project or
should be refused for other reasons, use the `wontfix` label and
close the issue
- mark feature requests with the `enhancement` label, bugs with
`bug`, duplicates with `duplicate` and so on...
Note that some of those operations are available only to project
maintainers, see below for the different statuses.
## Security issues
Security issues should first be disclosed privately to the project
maintainers, which support receiving encrypted emails through the
usual OpenPGP key discovery mechanisms.
This project cannot currently afford bounties for security issues. We
would still ask that you coordinate disclosure, giving the project a
reasonable delay to produce a fix and prepare a release before public
disclosure.
Public recognition will be given to reporters of security issues if
desired. We otherwise agree with the [Disclosure Guidelines][] of the
[HackerOne project][], at the time of writing.
[Disclosure Guidelines]: https://www.hackerone.com/disclosure-guidelines
[HackerOne project]: https://www.hackerone.com/
# Patches
Patches can be submitted through [pull requests][] on the
[project page][project].
Some guidelines for patches:
* A patch should be a minimal and accurate answer to exactly one
identified and agreed problem.
* A patch must compile cleanly and pass project self-tests on all
target platforms.
* A patch commit message must consist of a single short (less than 50
characters) line stating a summary of the change, followed by a
blank line and then a description of the problem being solved and
its solution, or a reason for the change. Write more information,
not less, in the commit log.
* Patches should be reviewed by at least one maintainer before being merged.
Project maintainers should merge their own patches only when they have been
approved by other maintainers, unless there is no response within a
reasonable timeframe (roughly one week) or there is an urgent change
to be done (e.g. security or data loss issue).
As an exception to this rule, this specific document cannot be changed
without the consensus of all administrators of the project.
> Note: Those guidelines were inspired by the
> [Collective Code Construct Contract][C4]. The document was found to
> be a little too complex and hard to read and wasn't adopted in its
> entirety. See this [discussion][] for more information.
[C4]: https://rfc.zeromq.org/spec:42/C4/
[discussion]: https://github.com/zeromq/rfc/issues?utf8=%E2%9C%93&q=author%3Aanarcat%20
## Patch triage
You can also review existing pull requests, by cloning the
contributor's repository and testing it. If the tests do not pass
(either locally or in Travis), if the patch is incomplete or otherwise
does not respect the above guidelines, submit a review with "changes
requested" with reasoning.
# Membership
There are three levels of membership in the project, Administrator
(also known as "Owner" in GitHub), Maintainer (also known as
"Member"), or regular users (everyone with or without a GitHub
account). Anyone is welcome to contribute to the project within the
guidelines outlined in this document, regardless of their status, and
that includes regular users.
Maintainers can:
* do everything regular users can
* review, push and merge pull requests
* edit and close issues
Administrators can:
* do everything maintainers can
* add new maintainers
* promote maintainers to administrators
Regular users can be promoted to maintainers if they contribute to the
project, either by participating in issues, documentation or pull
requests.
Maintainers can be promoted to administrators when they have given significant
contributions for a sustained timeframe, by consensus of the current
administrators. This process should be open and decided as any other issue.
Maintainers can be demoted by administrators and administrators can be
demoted by the other administrators' consensus. Unresponsive maintainers
or administrators can be removed after a month unless they specifically
announced a leave.

CONTRIBUTING.rst (new file)

@ -0,0 +1,149 @@
Contribution Guide
==================
This document outlines how to contribute to this project. It details
instructions on how to submit issues, bug reports and patches.
Before you participate in the community, you should also agree to
respect the code of conduct, shipped in
:doc:`CODE_OF_CONDUCT.rst <code_of_conduct>` in the source code.
Positive feedback
-----------------
Even if you have no changes, suggestions, documentation or bug reports
to submit, even just positive feedback like “it works” goes a long way.
It shows the project is being used and gives instant gratification to
contributors. So we welcome emails that tell us of your positive
experiences with the project or just thank you notes. Contact
maintainers directly or submit a closed issue with your story. You can
also send your “thanks” through https://saythanks.io/.
Issues and bug reports
----------------------
We want you to report issues you find in the software. It is a
recognized and important part of contributing to this project. All
issues will be read and replied to politely and professionally. Issues
and bug reports should be filed on the `issue
tracker <https://github.com/linkchecker/linkchecker/issues>`__.
Issue triage
^^^^^^^^^^^^
Issue triage is a useful contribution as well. You can review the
`issues <https://github.com/linkchecker/linkchecker/issues>`__ in the
`project page <https://github.com/linkchecker/linkchecker/>`__ and, for
each issue:
- try to reproduce the issue, if it is not reproducible, label it with
``help-wanted`` and explain the steps taken to reproduce
- if information is missing, label it with ``invalid`` and request
specific information
- if the feature request is not within the scope of the project or
should be refused for other reasons, use the ``wontfix`` label and
close the issue
- mark feature requests with the ``enhancement`` label, bugs with
``bug``, duplicates with ``duplicate`` and so on…
Note that some of those operations are available only to project
maintainers, see below for the different statuses.
Security issues
^^^^^^^^^^^^^^^
Security issues should first be disclosed privately to the project
maintainers, which support receiving encrypted emails through the usual
OpenPGP key discovery mechanisms.
This project cannot currently afford bounties for security issues. We
would still ask that you coordinate disclosure, giving the project a
reasonable delay to produce a fix and prepare a release before public
disclosure.
Public recognition will be given to reporters of security issues if
desired. We otherwise agree with the `Disclosure
Guidelines <https://www.hackerone.com/disclosure-guidelines>`__ of the
`HackerOne project <https://www.hackerone.com/>`__, at the time of
writing.
Patches
-------
Patches can be submitted through `pull
requests <https://github.com/linkchecker/linkchecker/pulls>`__ on the
`project page <https://github.com/linkchecker/linkchecker/>`__.
Some guidelines for patches:
- A patch should be a minimal and accurate answer to exactly one
identified and agreed problem.
- A patch must compile cleanly and pass project self-tests on all
target platforms.
- A patch commit message must consist of a single short (less than 50
characters) line stating a summary of the change, followed by a blank
line and then a description of the problem being solved and its
solution, or a reason for the change. Write more information, not
less, in the commit log.
- Patches should be reviewed by at least one maintainer before being
merged.
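The commit-message rule above (short summary, blank line, then the why/how) can be exercised in a scratch repository; the message below is invented purely for illustration:

```shell
# Illustrative commit following the guideline: a summary line under 50
# characters, a blank line, then a description of problem and solution.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git config user.email dev@example.com
git config user.name Dev
echo demo > file.txt
git add file.txt
git commit -q -m 'Add demo file

Explain here which problem the change solves and why this
approach was chosen; more detail is better than less.'
summary=$(git log -1 --format=%s)
echo "summary: $summary (${#summary} chars)"
```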
Project maintainers should merge their own patches only when they have
been approved by other maintainers, unless there is no response within a
reasonable timeframe (roughly one week) or there is an urgent change to
be done (e.g. security or data loss issue).
As an exception to this rule, this specific document cannot be changed
without the consensus of all administrators of the project.
Note: Those guidelines were inspired by the `Collective Code
Construct Contract <https://rfc.zeromq.org/spec:42/C4/>`__. The
document was found to be a little too complex and hard to read and
wasn't adopted in its entirety. See this
`discussion <https://github.com/zeromq/rfc/issues?utf8=%E2%9C%93&q=author%3Aanarcat%20>`__
for more information.
Patch triage
^^^^^^^^^^^^
You can also review existing pull requests, by cloning the contributor's
repository and testing it. If the tests do not pass (either locally or
in Travis), if the patch is incomplete or otherwise does not respect the
above guidelines, submit a review with “changes requested” with
reasoning.
Membership
----------
There are three levels of membership in the project, Administrator (also
known as “Owner” in GitHub), Maintainer (also known as “Member”), or
regular users (everyone with or without a GitHub account). Anyone is
welcome to contribute to the project within the guidelines outlined in
this document, regardless of their status, and that includes regular
users.
Maintainers can:
- do everything regular users can
- review, push and merge pull requests
- edit and close issues
Administrators can:
- do everything maintainers can
- add new maintainers
- promote maintainers to administrators
Regular users can be promoted to maintainers if they contribute to the
project, either by participating in issues, documentation or pull
requests.
Maintainers can be promoted to administrators when they have given
significant contributions for a sustained timeframe, by consensus of the
current administrators. This process should be open and decided as any
other issue.
Maintainers can be demoted by administrators and administrators can be
demoted by the other administrators' consensus. Unresponsive maintainers
or administrators can be removed after a month unless they specifically
announced a leave.


@ -37,14 +37,12 @@ recursive-include doc \
*.po \
*.pot \
*.py \
*.rst \
*.sh \
*.txt \
*.yaml \
*.yml \
Makefile \
linkcheckerrc_* \
po4a.conf \
wokconfig
linkcheckerrc_*
recursive-include po \
*.mo \
*.po \


@ -1,43 +1,34 @@
HTMLDIR:=web/media
MANHTMLFILES:= \
$(HTMLDIR)/man1/linkchecker.1.html \
$(HTMLDIR)/man5/linkcheckerrc.5.html
MANFILES:=linkchecker.1 linkcheckerrc.5
LOCALES:=en de
all:
all: html man
po4a:
po4a --localized-charset=UTF-8 po4a.conf
code: clean
PYTHONPATH=.. sphinx-autogen src/code/index.rst
man: $(MANHTMLFILES)
html:
make -C src html
$(HTMLDIR)/man1/linkchecker.1.html: en/linkchecker.1
mandoc -Thtml $< > $@
@sed -i -e \
's:<b>linkcheckerrc</b>(5):<a href="../man5/linkcheckerrc.5.html" class="Xr">linkcheckerrc(5)</a>:g' \
$(HTMLDIR)/man1/linkchecker.1.html
locale:
make -C src locale
$(HTMLDIR)/man5/linkcheckerrc.5.html: en/linkcheckerrc.5
mandoc -Thtml $< > $@
@sed -i -e \
's:<b>linkchecker</b>(1):<a href="../man1/linkchecker.1.html" class="Xr">linkchecker(1)</a>:g' \
$(HTMLDIR)/man5/linkcheckerrc.5.html
man:
make -C src man; \
make -C src -e SPHINXOPTS="-D language='de' -t de" LANGUAGE="de" man
# check all makefiles for formatting warnings
check:
@t=$(shell tempfile); \
for loc in $(LOCALES); do \
@for loc in $(LOCALES); do \
for manfile in $(MANFILES); do \
echo "Checking $$loc/$$manfile"; \
LC_ALL=en_US.UTF-8 MANWIDTH=80 man --warnings -E UTF-8 -l $$loc/$$manfile > /dev/null 2>$$t ; \
if [ -s $$t ]; then cat $$t; exit 1; fi; \
echo "Checking man/$$loc/$$manfile"; \
LC_ALL=en_US.UTF-8 MANWIDTH=80 mandoc -T lint -W error man/$$loc/$$manfile; \
done; \
done
clean:
rm $(MANHTMLFILES)
.PHONY: po4a man check clean
rm -rf src/_build; \
rm -rf src/code/linkcheck; \
rm -rf html; \
rm -rf man
.PHONY: check clean html locale man
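The rewritten `check` target replaces `man --warnings` with `mandoc -T lint -W error`. A stand-alone sketch of that lint step on a minimal page (mandoc may not be installed everywhere, so the sketch skips in that case):

```shell
# Lint a minimal man page the way the new `check` target does.
page=$(mktemp)
cat > "$page" <<'EOF'
.TH DEMO 1 2020-08-22 Demo "Demo Manual"
.SH NAME
demo \- minimal page used only to exercise the linter
EOF
if command -v mandoc >/dev/null 2>&1; then
    # -W error: only error-level messages affect the exit status
    mandoc -T lint -W error "$page" && echo "lint ok"
else
    echo "mandoc not installed; skipping"
fi
```

Unlike the old `man --warnings` pipeline, mandoc's exit status directly reflects the worst message at or above the `-W` level, so no temporary file or `-s` test is needed.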

doc/de.po

File diff suppressed because it is too large


@ -1,499 +0,0 @@
.\"*******************************************************************
.\"
.\" This file was generated with po4a. Translate the source file.
.\"
.\"*******************************************************************
.TH LINKCHECKER 1 2020\-06\-05 LinkChecker "LinkChecker User Manual"
.SH NAME
linkchecker \- Kommandozeilenprogramm zum Prüfen von HTML Dokumenten und
Webseiten auf ungültige Verknüpfungen
.SH SYNTAX
\fBlinkchecker\fP [\fIOptionen\fP] [\fIDatei\-oder\-URL\fP]...
.SH BESCHREIBUNG
.TP 2
LinkChecker beinhaltet
.IP \(bu
rekursives Prüfen und Multithreading
.IP \(bu
Ausgabe als farbigen oder normalen Text, HTML, SQL, CSV, XML oder einen
Sitemap\-Graphen in verschiedenen Formaten
.IP \(bu
Unterstützung von HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet und
Verknüpfungen auf lokale Dateien
.IP \(bu
Einschränkung der Linküberprüfung mit URL\-Filter,
.IP \(bu
Proxy\-Unterstützung
.IP \(bu
Benutzer/Passwort Autorisierung für HTTP, FTP und Telnet
.IP \(bu
Unterstützung des robots.txt Protokolls
.IP \(bu
Unterstützung für Cookies
.IP \(bu
Unterstützung für HTML5
.IP \(bu
HTML\- und CSS\-Syntaxprüfung
.IP \(bu
Antivirusprüfung
.IP \(bu
ein Kommandozeilenprogramm und ein Web\-Interface
.SH BEISPIELE
.TP 2
The most common use checks the given domain recursively:
\fBlinkchecker http://www.example.com/\fP
.br
Beachten Sie dass dies die komplette Domäne überprüft, welche aus mehreren
tausend URLs bestehen kann. Benutzen Sie die Option \fB\-r\fP, um die
Rekursionstiefe zu beschränken.
.TP
Don't check URLs with \fB/secret\fP in its name. All other links are checked as usual:
\fBlinkchecker \-\-ignore\-url=/secret mysite.example.com\fP
.TP
Überprüfung einer lokalen HTML Datei unter Unix:
\fBlinkchecker ../bla.html\fP
.TP
Überprüfung einer lokalen HTML Datei unter Windows:
\fBlinkchecker c:\\temp\\test.html\fP
.TP
Sie können den \fBhttp://\fP URL Anteil weglassen wenn die Domäne mit \fBwww.\fP beginnt:
\fBlinkchecker www.example.com\fP
.TP
Sie können den \fBftp://\fP URL Anteil weglassen wenn die Domäne mit \fBftp.\fP beginnt:
\fBlinkchecker \-r0 ftp.example.com\fP
.TP
Erzeuge einen Sitemap Graphen und konvertiere ihn mit dem graphviz dot Programm:
\fBlinkchecker \-odot \-v www.example.com | dot \-Tps > sitemap.ps\fP
.SH OPTIONEN
.SS "Allgemeine Optionen"
.TP
\fB\-f\fP\fIDATEINAME\fP, \fB\-\-config=\fP\fIDATEINAME\fP
Benutze \fIDATEINAME\fP als Konfigurationsdatei. Standardmäßig benutzt
LinkChecker \fB~/.linkchecker/linkcheckerrc\fP.
.TP
\fB\-h\fP, \fB\-\-help\fP
Hilfe! Gebe Gebrauchsanweisung für dieses Programm aus.
.TP
\fB\-\-stdin\fP
Lese Liste von URLs zum Prüfen von der Standardeingabe, getrennt durch
Leerzeichen.
.TP
\fB\-t\fP\fINUMMER\fP, \fB\-\-threads=\fP\fINUMMER\fP
Generiere nicht mehr als die angegebene Anzahl von Threads. Die
Standardanzahl von Threads ist 10. Um Threads zu deaktivieren, geben Sie
eine nicht positive Nummer an.
.TP
\fB\-V\fP, \fB\-\-version\fP
Gebe die Version aus und beende das Programm.
.TP
\fB\-\-list\-plugins\fP
Print available check plugins and exit.
.
.SS Ausgabeoptionen
.TP
\fB\-D\fP\fINAME\fP, \fB\-\-debug=\fP\fINAME\fP
Gebe Testmeldungen aus für den angegebenen Logger. Verfügbare Logger sind
\fBcmdline\fP, \fBchecking\fP, \fBcache\fP, \fBdns\fP, \fBplugins\fP und \fBall\fP. Die Angabe
\fBall\fP ist ein Synonym für alle verfügbaren Logger. Diese Option kann
mehrmals angegeben werden, um mit mehr als einem Logger zu testen. Um
akkurate Ergebnisse zu erzielen, werden Threads deaktiviert.
.TP
\fB\-F\fP\fITYP\fP[\fB/\fP\fIENKODIERUNG\fP][\fB/\fP\fIDATEINAME\fP], \fB\-\-file\-output=\fP\fITYP\fP[\fB/\fP\fIENKODIERUNG\fP][\fB/\fP\fIDATEINAME\fP]
Ausgabe in eine Datei namens \fBlinkchecker\-out.\fP\fITYP\fP,
\fB$HOME/.linkchecker/blacklist\fP bei \fBblacklist\fP Ausgabe, oder \fIDATEINAME\fP
falls angegeben. Das \fIENKODIERUNG\fP gibt die Ausgabekodierung an. Der Standard
ist das der lokalen Spracheinstellung. Gültige Enkodierungen sind
aufgelistet unter
.UR https://docs.python.org/library/codecs.html#standard\-encodings
.UE .
.br
Der \fIDATEINAME\fP und \fIENKODIERUNG\fP Teil wird beim Ausgabetyp \fBnone\fP
ignoriert, ansonsten wird die Datei überschrieben falls sie existiert. Sie
können diese Option mehr als einmal verwenden. Gültige Ausgabetypen sind
\fBtext\fP, \fBhtml\fP, \fBsql\fP, \fBcsv\fP, \fBgml\fP, \fBdot\fP, \fBxml\fP, \fBsitemap\fP,
\fBnone\fP oder \fBblacklist\fP. Standard ist keine Dateiausgabe. Die
unterschiedlichen Ausgabetypen sind weiter unten dokumentiert. Beachten Sie,
dass Sie mit der Option \fB\-o none\fP jegliche Ausgaben auf der Konsole
verhindern können.
.TP
\fB\-\-no\-status\fP
Gebe keine Statusmeldungen aus.
.TP
\fB\-\-no\-warnings\fP
Gebe keine Warnungen aus. Standard ist die Ausgabe von Warnungen.
.TP
\fB\-o\fP\fITYP\fP[\fB/\fP\fIENKODIERUNG\fP], \fB\-\-output=\fP\fITYP\fP[\fB/\fP\fIENKODIERUNG\fP]
Gib Ausgabetyp als \fBtext\fP, \fBhtml\fP, \fBsql\fP, \fBcsv\fP, \fBgml\fP, \fBdot\fP, \fBxml\fP,
\fBsitemap\fP, \fBnone\fP oder \fBblacklist\fP an. Standard Typ ist \fBtext\fP. Die
verschiedenen Ausgabetypen sind unten dokumentiert.
.br
Das \fIENKODIERUNG\fP gibt die Ausgabekodierung an. Der Standard ist das der
lokalen Spracheinstellung. Gültige Enkodierungen sind aufgelistet unter
.UR https://docs.python.org/library/codecs.html#standard\-encodings
.UE .
.TP
\fB\-q\fP, \fB\-\-quiet\fP
Keine Ausgabe, ein Alias für \fB\-o none\fP. Dies ist nur in Verbindung mit
\fB\-F\fP nützlich.
.TP
\fB\-v\fP, \fB\-\-verbose\fP
Gebe alle geprüften URLs aus. Standard ist es, nur fehlerhafte URLs und
Warnungen auszugeben.
.TP
\fB\-W\fP\fIREGEX\fP, \fB\-\-warning\-regex=\fP\fIREGEX\fP
Definieren Sie einen regulären Ausdruck der eine Warnung ausgibt falls er
auf den Inhalt einer geprüften URL zutrifft. Dies gilt nur für gültige
Seiten deren Inhalt wir bekommen können.
.br
Benutzen Sie dies, um nach Seiten zu suchen, welche bestimmte Fehler
enthalten, zum Beispiel "Diese Seite ist umgezogen" oder "Oracle
Applikationsfehler".
.br
Man beachte, dass mehrere Werte in dem regulären Ausdruck kombiniert
werden können, zum Beispiel "(Diese Seite ist umgezogen|Oracle
Applikationsfehler)".
.br
Siehe Abschnitt \fBREGULAR EXPRESSIONS\fP für weitere Infos.
.SS "Optionen zum Prüfen"
.TP
\fB\-\-cookiefile=\fP\fIDATEINAME\fP
Lese eine Datei mit Cookie\-Daten. Das Cookie Datenformat wird weiter unten
erklärt.
.TP
\fB\-\-check\-extern\fP
Check external URLs.
.TP
\fB\-\-ignore\-url=\fP\fIREGEX\fP
URLs welche dem angegebenen regulären Ausdruck entsprechen werden ignoriert
und nicht geprüft.
.br
Diese Option kann mehrmals angegeben werden.
.br
Siehe Abschnitt \fBREGULAR EXPRESSIONS\fP für weitere Infos.
.TP
\fB\-N\fP\fINAME\fP, \fB\-\-nntp\-server=\fP\fINAME\fP
Gibt einen NNTP\-Rechner für \fBnews:\fP Links an. Standard ist die Umgebungsvariable
\fBNNTP_SERVER\fP. Falls kein Rechner angegeben ist, wird lediglich auf
korrekte Syntax des Links geprüft.
.TP
\fB\-\-no\-follow\-url=\fP\fIREGEX\fP
Prüfe URLs, welche dem angegebenen regulären Ausdruck entsprechen, aber
führe keine Rekursion durch.
.br
Diese Option kann mehrmals angegeben werden.
.br
Siehe Abschnitt \fBREGULAR EXPRESSIONS\fP für weitere Infos.
.TP
\fB\-p\fP, \fB\-\-password\fP
Liest ein Passwort von der Kommandozeile und verwende es für HTTP und FTP
Autorisierung. Für FTP ist das Standardpasswort \fBanonymous@\fP. Für HTTP gibt
es kein Standardpasswort. Siehe auch \fB\-u\fP.
.TP
\fB\-r\fP\fINUMMER\fP, \fB\-\-recursion\-level=\fP\fINUMMER\fP
Prüfe rekursiv alle URLs bis zu der angegebenen Tiefe. Eine negative Tiefe
bewirkt unendliche Rekursion. Standard Tiefe ist unendlich.
.TP
\fB\-\-timeout=\fP\fINUMMER\fP
Setze den Timeout für TCP\-Verbindungen in Sekunden. Der Standard Timeout ist
60 Sekunden.
.TP
\fB\-u\fP\fINAME\fP, \fB\-\-user=\fP\fINAME\fP
Verwende den angegebenen Benutzernamen für HTTP und FTP Autorisierung. Für
FTP ist der Standardname \fBanonymous\fP. Für HTTP gibt es keinen
Standardnamen. Siehe auch \fB\-p\fP.
.TP
\fB\-\-user\-agent=\fP\fISTRING\fP
Gibt den User\-Agent an, der zu HTTP\-Servern geschickt wird,
z.B. "Mozilla/4.0". Der Standard ist "LinkChecker/X.Y", wobei X.Y die
aktuelle Version von LinkChecker ist.
.SH KONFIGURATIONSDATEIEN
Konfigurationsdateien können alle obigen Optionen enthalten. Sie können
zudem Optionen enthalten, welche nicht auf der Kommandozeile gesetzt werden
können. Siehe \fBlinkcheckerrc\fP(5) für mehr Informationen.
.SH AUSGABETYPEN
Beachten Sie, dass standardmäßig nur Fehler und Warnungen protokolliert
werden. Sie sollten die \fB\-\-verbose\fP Option benutzen, um eine komplette URL
Liste zu erhalten, besonders bei Ausgabe eines Sitemap\-Graphen.
.TP
\fBtext\fP
Standard Textausgabe in "Schlüssel: Wert"\-Form.
.TP
\fBhtml\fP
Gebe URLs in "Schlüssel: Wert"\-Form als HTML formatiert aus. Besitzt zudem
Verknüpfungen auf die referenzierten Seiten. Ungültige URLs haben
Verknüpfungen zur HTML und CSS Syntaxprüfung angehängt.
.TP
\fBcsv\fP
Gebe Prüfresultat in CSV\-Format aus mit einer URL pro Zeile.
.TP
\fBgml\fP
Gebe Vater\-Kind Beziehungen zwischen verknüpften URLs als GML Graphen aus.
.TP
\fBdot\fP
Gebe Vater\-Kind Beziehungen zwischen verknüpften URLs als DOT Graphen aus.
.TP
\fBgxml\fP
Gebe Prüfresultat als GraphXML\-Datei aus.
.TP
\fBxml\fP
Gebe Prüfresultat als maschinenlesbare XML\-Datei aus.
.TP
\fBsitemap\fP
Protokolliere Prüfergebnisse als XML Sitemap, deren Format unter
.UR https://www.sitemaps.org/protocol.html
.UE
dokumentiert ist.
.TP
\fBsql\fP
Gebe Prüfresultat als SQL Skript mit INSERT Befehlen aus. Ein
Beispielskript, um die initiale SQL Tabelle zu erstellen ist unter
create.sql zu finden.
.TP
\fBblacklist\fP
Für Cronjobs geeignet. Gibt das Prüfergebnis in eine Datei
\fB~/.linkchecker/blacklist\fP aus, welche nur Einträge mit fehlerhaften URLs
und die Anzahl der Fehlversuche enthält.
.TP
\fBnone\fP
Gibt nichts aus. Für Debugging oder Prüfen des Rückgabewerts geeignet.
.
.SH "REGULÄRE AUSDRÜCKE"
LinkChecker akzeptiert Pythons reguläre Ausdrücke. Siehe
.UR https://docs.python.org/howto/regex.html
.UE
für eine Einführung.
Eine Ergänzung ist, dass ein regulärer Ausdruck negiert wird falls er mit
einem Ausrufezeichen beginnt.
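As a sketch, the negation can be used with \fB\-\-ignore\-url\fP to restrict
checking to a single site (example.com is a placeholder host):
.EX
linkchecker \-\-ignore\-url='!^https?://example\e.com/' http://example.com/
.EE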
.
.SH COOKIE FILES
A cookie file contains standard HTTP header (RFC 2616) data with the
following possible names:
.
.TP
\fBHost\fP (required)
Sets the domain the cookies are valid for.
.TP
\fBPath\fP (optional)
Gives the path the cookies are valid for; default path is \fB/\fP.
.TP
\fBSet\-cookie\fP (required)
Set the cookie name/value. Can be given more than once.
.PP
Multiple entries are separated by a blank line.
.
The example below will send two cookies to all URLs starting with
\fBhttp://example.com/hello/\fP and one to all URLs starting with
\fBhttps://example.org\fP:
.EX
Host: example.com
Path: /hello
Set\-cookie: ID="smee"
Set\-cookie: spam="egg"
.PP
Host: example.org
Set\-cookie: baggage="elitist"; comment="hologram"
.EE
.SH "PROXY SUPPORT"
To use a proxy on Unix or Windows set the $http_proxy, $https_proxy or
$ftp_proxy environment variables to the proxy URL. The URL should be of the
form
\fBhttp://\fP[\fIuser\fP\fB:\fP\fIpass\fP\fB@\fP]\fIhost\fP[\fB:\fP\fIport\fP]. LinkChecker also
detects manual proxy settings of Internet Explorer on Windows systems, and
GNOME or KDE on Linux systems. On a Mac use the Internet Config.
.PP
You can also set a comma\-separated domain list in the $no_proxy environment
variable to ignore any proxy settings for these domains.
.TP
Setting an HTTP proxy on Unix for example looks like this:
\fBexport http_proxy="http://proxy.example.com:8080"\fP
.TP
Proxy authentication is also supported:
\fBexport http_proxy="http://user1:mypass@proxy.example.org:8081"\fP
.TP
Setting a proxy on the Windows command prompt:
\fBset http_proxy=http://proxy.example.com:8080\fP
.SH "PERFORMED CHECKS"
All URLs have to pass a preliminary syntax test. Minor quoting errors will
issue a warning, all other invalid syntax issues are errors. After the
syntax check passes, the URL is queued for connection checking. All
connection check types are described below.
.TP
HTTP links (\fBhttp:\fP, \fBhttps:\fP)
After connecting to the given HTTP server the given path or query is
requested. All redirections are followed, and if user/password is given it
will be used as authorization when necessary. All final HTTP status codes
other than 2xx are errors.
.IP
The content of HTML pages is checked recursively.
.TP
Local files (\fBfile:\fP)
A regular, readable file that can be opened is valid. A readable directory
is also valid. All other files, for example device files, unreadable or
non\-existing files, are errors.
.IP
HTML or other parseable file contents are checked recursively.
.TP
Mail links (\fBmailto:\fP)
A mailto: link resolves to a list of email addresses. If one address fails,
the whole list is considered to have failed. For each mail address the
following things are checked:
.br
1) Check the address syntax, both the parts before and after the @ sign.
.br
2) Look up the MX DNS records. If no MX record is found, print an error.
.br
3) Check if one of the mail hosts accepts an SMTP connection. Check hosts
with higher priority first. If no host accepts SMTP, print a warning.
.br
4) Try to verify the address with the VRFY command. If an answer is
received, print the verified address as info.
.TP
FTP links (\fBftp:\fP)
For FTP links we do:
.br
1) connect to the given host
.br
2) try to login with the given user and password. The default user is
\*(lqanonymous\*(lq, the default password is \*(lqanonymous@\*(lq.
.br
3) try to change to the given directory
.br
4) list the files in the directory with the NLST command
.TP
Telnet links (\fBtelnet:\fP)
We try to connect to the given telnet server and, if user/password are
given, try to login.
.TP
NNTP links (\fBnews:\fP, \fBsnews:\fP, \fBnntp\fP)
We try to connect to the given NNTP server. If a news group or a specific
article is given, we try to request that group or article from the server.
.TP
Unsupported links (\fBjavascript:\fP, etc.)
An unsupported link will only print a warning. No further checking is
performed.
.IP
The complete list of recognized, but unsupported links can be found in the
.UR https://github.com/linkchecker/linkchecker/blob/master/linkcheck/checker/unknownurl.py
linkcheck/checker/unknownurl.py
.UE
source file. The most prominent of
them should be JavaScript links.
.SH PLUGINS
There are two plugin types: connection and content plugins. Connection
plugins are run after a successful connection to the URL host. Content
plugins are run if the URL type has content (mailto: URLs have no content
for example) and if the check is not forbidden (i.e. by HTTP robots.txt).
.PP
See \fBlinkchecker \-\-list\-plugins\fP for a list of plugins and their
documentation. All plugins are enabled via the \fBlinkcheckerrc\fP(5)
configuration file.
.SH RECURSION
Before descending recursively into a URL, it has to fulfill several
conditions. They are checked in this order:
1. A URL must be valid.
2. The URL content must be parseable. This currently includes HTML files, Opera bookmark files, and directories. If a file type cannot be determined (for example because it does not have a common HTML file extension, and the content does not look like HTML), it is assumed to be non\-parseable.
3. The URL content must be retrievable. This is usually the case except for example for mailto: or unknown URL types.
4. The maximum recursion level must not be exceeded. It is configured with the \fB\-\-recursion\-level\fP option and is unlimited per default.
5. The URL must not match the list of ignored URLs. The ignored URLs are configured with the \fB\-\-ignore\-url\fP option.
6. The Robots Exclusion Protocol must allow links in the URL to be followed recursively. This is checked by searching for a "nofollow" directive in the HTML header data.
Note that directory recursion reads all files in that directory, not just a
subset like \fBindex.html*\fP.
.SH NOTES
URLs on the command line starting with \fBftp.\fP are treated like
\fBftp://ftp.\fP, URLs starting with \fBwww.\fP are treated like
\fBhttp://www.\fP. You can also give local files as arguments.
If your system is configured to automatically establish a connection to the
internet (e.g. with diald), it will connect when checking links not
pointing to your local host. Use the \fB\-\-ignore\-url\fP option to prevent
this.
Javascript links are not supported.
If your platform does not support threading, LinkChecker disables it
automatically.
You can supply multiple user/password pairs in a configuration file.
When checking \fBnews:\fP links the given NNTP host doesn't need to be the
same as the host of the user browsing your pages.
.
.SH ENVIRONMENT
\fBNNTP_SERVER\fP \- specifies the default NNTP server
.br
\fBhttp_proxy\fP \- specifies the default HTTP proxy server
.br
\fBftp_proxy\fP \- specifies the default FTP proxy server
.br
\fBno_proxy\fP \- comma\-separated list of domains that are not contacted over
a proxy server
.br
\fBLC_MESSAGES\fP, \fBLANG\fP, \fBLANGUAGE\fP \- specify the output language
.
.SH "RETURN VALUE"
The return value is 2 when
.IP \(bu 2
a program error occurred.
.PP
The return value is 1 when
.IP \(bu 2
invalid links were found or
.IP \(bu
link warnings were found and warnings are enabled
.PP
Else the return value is zero.
.
.SH LIMITATIONS
LinkChecker consumes memory for each queued URL to check. With thousands of
queued URLs the amount of consumed memory can become quite large. This
might slow down the program or even the whole system.
.
.SH FILES
\fB~/.linkchecker/linkcheckerrc\fP \- default configuration file
.br
\fB~/.linkchecker/blacklist\fP \- default blacklist logger output filename
.br
\fBlinkchecker\-out.\fP\fITYPE\fP \- default logger file output name
.br
.UR https://docs.python.org/library/codecs.html#standard\-encodings
.UE
\- valid output encodings
.br
.UR https://docs.python.org/howto/regex.html
.UE
\- regular expression documentation
.SH "SEE ALSO"
\fBlinkcheckerrc\fP(5)
.
.SH AUTHOR
Bastian Kleineidam <bastian.kleineidam@web.de>
.
.SH COPYRIGHT
Copyright \(co 2000\-2014 Bastian Kleineidam


@ -1,572 +0,0 @@
.\"*******************************************************************
.\"
.\" This file was generated with po4a. Translate the source file.
.\"
.\"*******************************************************************
.TH LINKCHECKERRC 5 2020\-04\-24 LinkChecker "LinkChecker User Manual"
.SH NAME
linkcheckerrc \- configuration file for LinkChecker
.
.SH DESCRIPTION
\fBlinkcheckerrc\fP is the configuration file for LinkChecker. The file is
written in an INI\-style format.
.br
The default file location is \fB~/.linkchecker/linkcheckerrc\fP on Unix,
\fB%HOMEPATH%\e.linkchecker\elinkcheckerrc\fP on Windows systems.
.SH SETTINGS
.SS [checking]
.TP
\fBcookiefile=\fP\fIFILENAME\fP
Read a file with initial cookie data. The cookie data format is explained
in \fBlinkchecker\fP(1).
.br
Command line option: \fB\-\-cookiefile\fP
.TP
\fBlocalwebroot=\fP\fISTRING\fP
When checking absolute URLs inside local files, the given root directory is
used as base URL.
.br
Note that the given directory must be in URL syntax, i.e. use a forward
slash instead of a backslash to join directories. And the given directory
must end with a slash.
.br
Command line option: none
.TP
\fBnntpserver=\fP\fISTRING\fP
Specify an NNTP server for \fBnews:\fP links. Default is the environment
variable \fBNNTP_SERVER\fP. If no host is given, only the syntax of the link
is checked.
.br
Command line option: \fB\-\-nntp\-server\fP
.TP
\fBrecursionlevel=\fP\fINUMBER\fP
Check recursively all links up to the given depth. A negative depth will
enable infinite recursion. Default depth is infinite.
.br
Command line option: \fB\-\-recursion\-level\fP
.TP
\fBthreads=\fP\fINUMBER\fP
Generate no more than the given number of threads. The default number of
threads is 10. To disable threading specify a non\-positive number.
.br
Command line option: \fB\-\-threads\fP
.TP
\fBtimeout=\fP\fINUMBER\fP
Set the timeout for TCP connection attempts in seconds. The default timeout
is 60 seconds.
.br
Command line option: \fB\-\-timeout\fP
.TP
\fBaborttimeout=\fP\fINUMMER\fP
Time to wait for checks to finish after the user aborts the first time (with
Ctrl\-C or the abort button). The default abort timeout is 300 seconds.
.br
Command line option: none
.TP
\fBuseragent=\fP\fISTRING\fP
Specify the User\-Agent string to send to the HTTP server, for example
"Mozilla/4.0". The default is "LinkChecker/X.Y" where X.Y is the current
version of LinkChecker.
.br
Command line option: \fB\-\-user\-agent\fP
.TP
\fBsslverify=\fP[\fB0\fP|\fB1\fP|\fIfilename\fP]
If set to zero, SSL certificates are not checked. If set to one (the
default), SSL certificates are checked with the provided CA certificate
file. If a filename is specified, it is used as the certificate file for
verification.
.br
Command line option: none
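For example, to verify certificates against a custom CA bundle (the path is
illustrative):
.EX
[checking]
sslverify=/etc/ssl/certs/ca\-certificates.crt
.EE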
.TP
\fBmaxrunseconds=\fP\fINUMBER\fP
Stop checking new URLs after the given number of seconds. Same as if the
user stops (by hitting Ctrl\-C) after the given number of seconds.
.br
The default is not to stop until all URLs are checked.
.br
Command line option: none
.TP
\fBmaxnumurls=\fP\fINUMBER\fP
Maximum number of URLs to check. New URLs will not be accepted after the
given number of URLs has been checked.
.br
The default is to accept and check all URLs.
.br
Command line option: none
.TP
\fBmaxrequestspersecond=\fP\fINUMBER\fP
Limit the maximum number of requests per second to one host.
.TP
\fBallowedschemes=\fP\fINAME\fP[\fB,\fP\fINAME\fP...]
Allowed URL schemes as comma\-separated list.
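For example, to throttle requests and restrict URL schemes (the values are
illustrative):
.EX
[checking]
maxrequestspersecond=5
allowedschemes=http,https
.EE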
.SS [filtering]
.TP
\fBignore=\fP\fIREGEX\fP (MULTILINE)
Only check the syntax of URLs matching the given regular expressions.
.br
Command line option: \fB\-\-ignore\-url\fP
.TP
\fBignorewarnings=\fP\fINAME\fP[\fB,\fP\fINAME\fP...]
Ignore the comma\-separated list of warnings. See \fBWARNINGS\fP for the list of
supported warnings.
.br
Command line option: none
.TP
\fBinternlinks=\fP\fIREGEX\fP
Regular expression to add more URLs recognized as internal links. Default
is that URLs given on the command line are internal.
.br
Command line option: none
.TP
\fBnofollow=\fP\fIREGEX\fP (MULTILINE)
Check but do not recurse into URLs matching the given regular expression.
.br
Command line option: \fB\-\-no\-follow\-url\fP
.TP
\fBcheckextern=\fP[\fB0\fP|\fB1\fP]
Check external links. Default is to check internal links only.
.br
Command line option: \fB\-\-check\-extern\fP
.SS [authentication]
.TP
\fBentry=\fP\fIREGEX\fP \fIUSER\fP [\fIPASS\fP] (MULTILINE)
Provide individual username/password pairs for different link types.
Entries are a triple (URL regex, username, password) or a tuple (URL regex,
username), where the entries are separated by whitespace.
.br
The password is optional and if missing it has to be entered at the command
prompt.
.br
If the regular expression matches the checked URL, the given
username/password pair is used for authentication. The command line options
\fB\-u\fP and \fB\-p\fP match every link and therefore override the entries
given here. The first match wins. At the moment, authentication is used for
http[s] and ftp links.
.br
Command line option: \fB\-u\fP, \fB\-p\fP
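A sketch of such entries (the host names and credentials are made up):
.EX
[authentication]
entry=
  ^https?://www\e.example\e.com/~user/ bob mypass
  ^ftp://ftp\e.example\e.org/ anonymous
.EE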
.TP
\fBloginurl=\fP\fIURL\fP
A login URL to be visited before checking. Also needs authentication data
set for it.
.TP
\fBloginuserfield=\fP\fINAME\fP
The name of the user CGI field. Default name is \fBlogin\fP.
.TP
\fBloginpasswordfield=\fP\fINAME\fP
The name of the password CGI field. Default name is \fBpassword\fP.
.TP
\fBloginextrafields=\fP\fINAME\fP\fB:\fP\fIVALUE\fP (MULTILINE)
Optionally any additional CGI name/value pairs. The default values are
submitted automatically.
.SS [output]
.TP
\fBdebug=\fP\fISTRING\fP[\fB,\fP\fISTRING\fP...]
Print debugging output for the given modules. Available debug modules are
\fBcmdline\fP, \fBchecking\fP, \fBcache\fP, \fBdns\fP, \fBthread\fP, \fBplugins\fP and
\fBall\fP. Specifying \fBall\fP is an alias for specifying all available loggers.
.br
Command line option: \fB\-D\fP, \fB\-\-debug\fP
.TP
\fBfileoutput=\fP\fITYPE\fP[\fB,\fP\fITYPE\fP...]
Output to a file \fBlinkchecker\-out.\fP\fITYPE\fP, or
\fB$HOME/.linkchecker/blacklist\fP for \fBblacklist\fP output.
.br
Valid file output types are \fBtext\fP, \fBhtml\fP, \fBsql\fP, \fBcsv\fP, \fBgml\fP, \fBdot\fP,
\fBxml\fP, \fBnone\fP or \fBblacklist\fP. Default is no file output. The various
output types are documented below. Note that you can suppress all console
output with \fBoutput=none\fP.
.br
Command line option: \fB\-\-file\-output\fP
.TP
\fBlog=\fP\fITYPE\fP[\fB/\fP\fIENCODING\fP]
Specify the output type as \fBtext\fP, \fBhtml\fP, \fBsql\fP, \fBcsv\fP, \fBgml\fP, \fBdot\fP,
\fBxml\fP, \fBnone\fP or \fBblacklist\fP. Default type is \fBtext\fP. The various
output types are documented below.
.br
The \fIENCODING\fP specifies the output encoding, the default is that of your
locale. Valid encodings are listed at
.UR https://docs.python.org/library/codecs.html#standard\-encodings
.UE .
.br
Command line option: \fB\-\-output\fP
.TP
\fBquiet=\fP[\fB0\fP|\fB1\fP]
If set, operate quietly. An alias for \fBlog=none\fP. This is only useful
with \fBfileoutput\fP.
.br
Command line option: \fB\-q\fP, \fB\-\-quiet\fP
.TP
\fBstatus=\fP[\fB0\fP|\fB1\fP]
Control printing of check status messages. Default is 1.
.br
Command line option: \fB\-\-no\-status\fP
.TP
\fBverbose=\fP[\fB0\fP|\fB1\fP]
If set, log all checked URLs once. Default is to log only errors and
warnings.
.br
Command line option: \fB\-\-verbose\fP
.TP
\fBwarnings=\fP[\fB0\fP|\fB1\fP]
If set to zero, do not log warnings. Default is to log warnings.
.br
Command line option: \fB\-\-no\-warnings\fP
.SS [text]
.TP
\fBfilename=\fP\fISTRING\fP
Specify the filename for text output. Default filename is
\fBlinkchecker\-out.txt\fP.
.br
Command line option: \fB\-\-file\-output=\fP
.TP
\fBparts=\fP\fISTRING\fP
Comma\-separated list of parts that have to be logged. See \fBLOGGER PARTS\fP
below.
.br
Command line option: none
.TP
\fBencoding=\fP\fISTRING\fP
Valid encodings are listed at
.UR https://docs.python.org/library/codecs.html#standard\-encodings
.UE .
.br
Default encoding is \fBiso\-8859\-15\fP.
.TP
\fIcolor*\fP
Color settings for the various output parts, syntax is \fIcolor\fP or
\fItype\fP\fB;\fP\fIcolor\fP. The \fItype\fP can be \fBbold\fP, \fBlight\fP, \fBblink\fP or
\fBinvert\fP. The \fIcolor\fP can be \fBdefault\fP, \fBblack\fP, \fBred\fP, \fBgreen\fP,
\fByellow\fP, \fBblue\fP, \fBpurple\fP, \fBcyan\fP, \fBwhite\fP, \fBBlack\fP, \fBRed\fP,
\fBGreen\fP, \fBYellow\fP, \fBBlue\fP, \fBPurple\fP, \fBCyan\fP or \fBWhite\fP.
.br
Command line option: none
.TP
\fBcolorparent=\fP\fISTRING\fP
Set the parent color. Default is \fBwhite\fP.
.TP
\fBcolorurl=\fP\fISTRING\fP
Set the URL color. Default is \fBdefault\fP.
.TP
\fBcolorname=\fP\fISTRING\fP
Set the name color. Default is \fBdefault\fP.
.TP
\fBcolorreal=\fP\fISTRING\fP
Set the real URL color. Default is \fBcyan\fP.
.TP
\fBcolorbase=\fP\fISTRING\fP
Set the base URL color. Default is \fBpurple\fP.
.TP
\fBcolorvalid=\fP\fISTRING\fP
Set the valid color. Default is \fBbold;green\fP.
.TP
\fBcolorinvalid=\fP\fISTRING\fP
Set the invalid color. Default is \fBbold;red\fP.
.TP
\fBcolorinfo=\fP\fISTRING\fP
Set the info color. Default is \fBdefault\fP.
.TP
\fBcolorwarning=\fP\fISTRING\fP
Set the warning color. Default is \fBbold;yellow\fP.
.TP
\fBcolordltime=\fP\fISTRING\fP
Set the download time color. Default is \fBdefault\fP.
.TP
\fBcolorreset=\fP\fISTRING\fP
Set reset color. Default is \fBdefault\fP.
.SS [gml]
.TP
\fBfilename=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBparts=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See the [text] section above.
.SS [dot]
.TP
\fBfilename=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBparts=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See the [text] section above.
.SS [csv]
.TP
\fBfilename=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBparts=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBseparator=\fP\fICHAR\fP
Set the CSV separator. Default is a comma (\fB,\fP).
.TP
\fBquotechar=\fP\fICHAR\fP
Set the CSV quote character. Default is a double quote (\fB"\fP).
.SS [sql]
.TP
\fBfilename=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBparts=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBdbname=\fP\fISTRING\fP
Set the database name to store into. Default is \fBlinksdb\fP.
.TP
\fBseparator=\fP\fICHAR\fP
Set the SQL command separator character. Default is a semicolon (\fB;\fP).
.SS [html]
.TP
\fBfilename=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBparts=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBcolorbackground=\fP\fICOLOR\fP
Set the HTML background color. Default is \fB#fff7e5\fP.
.TP
\fBcolorurl=\fP
Set the HTML URL color. Default is \fB#dcd5cf\fP.
.TP
\fBcolorborder=\fP
Set the HTML border color. Default is \fB#000000\fP.
.TP
\fBcolorlink=\fP
Set the HTML link color. Default is \fB#191c83\fP.
.TP
\fBcolorwarning=\fP
Set the HTML warning color. Default is \fB#e0954e\fP.
.TP
\fBcolorerror=\fP
Set the HTML error color. Default is \fB#db4930\fP.
.TP
\fBcolorok=\fP
Set the HTML valid color. Default is \fB#3ba557\fP.
.SS [blacklist]
.TP
\fBfilename=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See the [text] section above.
.SS [xml]
.TP
\fBfilename=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBparts=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See the [text] section above.
.SS [gxml]
.TP
\fBfilename=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBparts=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See the [text] section above.
.SS [sitemap]
.TP
\fBfilename=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBparts=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See the [text] section above.
.TP
\fBpriority=\fP\fINUMBER\fP
A number between 0.0 and 1.0 determining the priority. The default priority
for the first URL is 1.0, for all child URLs 0.5.
.TP
\fBfrequency=\fP[\fBalways\fP|\fBhourly\fP|\fBdaily\fP|\fBweekly\fP|\fBmonthly\fP|\fByearly\fP|\fBnever\fP]
How frequently pages are changing.
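A minimal [sitemap] section could look like this (the values are
illustrative):
.EX
[sitemap]
priority=0.7
frequency=weekly
.EE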
.
.SH "LOGGER PARTS"
.TS
nokeep, tab(@);
ll.
\fBall\fP@(for all parts)
\fBid\fP@(a unique ID for each logentry)
\fBrealurl\fP@(the full url link)
\fBresult\fP@(valid or invalid, with messages)
\fBextern\fP@(1 or 0, only in some logger types reported)
\fBbase\fP@(base href=...)
\fBname\fP@(<a href=...>name</a> and <img alt="name">)
\fBparenturl\fP@(if any)
\fBinfo\fP@(some additional info, e.g. FTP welcome messages)
\fBwarning\fP@(warnings)
\fBdltime\fP@(download time)
\fBchecktime\fP@(check time)
\fBurl\fP@(the original url name, can be relative)
\fBintro\fP@(the blurb at the beginning, "starting at ...")
\fBoutro\fP@(the blurb at the end, "found x errors ...")
.TE
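For example, a [text] logger can be restricted to a few of these parts (an
illustrative selection):
.EX
[text]
parts=url,result,outro
.EE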
.SH MULTILINE
Some option values can span multiple lines. Each line has to be indented
for that to work. Lines starting with a hash (\fB#\fP) will be ignored,
though they must still be indented.
.EX
ignore=
lconline
bookmark
# a comment
^mailto:
.EE
.SH EXAMPLE
.EX
[output]
log=html
.PP
[checking]
threads=5
.PP
[filtering]
ignorewarnings=http\-moved\-permanent
.EE
.SH PLUGINS
All plugins have a separate section. If the section appears in the
configuration file the plugin is enabled. Some plugins read extra options
in their section.
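For example, adding a section header alone enables a plugin; options may
follow in the same section (the \fBsslcertwarndays\fP value is illustrative
and documented below):
.EX
[AnchorCheck]

[SslCertificateCheck]
sslcertwarndays=14
.EE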
.SS [AnchorCheck]
Checks validity of HTML anchors.
.SS [LocationInfo]
Adds the country and if possible city name of the URL host as info. Needs
GeoIP or pygeoip and a local country or city lookup DB installed.
.SS [RegexCheck]
Define a regular expression which prints a warning if it matches any content
of the checked link. This applies only to valid pages, so we can get their
content.
.TP
\fBwarningregex=\fP\fIREGEX\fP
Use this to check for pages that contain some form of error message, for
example "This page has moved" or "Oracle Application error". \fIREGEX\fP should
be unquoted.
Note that multiple values can be combined in the regular expression, for
example "(This page has moved|Oracle Application error)".
.SS [SslCertificateCheck]
Check SSL certificate expiration date. Only internal https: links will be
checked. A domain will only be checked once to avoid duplicate warnings.
.TP
\fBsslcertwarndays=\fP\fINUMBER\fP
Configures the expiration warning time in days.
.SS [HtmlSyntaxCheck]
Check the syntax of HTML pages with the online W3C HTML validator. See
.UR https://validator.w3.org/docs/api.html
.UE .
.SS [HttpHeaderInfo]
Print HTTP headers in URL info.
.TP
\fBprefixes=\fP\fIprefix1\fP[,\fIprefix2\fP]...
List of comma separated header prefixes. For example to display all HTTP
headers that start with "X\-".
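An illustrative configuration displaying the Server header and all headers
starting with "X\-":
.EX
[HttpHeaderInfo]
prefixes=Server,X\-
.EE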
.SS [CssSyntaxCheck]
Check the syntax of HTML pages with the online W3C CSS validator. See
.UR https://jigsaw.w3.org/css\-validator/manual.html#expert
.UE .
.SS [VirusCheck]
Checks the page content for virus infections with clamav. A local clamav
daemon must be installed.
.TP
\fBclamavconf=\fP\fIFILENAME\fP
Filename of the \fBclamd.conf\fP configuration file.
.
.SS [PdfParser]
Parse PDF files for URLs to check. Needs the \fBpdfminer\fP Python package
installed.
.SS [WordParser]
Parse Word files for URLs to check. Needs the \fBpywin32\fP Python extension
installed.
.SH WARNINGS
The following warnings are recognized in the 'ignorewarnings' configuration
entry:
.br
.TP
\fBfile\-missing\-slash\fP
The file: URL is missing a trailing slash.
.TP
\fBfile\-system\-path\fP
The file: path is not the same as the system specific path.
.TP
\fBftp\-missing\-slash\fP
The ftp: URL is missing a trailing slash.
.TP
\fBhttp\-cookie\-store\-error\fP
An error occurred while storing a cookie.
.TP
\fBhttp\-empty\-content\fP
The URL had no content.
.TP
\fBmail\-no\-mx\-host\fP
The MX mail host could not be found.
.TP
\fBnntp\-no\-newsgroup\fP
The NNTP newsgroup could not be found.
.TP
\fBnntp\-no\-server\fP
No NNTP server was found.
.TP
\fBurl\-content\-size\-zero\fP
The URL content size is zero.
.TP
\fBurl\-content\-too\-large\fP
The URL content is too large.
.TP
\fBurl\-effective\-url\fP
The effective URL is different from the original.
.TP
\fBurl\-error\-getting\-content\fP
Could not get the content of the URL.
.TP
\fBurl\-obfuscated\-ip\fP
The IP address is obfuscated.
.TP
\fBurl\-whitespace\fP
The URL %(url)s contains leading or trailing whitespace.
.SH "SEE ALSO"
\fBlinkchecker\fP(1)
.
.SH AUTHOR
Bastian Kleineidam <bastian.kleineidam@web.de>
.
.SH COPYRIGHT
Copyright \(co 2000\-2014 Bastian Kleineidam


@ -1,505 +0,0 @@
.TH LINKCHECKER 1 2020-06-05 "LinkChecker" "LinkChecker User Manual"
.SH NAME
linkchecker \- command line client to check HTML documents and websites for broken links
.SH SYNOPSIS
.B linkchecker
.RI [ options ]
.RI [ file-or-url ]...
.SH DESCRIPTION
.TP 2
LinkChecker features
.IP \(bu
recursive and multithreaded checking,
.IP \(bu
output in colored or normal text, HTML, SQL, CSV, XML or a sitemap graph in different formats,
.IP \(bu
support for HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local file links,
.IP \(bu
restriction of link checking with URL filters,
.IP \(bu
proxy support,
.IP \(bu
username/password authorization for HTTP, FTP and Telnet,
.IP \(bu
support for robots.txt exclusion protocol,
.IP \(bu
support for Cookies
.IP \(bu
support for HTML5
.IP \(bu
HTML and CSS syntax check
.IP \(bu
Antivirus check
.IP \(bu
a command line and web interface
.SH EXAMPLES
.TP 2
The most common use checks the given domain recursively:
.B linkchecker http://www.example.com/
.br
Beware that this checks the whole site which can have thousands of URLs.
Use the \fB\-r\fP option to restrict the recursion depth.
.TP
Don't check URLs with \fB/secret\fP in their name. All other links are checked as usual:
.B linkchecker \-\-ignore\-url=/secret mysite.example.com
.TP
Checking a local HTML file on Unix:
.B linkchecker ../bla.html
.TP
Checking a local HTML file on Windows:
.B linkchecker c:\\temp\\test.html
.TP
You can skip the \fBhttp://\fP url part if the domain starts with \fBwww.\fP:
.B linkchecker www.example.com
.TP
You can skip the \fBftp://\fP url part if the domain starts with \fBftp.\fP:
.B linkchecker \-r0 ftp.example.com
.TP
Generate a sitemap graph and convert it with the graphviz dot utility:
.B linkchecker \-odot \-v www.example.com | dot \-Tps > sitemap.ps
.SH OPTIONS
.SS General options
.TP
\fB\-f\fP\fIFILENAME\fP, \fB\-\-config=\fP\fIFILENAME\fP
Use \fIFILENAME\fP as configuration file. As default LinkChecker
uses \fB~/.linkchecker/linkcheckerrc\fP.
.TP
\fB\-h\fP, \fB\-\-help\fP
Help me! Print usage information for this program.
.TP
\fB\-\-stdin\fP
Read list of white-space separated URLs to check from stdin.
.TP
\fB\-t\fP\fINUMBER\fP, \fB\-\-threads=\fP\fINUMBER\fP
Generate no more than the given number of threads. Default number
of threads is 10. To disable threading specify a non-positive number.
.TP
\fB\-V\fP, \fB\-\-version\fP
Print version and exit.
.TP
\fB\-\-list\-plugins\fP
Print available check plugins and exit.
.
.SS Output options
.TP
\fB\-D\fP\fISTRING\fP, \fB\-\-debug=\fP\fISTRING\fP
Print debugging output for the given logger.
Available loggers are \fBcmdline\fP, \fBchecking\fP,
\fBcache\fP, \fBdns\fP, \fBplugin\fP and \fBall\fP.
Specifying \fBall\fP is an alias for specifying all available loggers.
The option can be given multiple times to debug with more
than one logger.
.br
For accurate results, threading will be disabled during debug runs.
.TP
\fB\-F\fP\fITYPE\fP[\fB/\fP\fIENCODING\fP][\fB/\fP\fIFILENAME\fP], \fB\-\-file\-output=\fP\fITYPE\fP[\fB/\fP\fIENCODING\fP][\fB/\fP\fIFILENAME\fP]
Output to a file \fBlinkchecker\-out.\fP\fITYPE\fP,
\fB$HOME/.linkchecker/blacklist\fP for
\fBblacklist\fP output, or \fIFILENAME\fP if specified.
The \fIENCODING\fP specifies the output encoding, the default is
that of your locale.
Valid encodings are listed at
.UR https://docs.python.org/library/codecs.html#standard-encodings
.UE .
.br
The \fIFILENAME\fP and \fIENCODING\fP parts of the \fBnone\fP output type
will be ignored, else if the file already exists, it will be overwritten.
You can specify this option more than once. Valid file output types
are \fBtext\fP, \fBhtml\fP, \fBsql\fP,
\fBcsv\fP, \fBgml\fP, \fBdot\fP, \fBxml\fP, \fBsitemap\fP, \fBnone\fP or
\fBblacklist\fP.
Default is no file output. The various output types are documented
below. Note that you can suppress all console output
with the option \fB\-o none\fP.
.TP
\fB\-\-no\-status\fP
Do not print check status messages.
.TP
\fB\-\-no\-warnings\fP
Don't log warnings. Default is to log warnings.
.TP
\fB\-o\fP\fITYPE\fP[\fB/\fP\fIENCODING\fP], \fB\-\-output=\fP\fITYPE\fP[\fB/\fP\fIENCODING\fP]
Specify output type as \fBtext\fP, \fBhtml\fP, \fBsql\fP,
\fBcsv\fP, \fBgml\fP, \fBdot\fP, \fBxml\fP, \fBsitemap\fP, \fBnone\fP or
\fBblacklist\fP.
Default type is \fBtext\fP. The various output types are documented
below.
.br
The \fIENCODING\fP specifies the output encoding, the default is
that of your locale. Valid encodings are listed at
.UR https://docs.python.org/library/codecs.html#standard-encodings
.UE .
.TP
\fB\-q\fP, \fB\-\-quiet\fP
Quiet operation, an alias for \fB\-o none\fP.
This is only useful with \fB\-F\fP.
.TP
\fB\-v\fP, \fB\-\-verbose\fP
Log all checked URLs. Default is to log only errors and warnings.
.TP
\fB\-W\fP\fIREGEX\fP, \fB\-\-warning\-regex=\fIREGEX\fP
Define a regular expression which prints a warning if it matches any
content of the checked link.
This applies only to valid pages, so we can get their content.
.br
Use this to check for pages that contain some form of error, for example
"This page has moved" or "Oracle Application error".
.br
Note that multiple values can be combined in the regular expression,
for example "(This page has moved|Oracle Application error)".
.br
See section \fBREGULAR EXPRESSIONS\fP for more info.
.SS Checking options
.TP
\fB\-\-cookiefile=\fP\fIFILENAME\fP
Read a file with initial cookie data. The cookie data
format is explained below.
.TP
\fB\-\-check\-extern\fP
Check external URLs.
.TP
\fB\-\-ignore\-url=\fP\fIREGEX\fP
URLs matching the given regular expression will be ignored and not checked.
.br
This option can be given multiple times.
.br
See section \fBREGULAR EXPRESSIONS\fP for more info.
.TP
\fB\-N\fP\fISTRING\fP, \fB\-\-nntp\-server=\fP\fISTRING\fP
Specify an NNTP server for \fBnews:\fP links. Default is the
environment variable \fBNNTP_SERVER\fP. If no host is given,
only the syntax of the link is checked.
.TP
\fB\-\-no\-follow\-url=\fP\fIREGEX\fP
Check but do not recurse into URLs matching the given regular
expression.
.br
This option can be given multiple times.
.br
See section \fBREGULAR EXPRESSIONS\fP for more info.
.TP
\fB\-p\fP, \fB\-\-password\fP
Read a password from console and use it for HTTP and FTP authorization.
For FTP the default password is \fBanonymous@\fP. For HTTP there is
no default password. See also \fB\-u\fP.
.TP
\fB\-r\fP\fINUMBER\fP, \fB\-\-recursion\-level=\fP\fINUMBER\fP
Check recursively all links up to given depth.
A negative depth will enable infinite recursion.
Default depth is infinite.
.TP
\fB\-\-timeout=\fP\fINUMBER\fP
Set the timeout for connection attempts in seconds. The default timeout
is 60 seconds.
.TP
\fB\-u\fP\fISTRING\fP, \fB\-\-user=\fP\fISTRING\fP
Try the given username for HTTP and FTP authorization.
For FTP the default username is \fBanonymous\fP. For HTTP there is
no default username. See also \fB\-p\fP.
.TP
\fB\-\-user\-agent=\fP\fISTRING\fP
Specify the User-Agent string to send to the HTTP server, for example
"Mozilla/4.0". The default is "LinkChecker/X.Y" where X.Y is the current
version of LinkChecker.
.SH "CONFIGURATION FILES"
Configuration files can specify all options above. They can also
specify some options that cannot be set on the command line.
See
.BR linkcheckerrc (5)
for more info.
.SH OUTPUT TYPES
Note that by default only errors and warnings are logged.
You should use the \fB\-\-verbose\fP option to get the complete URL list,
especially when outputting a sitemap graph format.
.TP
\fBtext\fP
Standard text logger, logging URLs in keyword: argument fashion.
.TP
\fBhtml\fP
Log URLs in keyword: argument fashion, formatted as HTML.
Additionally has links to the referenced pages. Invalid URLs have
HTML and CSS syntax check links appended.
.TP
\fBcsv\fP
Log check result in CSV format with one URL per line.
.TP
\fBgml\fP
Log parent-child relations between linked URLs as a GML sitemap graph.
.TP
\fBdot\fP
Log parent-child relations between linked URLs as a DOT sitemap graph.
.TP
\fBgxml\fP
Log check result as a GraphXML sitemap graph.
.TP
\fBxml\fP
Log check result as machine-readable XML.
.TP
\fBsitemap\fP
Log check result as an XML sitemap whose protocol is documented at
.UR https://www.sitemaps.org/protocol.html
.UE .
.TP
\fBsql\fP
Log check result as SQL script with INSERT commands. An example
script to create the initial SQL table is included as create.sql.
.TP
\fBblacklist\fP
Suitable for cron jobs. Logs the check result into a file
\fB~/.linkchecker/blacklist\fP which only contains entries with invalid
URLs and the number of times they have failed.
.TP
\fBnone\fP
Logs nothing. Suitable for debugging or checking the exit code.
.
.SH REGULAR EXPRESSIONS
LinkChecker accepts Python regular expressions.
See
.UR https://docs.python.org/howto/regex.html
.UE
for an introduction.
An addition is that a leading exclamation mark negates the regular
expression.
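The negation rule can be illustrated with a small sketch (the \fBmatches\fP helper is hypothetical, not the LinkChecker API):

```python
import re

def matches(pattern, text):
    """A leading exclamation mark negates the regular expression,
    as described above."""
    negate = pattern.startswith("!")
    if negate:
        pattern = pattern[1:]
    found = re.search(pattern, text) is not None
    return found != negate
```

So "!example" matches exactly those texts that do not contain "example".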
.
.SH COOKIE FILES
A cookie file contains standard HTTP header (RFC 2616) data with the
following possible names:
.
.TP
\fBHost\fP (required)
Sets the domain the cookies are valid for.
.TP
\fBPath\fP (optional)
Gives the path the cookies are valid for; default path is \fB/\fP.
.TP
\fBSet-cookie\fP (required)
Set cookie name/value. Can be given more than once.
.PP
Multiple entries are separated by a blank line.
.
The example below will send two cookies to all URLs starting with
\fBhttp://example.com/hello/\fP and one to all URLs starting
with \fBhttps://example.org/\fP:
.EX
Host: example.com
Path: /hello
Set-cookie: ID="smee"
Set-cookie: spam="egg"
.PP
Host: example.org
Set-cookie: baggage="elitist"; comment="hologram"
.EE
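A rough sketch of how a file in this format could be split into per-host entries (illustrative only; the actual parser is inside LinkChecker):

```python
def parse_cookiefile(text):
    """Split blank-line-separated entries into lists of (name, value)
    header pairs, following the format described above."""
    entries = []
    for block in text.split("\n\n"):
        headers = []
        for line in block.strip().splitlines():
            name, _, value = line.partition(":")
            if name:
                headers.append((name.strip(), value.strip()))
        if headers:
            entries.append(headers)
    return entries
```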
.SH PROXY SUPPORT
To use a proxy on Unix or Windows set the $http_proxy, $https_proxy or $ftp_proxy
environment variables to the proxy URL. The URL should be of the form
\fBhttp://\fP[\fIuser\fP\fB:\fP\fIpass\fP\fB@\fP]\fIhost\fP[\fB:\fP\fIport\fP].
LinkChecker also detects manual proxy settings of Internet Explorer under
Windows systems, and GNOME or KDE on Linux systems.
On a Mac use the Internet Config to select a proxy.
.PP
You can also set a comma-separated domain list in the $no_proxy environment
variable to ignore any proxy settings for these domains.
.TP
Setting an HTTP proxy on Unix, for example, looks like this:
.B
export http_proxy="http://proxy.example.com:8080"
.TP
Proxy authentication is also supported:
.B
export http_proxy="http://user1:mypass@proxy.example.org:8081"
.TP
Setting a proxy on the Windows command prompt:
.B
set http_proxy=http://proxy.example.com:8080
.SH PERFORMED CHECKS
All URLs have to pass a preliminary syntax test. Minor quoting
mistakes will issue a warning, all other invalid syntax issues
are errors.
After the syntax check passes, the URL is queued for connection
checking. All connection check types are described below.
.TP
HTTP links (\fBhttp:\fP, \fBhttps:\fP)
After connecting to the given HTTP server the given path
or query is requested. All redirections are followed, and
if user/password is given it will be used as authorization
when necessary.
All final HTTP status codes other than 2xx are errors.
.IP
HTML page contents are checked for recursion.
.TP
Local files (\fBfile:\fP)
A regular, readable file that can be opened is valid. A readable
directory is also valid. All other files, for example device files,
unreadable or non-existing files are errors.
.IP
HTML or other parseable file contents are checked for recursion.
.TP
Mail links (\fBmailto:\fP)
A mailto: link eventually resolves to a list of email addresses.
If one address fails, the whole list will fail.
For each mail address we check the following things:
.br
1) Check the address syntax of both the parts before and after the @ sign.
.br
2) Look up the MX DNS records. If no MX record is found, print an error.
.br
3) Check if one of the mail hosts accepts an SMTP connection.
Check hosts with higher priority first.
If no host accepts SMTP, we print a warning.
.br
4) Try to verify the address with the VRFY command. If we get an answer,
print the verified address as an info.
.TP
FTP links (\fBftp:\fP)
For FTP links we do:
.br
1) connect to the specified host
.br
2) try to login with the given user and password. The default
user is \fBanonymous\fP, the default password is \fBanonymous@\fP.
.br
3) try to change to the given directory
.br
4) list the file with the NLST command
.TP
Telnet links (\fBtelnet:\fP)
We try to connect and if user/password are given, login to the
given telnet server.
.TP
NNTP links (\fBnews:\fP, \fBsnews:\fP, \fBnntp:\fP)
We try to connect to the given NNTP server. If a news group or
article is specified, try to request it from the server.
.TP
Unsupported links (\fBjavascript:\fP, etc.)
An unsupported link will only print a warning. No further checking
will be made.
.IP
The complete list of recognized, but unsupported links can be found
in the
.UR https://github.com/linkchecker/linkchecker/blob/master/linkcheck/checker/unknownurl.py
linkcheck/checker/unknownurl.py
.UE
source file.
The most prominent of them should be JavaScript links.
.SH PLUGINS
There are two plugin types: connection and content plugins.
Connection plugins are run after a successful connection to the
URL host.
Content plugins are run if the URL type has content
(mailto: URLs have no content for example) and if the check is not
forbidden (i.e. by HTTP robots.txt).
.PP
See \fBlinkchecker \-\-list\-plugins\fP for a list of plugins and
their documentation. All plugins are enabled via the
.BR linkcheckerrc (5)
configuration file.
.SH RECURSION
Before descending recursively into a URL, it has to fulfill several
conditions. They are checked in this order:
.br
1. A URL must be valid.
.br
2. A URL must be parseable. This currently includes HTML files,
Opera bookmarks files, and directories. If a file type cannot
be determined (for example it does not have a common HTML file
extension, and the content does not look like HTML), it is assumed
to be non-parseable.
.br
3. The URL content must be retrievable. This is usually the case
except for example mailto: or unknown URL types.
.br
4. The maximum recursion level must not be exceeded. It is configured
with the \fB\-\-recursion\-level\fP option and is unlimited by default.
.br
5. It must not match the ignored URL list. This is controlled with
the \fB\-\-ignore\-url\fP option.
.br
6. The Robots Exclusion Protocol must allow links in the URL to be
followed recursively. This is checked by searching for a
"nofollow" directive in the HTML header data.
.PP
Note that the directory recursion reads all files in that
directory, not just a subset like \fBindex.htm*\fP.
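The conditions above can be sketched as a single decision function (hypothetical and simplified; each dictionary key stands in for the corresponding real check):

```python
def should_recurse(url, max_level=-1):
    """Apply the documented recursion conditions in order; a negative
    max_level means unlimited recursion."""
    if not url["valid"]:
        return False                    # 1. must be valid
    if not url["parseable"]:
        return False                    # 2. must be parseable
    if not url["content_retrievable"]:
        return False                    # 3. content must be retrievable
    if 0 <= max_level < url["level"]:
        return False                    # 4. recursion level not exceeded
    if url["ignored"]:
        return False                    # 5. not on the ignore list
    if url["nofollow"]:
        return False                    # 6. robots/nofollow must allow it
    return True
```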
.SH NOTES
URLs on the commandline starting with \fBftp.\fP are treated like
\fBftp://ftp.\fP, URLs starting with \fBwww.\fP are treated like
\fBhttp://www.\fP.
You can also give local files as arguments.
.PP
If you have your system configured to automatically establish a
connection to the internet (e.g. with diald), it will connect when
checking links not pointing to your local host.
Use the \fB\-\-ignore\-url\fP option to prevent this.
.PP
Javascript links are not supported.
.PP
If your platform does not support threading, LinkChecker disables it
automatically.
.PP
You can supply multiple user/password pairs in a configuration file.
.PP
When checking \fBnews:\fP links the given NNTP host doesn't need to be the
same as the host of the user browsing your pages.
.
.SH ENVIRONMENT
\fBNNTP_SERVER\fP - specifies default NNTP server
.br
\fBhttp_proxy\fP - specifies default HTTP proxy server
.br
\fBftp_proxy\fP - specifies default FTP proxy server
.br
\fBno_proxy\fP - comma-separated list of domains to not contact over a proxy server
.br
\fBLC_MESSAGES\fP, \fBLANG\fP, \fBLANGUAGE\fP - specify output language
.
.SH RETURN VALUE
The return value is 2 when
.IP \(bu 2
a program error occurred.
.PP
The return value is 1 when
.IP \(bu 2
invalid links were found or
.IP \(bu 2
link warnings were found and warnings are enabled
.PP
Else the return value is zero.
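The rules above map to a small sketch (illustrative only, not LinkChecker code):

```python
def exit_code(program_error, invalid_links, warnings, warnings_enabled=True):
    """Return value as documented: 2 for program errors, 1 for invalid
    links or enabled warnings, else 0."""
    if program_error:
        return 2
    if invalid_links or (warnings and warnings_enabled):
        return 1
    return 0
```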
.
.SH LIMITATIONS
LinkChecker consumes memory for each queued URL to check. With thousands
of queued URLs the amount of consumed memory can become quite large. This
might slow down the program or even the whole system.
.
.SH FILES
\fB~/.linkchecker/linkcheckerrc\fP - default configuration file
.br
\fB~/.linkchecker/blacklist\fP - default blacklist logger output filename
.br
\fBlinkchecker\-out.\fP\fITYPE\fP - default logger file output name
.br
.UR https://docs.python.org/library/codecs.html#standard-encodings
.UE
\- valid output encodings
.br
.UR https://docs.python.org/howto/regex.html
.UE
\- regular expression documentation
.SH "SEE ALSO"
.BR linkcheckerrc (5)
.
.SH AUTHOR
Bastian Kleineidam <bastian.kleineidam@web.de>
.
.SH COPYRIGHT
Copyright \(co 2000-2014 Bastian Kleineidam

.TH LINKCHECKERRC 5 2020-06-05 "LinkChecker" "LinkChecker User Manual"
.SH NAME
linkcheckerrc - configuration file for LinkChecker
.
.SH DESCRIPTION
\fBlinkcheckerrc\fP is the configuration file for LinkChecker.
The file is written in an INI-style format.
.br
The default file location is \fB~/.linkchecker/linkcheckerrc\fP on Unix,
\fB%HOMEPATH%\\.linkchecker\\linkcheckerrc\fP on Windows systems.
.SH SETTINGS
.SS \fB[checking]\fP
.TP
\fBcookiefile=\fP\fIfilename\fP
Read a file with initial cookie data. The cookie data
format is explained in
.BR linkchecker (1).
.br
Command line option: \fB\-\-cookiefile\fP
.TP
\fBlocalwebroot=\fP\fISTRING\fP
When checking absolute URLs inside local files, the given root directory
is used as base URL.
.br
Note that the given directory must be in URL syntax, i.e. it must use a slash
to join directories instead of a backslash, and it must end with a slash.
.br
Command line option: none
.TP
\fBnntpserver=\fP\fISTRING\fP
Specify an NNTP server for \fBnews:\fP links. Default is the
environment variable \fBNNTP_SERVER\fP. If no host is given,
only the syntax of the link is checked.
.br
Command line option: \fB\-\-nntp\-server\fP
.TP
\fBrecursionlevel=\fP\fINUMBER\fP
Check recursively all links up to given depth.
A negative depth will enable infinite recursion.
Default depth is infinite.
.br
Command line option: \fB\-\-recursion\-level\fP
.TP
\fBthreads=\fP\fINUMBER\fP
Generate no more than the given number of threads. Default number
of threads is 10. To disable threading specify a non-positive number.
.br
Command line option: \fB\-\-threads\fP
.TP
\fBtimeout=\fP\fINUMBER\fP
Set the timeout for connection attempts in seconds. The default timeout
is 60 seconds.
.br
Command line option: \fB\-\-timeout\fP
.TP
\fBaborttimeout=\fP\fINUMBER\fP
Time to wait for checks to finish after the user aborts the first time
(with Ctrl-C or the abort button).
The default abort timeout is 300 seconds.
.br
Command line option: none
.TP
\fBuseragent=\fP\fISTRING\fP
Specify the User-Agent string to send to the HTTP server, for example
"Mozilla/4.0". The default is "LinkChecker/X.Y" where X.Y is the current
version of LinkChecker.
.br
Command line option: \fB\-\-user\-agent\fP
.TP
\fBsslverify=\fP[\fB0\fP|\fB1\fP|\fIfilename\fP]
If set to zero disables SSL certificate checking.
If set to one (the default) enables SSL certificate checking with
the provided CA certificate file. If a filename is specified, it
will be used as the certificate file.
.br
Command line option: none
.TP
\fBmaxrunseconds=\fP\fINUMBER\fP
Stop checking new URLs after the given number of seconds. Same as if the
user stops (by hitting Ctrl-C) after the given number of seconds.
.br
The default is not to stop until all URLs are checked.
.br
Command line option: none
.TP
\fBmaxnumurls=\fP\fINUMBER\fP
Maximum number of URLs to check. New URLs will not be queued after the
given number of URLs is checked.
.br
The default is to queue and check all URLs.
.br
Command line option: none
.TP
\fBmaxrequestspersecond=\fP\fINUMBER\fP
Limit the maximum number of requests per second to one host.
.TP
\fBallowedschemes=\fP\fINAME\fP[\fB,\fP\fINAME\fP...]
Allowed URL schemes as comma-separated list.
.SS \fB[filtering]\fP
.TP
\fBignore=\fP\fIREGEX\fP (MULTILINE)
Only check syntax of URLs matching the given regular expressions.
.br
Command line option: \fB\-\-ignore\-url\fP
.TP
\fBignorewarnings=\fP\fINAME\fP[\fB,\fP\fINAME\fP...]
Ignore the comma-separated list of warnings. See
\fBWARNINGS\fP for the list of supported warnings.
.br
Command line option: none
.TP
\fBinternlinks=\fP\fIREGEX\fP
Regular expression to add more URLs recognized as internal links.
Default is that URLs given on the command line are internal.
.br
Command line option: none
.TP
\fBnofollow=\fP\fIREGEX\fP (MULTILINE)
Check but do not recurse into URLs matching the given regular
expressions.
.br
Command line option: \fB\-\-no\-follow\-url\fP
.TP
\fBcheckextern=\fP[\fB0\fP|\fB1\fP]
Check external links. Default is to check internal links only.
.br
Command line option: \fB\-\-check\-extern\fP
.SS \fB[authentication]\fP
.TP
\fBentry=\fP\fIREGEX\fP \fIUSER\fP [\fIPASS\fP] (MULTILINE)
Provide individual username/password pairs for different links. In addition to
a single login page specified with \fBloginurl\fP, multiple FTP,
HTTP (Basic Authentication) and telnet links are supported. Entries are a
triple (URL regex, username, password) or a tuple (URL regex, username),
where the entries are separated by whitespace.
.br
The password is optional and if missing it has to be entered at the
commandline.
.br
If the regular expression matches the checked URL, the given username/password
pair is used for authentication. The command line options
\fB\-u\fP and \fB\-p\fP match every link and therefore override the entries
given here. The first match wins.
.br
Command line option: \fB\-u\fP, \fB\-p\fP
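The first-match rule can be sketched like this (\fBfind_credentials\fP is a hypothetical helper; entries mirror the (URL regex, username[, password]) tuples described above):

```python
import re

def find_credentials(entries, url):
    """Return (user, password) from the first entry whose regex matches
    the URL; password is None for two-element entries."""
    for entry in entries:
        if re.match(entry[0], url):
            password = entry[2] if len(entry) > 2 else None
            return entry[1], password
    return None
```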
.TP
\fBloginurl=\fP\fIURL\fP
The URL of a login page to be visited before link checking. The page is expected
to contain an HTML form to collect credentials and submit them to the address in
its action attribute using an HTTP POST request.
The name attributes of the input elements of the form and the values to be
submitted need to be available (see \fBentry\fP for an explanation of username
and password values).
.TP
\fBloginuserfield=\fP\fISTRING\fP
The name attribute of the username input element. Default: \fBlogin\fP.
.TP
\fBloginpasswordfield=\fP\fISTRING\fP
The name attribute of the password input element. Default: \fBpassword\fP.
.TP
\fBloginextrafields=\fP\fINAME\fP\fB:\fP\fIVALUE\fP (MULTILINE)
Optionally the name attributes of any additional input elements and the values
to populate them with. Note that these are submitted without
checking whether matching input elements exist in the HTML form.
.SS \fB[output]\fP
.TP
\fBdebug=\fP\fISTRING\fP[\fB,\fP\fISTRING\fP...]
Print debugging output for the given modules.
Available debug modules are \fBcmdline\fP, \fBchecking\fP,
\fBcache\fP, \fBdns\fP, \fBthread\fP, \fBplugins\fP and \fBall\fP.
Specifying \fBall\fP is an alias for specifying all available loggers.
.br
Command line option: \fB\-\-debug\fP
.TP
\fBfileoutput=\fP\fITYPE\fP[\fB,\fP\fITYPE\fP...]
Output to files \fBlinkchecker\-out.\fP\fITYPE\fP,
\fB$HOME/.linkchecker/blacklist\fP for
\fBblacklist\fP output.
.br
Valid file output types are \fBtext\fP, \fBhtml\fP, \fBsql\fP,
\fBcsv\fP, \fBgml\fP, \fBdot\fP, \fBxml\fP, \fBnone\fP or \fBblacklist\fP.
Default is no file output. The various output types are documented
below. Note that you can suppress all console output
with \fBoutput=none\fP.
.br
Command line option: \fB\-\-file\-output\fP
.TP
\fBlog=\fP\fITYPE\fP[\fB/\fP\fIENCODING\fP]
Specify output type as \fBtext\fP, \fBhtml\fP, \fBsql\fP,
\fBcsv\fP, \fBgml\fP, \fBdot\fP, \fBxml\fP, \fBnone\fP or \fBblacklist\fP.
Default type is \fBtext\fP. The various output types are documented
below.
.br
The \fIENCODING\fP specifies the output encoding, the default is
that of your locale. Valid encodings are listed at
.UR https://docs.python.org/library/codecs.html#standard-encodings
.UE .
.br
Command line option: \fB\-\-output\fP
.TP
\fBquiet=\fP[\fB0\fP|\fB1\fP]
If set, operate quietly. An alias for \fBlog=none\fP.
This is only useful with \fBfileoutput\fP.
.br
Command line option: \fB\-q\fP, \fB\-\-quiet\fP
.TP
\fBstatus=\fP[\fB0\fP|\fB1\fP]
Control printing check status messages. Default is 1.
.br
Command line option: \fB\-\-no\-status\fP
.TP
\fBverbose=\fP[\fB0\fP|\fB1\fP]
If set, log all checked URLs once. Default is to log only errors and warnings.
.br
Command line option: \fB\-\-verbose\fP
.TP
\fBwarnings=\fP[\fB0\fP|\fB1\fP]
If set, log warnings. Default is to log warnings.
.br
Command line option: \fB\-\-no\-warnings\fP
.SS \fB[text]\fP
.TP
\fBfilename=\fP\fISTRING\fP
Specify output filename for text logging. Default filename is
\fBlinkchecker-out.txt\fP.
.br
Command line option: \fB\-\-file\-output=\fP
.TP
\fBparts=\fP\fISTRING\fP
Comma-separated list of parts that have to be logged.
See \fBLOGGER PARTS\fP below.
.br
Command line option: none
.TP
\fBencoding=\fP\fISTRING\fP
Valid encodings are listed in
.UR https://docs.python.org/library/codecs.html#standard-encodings
.UE .
.br
Default encoding is \fBiso\-8859\-15\fP.
.TP
\fIcolor*\fP
Color settings for the various log parts, syntax is \fIcolor\fP or
\fItype\fP\fB;\fP\fIcolor\fP. The \fItype\fP can be
\fBbold\fP, \fBlight\fP, \fBblink\fP, \fBinvert\fP.
The \fIcolor\fP can be
\fBdefault\fP, \fBblack\fP, \fBred\fP, \fBgreen\fP, \fByellow\fP, \fBblue\fP,
\fBpurple\fP, \fBcyan\fP, \fBwhite\fP, \fBBlack\fP, \fBRed\fP, \fBGreen\fP,
\fBYellow\fP, \fBBlue\fP, \fBPurple\fP, \fBCyan\fP or \fBWhite\fP.
.br
Command line option: none
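The \fItype\fP\fB;\fP\fIcolor\fP syntax can be split as in this sketch (\fBparse_color\fP is illustrative, not LinkChecker code):

```python
def parse_color(value):
    """Split a 'type;color' setting into (type, color); a bare value
    is a plain color without a type."""
    if ";" in value:
        type_, color = value.split(";", 1)
        return type_, color
    return None, value
```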
.TP
\fBcolorparent=\fP\fISTRING\fP
Set parent color. Default is \fBwhite\fP.
.TP
\fBcolorurl=\fP\fISTRING\fP
Set URL color. Default is \fBdefault\fP.
.TP
\fBcolorname=\fP\fISTRING\fP
Set name color. Default is \fBdefault\fP.
.TP
\fBcolorreal=\fP\fISTRING\fP
Set real URL color. Default is \fBcyan\fP.
.TP
\fBcolorbase=\fP\fISTRING\fP
Set base URL color. Default is \fBpurple\fP.
.TP
\fBcolorvalid=\fP\fISTRING\fP
Set valid color. Default is \fBbold;green\fP.
.TP
\fBcolorinvalid=\fP\fISTRING\fP
Set invalid color. Default is \fBbold;red\fP.
.TP
\fBcolorinfo=\fP\fISTRING\fP
Set info color. Default is \fBdefault\fP.
.TP
\fBcolorwarning=\fP\fISTRING\fP
Set warning color. Default is \fBbold;yellow\fP.
.TP
\fBcolordltime=\fP\fISTRING\fP
Set download time color. Default is \fBdefault\fP.
.TP
\fBcolorreset=\fP\fISTRING\fP
Set reset color. Default is \fBdefault\fP.
.SS \fB[gml]\fP
.TP
\fBfilename=\fP\fISTRING\fP
See [text] section above.
.TP
\fBparts=\fP\fISTRING\fP
See [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See [text] section above.
.SS \fB[dot]\fP
.TP
\fBfilename=\fP\fISTRING\fP
See [text] section above.
.TP
\fBparts=\fP\fISTRING\fP
See [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See [text] section above.
.SS \fB[csv]\fP
.TP
\fBfilename=\fP\fISTRING\fP
See [text] section above.
.TP
\fBparts=\fP\fISTRING\fP
See [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See [text] section above.
.TP
\fBseparator=\fP\fICHAR\fP
Set CSV separator. Default is a comma (\fB,\fP).
.TP
\fBquotechar=\fP\fICHAR\fP
Set CSV quote character. Default is a double quote (\fB"\fP).
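For comparison, the same separator and quote character defaults expressed with Python's standard csv module (an illustration, not LinkChecker's own logger code):

```python
import csv
import io

buf = io.StringIO()
# Defaults documented above: comma separator, double-quote quote character.
writer = csv.writer(buf, delimiter=",", quotechar='"')
writer.writerow(["http://example.com/", "valid"])
row = buf.getvalue().strip()
```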
.SS \fB[sql]\fP
.TP
\fBfilename=\fP\fISTRING\fP
See [text] section above.
.TP
\fBparts=\fP\fISTRING\fP
See [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See [text] section above.
.TP
\fBdbname=\fP\fISTRING\fP
Set database name to store into. Default is \fBlinksdb\fP.
.TP
\fBseparator=\fP\fICHAR\fP
Set SQL command separator character. Default is a semicolon (\fB;\fP).
.SS \fB[html]\fP
.TP
\fBfilename=\fP\fISTRING\fP
See [text] section above.
.TP
\fBparts=\fP\fISTRING\fP
See [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See [text] section above.
.TP
\fBcolorbackground=\fP\fICOLOR\fP
Set HTML background color. Default is \fB#fff7e5\fP.
.TP
\fBcolorurl=\fP
Set HTML URL color. Default is \fB#dcd5cf\fP.
.TP
\fBcolorborder=\fP
Set HTML border color. Default is \fB#000000\fP.
.TP
\fBcolorlink=\fP
Set HTML link color. Default is \fB#191c83\fP.
.TP
\fBcolorwarning=\fP
Set HTML warning color. Default is \fB#e0954e\fP.
.TP
\fBcolorerror=\fP
Set HTML error color. Default is \fB#db4930\fP.
.TP
\fBcolorok=\fP
Set HTML valid color. Default is \fB#3ba557\fP.
.SS \fB[blacklist]\fP
.TP
\fBfilename=\fP\fISTRING\fP
See [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See [text] section above.
.SS \fB[xml]\fP
.TP
\fBfilename=\fP\fISTRING\fP
See [text] section above.
.TP
\fBparts=\fP\fISTRING\fP
See [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See [text] section above.
.SS \fB[gxml]\fP
.TP
\fBfilename=\fP\fISTRING\fP
See [text] section above.
.TP
\fBparts=\fP\fISTRING\fP
See [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See [text] section above.
.SS \fB[sitemap]\fP
.TP
\fBfilename=\fP\fISTRING\fP
See [text] section above.
.TP
\fBparts=\fP\fISTRING\fP
See [text] section above.
.TP
\fBencoding=\fP\fISTRING\fP
See [text] section above.
.TP
\fBpriority=\fP\fIFLOAT\fP
A number between 0.0 and 1.0 determining the priority. The default
priority for the first URL is 1.0, for all child URLs 0.5.
.TP
\fBfrequency=\fP[\fBalways\fP|\fBhourly\fP|\fBdaily\fP|\fBweekly\fP|\fBmonthly\fP|\fByearly\fP|\fBnever\fP]
How frequently pages are expected to change.
.
.SH "LOGGER PARTS"
.TS
nokeep, tab(@);
ll.
\fBall\fP@(for all parts)
\fBid\fP@(a unique ID for each logentry)
\fBrealurl\fP@(the full url link)
\fBresult\fP@(valid or invalid, with messages)
\fBextern\fP@(1 or 0, only reported by some logger types)
\fBbase\fP@(base href=...)
\fBname\fP@(<a href=...>name</a> and <img alt="name">)
\fBparenturl\fP@(if any)
\fBinfo\fP@(some additional info, e.g. FTP welcome messages)
\fBwarning\fP@(warnings)
\fBdltime\fP@(download time)
\fBchecktime\fP@(check time)
\fBurl\fP@(the original url name, can be relative)
\fBintro\fP@(the blurb at the beginning, "starting at ...")
\fBoutro\fP@(the blurb at the end, "found x errors ...")
.TE
.SH MULTILINE
Some option values can span multiple lines. Each line has to be indented
for that to work. Lines starting with a hash (\fB#\fP) will be ignored,
though they must still be indented.
.EX
ignore=
lconline
bookmark
# a comment
^mailto:
.EE
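Reading such a multiline value could look like this sketch (\fBparse_multiline\fP is hypothetical; the real parsing is done by LinkChecker's configuration reader):

```python
def parse_multiline(lines):
    """Collect indented continuation lines, skipping indented comment
    lines that start with '#', as described above."""
    values = []
    for line in lines:
        if not line[:1].isspace():
            continue  # not an indented continuation line
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        values.append(stripped)
    return values
```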
.SH EXAMPLE
.EX
[output]
log=html
.PP
[checking]
threads=5
.PP
[filtering]
ignorewarnings=http-moved-permanent
.EE
.SH PLUGINS
All plugins have a separate section. If the section
appears in the configuration file the plugin is enabled.
Some plugins read extra options in their section.
.SS \fB[AnchorCheck]\fP
Checks validity of HTML anchors.
.SS \fB[LocationInfo]\fP
Adds the country and, if possible, the city name of the URL host as info.
Needs GeoIP or pygeoip and a local country or city lookup DB installed.
.SS \fB[RegexCheck]\fP
Define a regular expression which prints a warning if it matches
any content of the checked link. This applies only to valid pages,
so we can get their content.
.TP
\fBwarningregex=\fP\fIREGEX\fP
Use this to check for pages that contain some form of error
message, for example "This page has moved" or "Oracle
Application error". \fIREGEX\fP should be unquoted.
Note that multiple values can be combined in the regular expression,
for example "(This page has moved|Oracle Application error)".
.SS \fB[SslCertificateCheck]\fP
Check SSL certificate expiration date. Only internal https: links
will be checked. A domain will only be checked once to avoid duplicate
warnings.
.TP
\fBsslcertwarndays=\fP\fINUMBER\fP
Configures the expiration warning time in days.
.SS \fB[HtmlSyntaxCheck]\fP
Check the syntax of HTML pages with the online W3C HTML validator.
See
.UR https://validator.w3.org/docs/api.html
.UE .
.SS \fB[HttpHeaderInfo]\fP
Print HTTP headers in URL info.
.TP
\fBprefixes=\fP\fIprefix1\fP[,\fIprefix2\fP]...
List of comma-separated header prefixes, for example \fBX-\fP
to display all HTTP headers that start with "X-".
.SS \fB[CssSyntaxCheck]\fP
Check the syntax of HTML pages with the online W3C CSS validator.
See
.UR https://jigsaw.w3.org/css-validator/manual.html#expert
.UE .
.SS \fB[VirusCheck]\fP
Checks the page content for virus infections with clamav.
A local clamav daemon must be installed.
.TP
\fBclamavconf=\fP\fIfilename\fP
Filename of \fBclamd.conf\fP config file.
.
.SS \fB[PdfParser]\fP
Parse PDF files for URLs to check. Needs the \fBpdfminer\fP
Python package installed.
.SS \fB[WordParser]\fP
Parse Word files for URLs to check. Needs the \fBpywin32\fP
Python extension installed.
.SH WARNINGS
The following warnings are recognized in the 'ignorewarnings' config
file entry:
.br
.TP
\fBfile-missing-slash\fP
The file: URL is missing a trailing slash.
.TP
\fBfile-system-path\fP
The file: path is not the same as the system specific path.
.TP
\fBftp-missing-slash\fP
The ftp: URL is missing a trailing slash.
.TP
\fBhttp-cookie-store-error\fP
An error occurred while storing a cookie.
.TP
\fBhttp-empty-content\fP
The URL had no content.
.TP
\fBmail-no-mx-host\fP
The mail MX host could not be found.
.TP
\fBnntp-no-newsgroup\fP
The NNTP newsgroup could not be found.
.TP
\fBnntp-no-server\fP
No NNTP server was found.
.TP
\fBurl-content-size-zero\fP
The URL content size is zero.
.TP
\fBurl-content-too-large\fP
The URL content size is too large.
.TP
\fBurl-effective-url\fP
The effective URL is different from the original.
.TP
\fBurl-error-getting-content\fP
Could not get the content of the URL.
.TP
\fBurl-obfuscated-ip\fP
The IP is obfuscated.
.TP
\fBurl-whitespace\fP
The URL contains leading or trailing whitespace.
.SH "SEE ALSO"
.BR linkchecker (1)
.
.SH AUTHOR
Bastian Kleineidam <bastian.kleineidam@web.de>
.
.SH COPYRIGHT
Copyright \(co 2000-2014 Bastian Kleineidam

doc/fr.po (1229 lines changed): diff suppressed because it is too large

doc/i18n/gettext/code.pot (new file, 7035 lines): diff suppressed because it is too large

doc/i18n/gettext/faq.pot (new file, 130 lines):
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2000-2014 Bastian Kleineidam
# This file is distributed under the same license as the LinkChecker package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: LinkChecker \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2020-08-05 19:32+0100\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
#: ../../src/faq.rst:4
msgid "Frequently Asked Questions"
msgstr ""
#: ../../src/faq.rst:6
msgid "**Q: LinkChecker produced an error, but my web page is okay with Mozilla/IE/Opera/... Is this a bug in LinkChecker?**"
msgstr ""
#: ../../src/faq.rst:9
msgid "A: Please check your web pages first. Are they really okay? Often the major browsers are very forgiving and good at handling HTML or HTTP errors, while LinkChecker complains in most cases of invalid content."
msgstr ""
#: ../../src/faq.rst:14
msgid "Enable the :ref:`man/linkcheckerrc:HtmlSyntaxCheck` plugin, or check if you are using a proxy which produces the error."
msgstr ""
#: ../../src/faq.rst:18
msgid "**Q: I still get an error, but the page is definitely okay.**"
msgstr ""
#: ../../src/faq.rst:20
msgid "A: Some servers deny access of automated tools (also called robots) like LinkChecker. This is not a bug in LinkChecker but rather a policy by the webmaster running the website you are checking. Look in the ``/robots.txt`` file which follows the `robots.txt exclusion standard <http://www.robotstxt.org/robotstxt.html>`_."
msgstr ""
#: ../../src/faq.rst:26
msgid "For identification LinkChecker adds to each request a User-Agent header like this::"
msgstr ""
#: ../../src/faq.rst:31
#: ../../src/faq.rst:91
msgid "If you yourself are the webmaster, consider allowing LinkChecker to check your web pages by adding the following to your robots.txt file::"
msgstr ""
#: ../../src/faq.rst:38
msgid "**Q: How can I tell LinkChecker which proxy to use?**"
msgstr ""
#: ../../src/faq.rst:40
msgid "A: LinkChecker works automatically with proxies. In a Unix or Windows environment, set the http_proxy, https_proxy, ftp_proxy environment variables to a URL that identifies the proxy server before starting LinkChecker. For example:"
msgstr ""
#: ../../src/faq.rst:51
msgid "**Q: The link \"mailto:john@company.com?subject=Hello John\" is reported as an error.**"
msgstr ""
#: ../../src/faq.rst:54
msgid "A: You have to quote special characters (e.g. spaces) in the subject field. The correct link should be \"mailto:...?subject=Hello%20John\" Unfortunately browsers like IE and Netscape do not enforce this."
msgstr ""
#: ../../src/faq.rst:59
msgid "**Q: Has LinkChecker JavaScript support?**"
msgstr ""
#: ../../src/faq.rst:61
msgid "A: No, it never will. If your page is only working with JS, it is better to use a browser testing tool like `Selenium <http://seleniumhq.org/>`_."
msgstr ""
#: ../../src/faq.rst:65
msgid "**Q: Is the LinkChecker's cookie feature insecure?**"
msgstr ""
#: ../../src/faq.rst:67
msgid "A: Potentially yes. This depends on what information you specify in the cookie file. The cookie information will be sent to the specified hosts."
msgstr ""
#: ../../src/faq.rst:71
msgid "Also, the following restrictions apply for cookies that LinkChecker receives from the hosts it checks:"
msgstr ""
#: ../../src/faq.rst:74
msgid "Cookies will only be sent back to the originating server (i.e. no third party cookies are allowed)."
msgstr ""
#: ../../src/faq.rst:76
msgid "Cookies are only stored in memory. After LinkChecker finishes, they are lost."
msgstr ""
#: ../../src/faq.rst:78
msgid "The cookie feature is disabled by default."
msgstr ""
#: ../../src/faq.rst:81
msgid "**Q: LinkChecker retrieves a /robots.txt file for every site it checks. What is that about?**"
msgstr ""
#: ../../src/faq.rst:84
msgid "A: LinkChecker follows the `robots.txt exclusion standard <http://www.robotstxt.org/robotstxt.html>`_. To avoid misuse of LinkChecker, you cannot turn this feature off. See the `Web Robot pages <http://www.robotstxt.org/robotstxt.html>`_ and the `Spidering report <http://www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/Spidering.txt>`_ for more info."
msgstr ""
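The exclusion standard LinkChecker follows can be explored with Python's standard-library parser; the rules below are invented for illustration:

```python
from urllib.robotparser import RobotFileParser

# Parse a hypothetical robots.txt and ask whether a robot named
# "LinkChecker" may fetch particular paths.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("LinkChecker", "http://example.com/index.html"))  # True
print(rp.can_fetch("LinkChecker", "http://example.com/private/x"))   # False
```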
#: ../../src/faq.rst:98
msgid "**Q: How do I print unreachable/dead documents of my website with LinkChecker?**"
msgstr ""
#: ../../src/faq.rst:101
msgid "A: No can do. This would require file system access to your web repository and access to your web server configuration."
msgstr ""
#: ../../src/faq.rst:105
msgid "**Q: How do I check HTML/XML/CSS syntax with LinkChecker?**"
msgstr ""
#: ../../src/faq.rst:107
msgid "A: Enable the :ref:`man/linkcheckerrc:HtmlSyntaxCheck` and :ref:`man/linkcheckerrc:CssSyntaxCheck` plugins."
msgstr ""
#: ../../src/faq.rst:111
msgid "**Q: I want to have my own logging class. How can I use it in LinkChecker?**"
msgstr ""
#: ../../src/faq.rst:113
msgid "A: A Python API lets you define new logging classes. Define your own logging class as a subclass of *_Logger* or any other logging class in the *log* module. Then call the *add_logger* function in *Config.Configuration* to register your new logger. After that, append a new logger instance to the fileoutput."
msgstr ""
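A sketch of the subclassing pattern described above. The ``_Logger`` base here is a minimal stand-in for the class in LinkChecker's *log* module, and ``add_logger``/``Config.Configuration`` appear only in comments; the method and attribute names are assumptions based on the FAQ text, not LinkChecker's real interface:

```python
# Minimal stand-in for LinkChecker's _Logger base class; the real one
# lives in the log module and has a richer interface.
class _Logger:
    def __init__(self, **kwargs):
        self.lines = []

    def log_url(self, url_data):
        raise NotImplementedError


class PlainLogger(_Logger):
    """Example custom logger: record one plain line per checked URL."""

    def log_url(self, url_data):
        self.lines.append(f"{url_data['url']} -> {url_data['result']}")


# In real code you would register the class via add_logger() on a
# Config.Configuration instance and append an instance to the fileoutput.
logger = PlainLogger()
logger.log_url({"url": "http://example.com/", "result": "OK"})
print(logger.lines[0])  # http://example.com/ -> OK
```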

121
doc/i18n/gettext/index.pot Normal file
View file

@@ -0,0 +1,121 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2000-2014 Bastian Kleineidam
# This file is distributed under the same license as the LinkChecker package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: LinkChecker \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2020-08-05 19:32+0100\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
#: ../../src/index.rst:6
msgid "Check websites for broken links"
msgstr ""
#: ../../src/index.rst:9
msgid "Introduction"
msgstr ""
#: ../../src/index.rst:10
msgid "LinkChecker is a free, `GPL <http://www.gnu.org/licenses/gpl-2.0.html>`_ licensed website validator. LinkChecker checks links in web documents or full websites. It runs on Python 3 systems, requiring Python 3.5 or later."
msgstr ""
#: ../../src/index.rst:15
msgid "Visit the project on `GitHub <https://github.com/linkchecker/linkchecker>`_."
msgstr ""
#: ../../src/index.rst:18
msgid "Installation"
msgstr ""
#: ../../src/index.rst:25
msgid "Basic usage"
msgstr ""
#: ../../src/index.rst:26
msgid "To check a URL like *http://www.example.org/myhomepage/* it is enough to execute:"
msgstr ""
#: ../../src/index.rst:33
msgid "This check will validate recursively all pages starting with *http://www.example.org/myhomepage/*. Additionally, all external links pointing outside of *www.example.org* will be checked but not recursed into."
msgstr ""
#: ../../src/index.rst:39
msgid "Features"
msgstr ""
#: ../../src/index.rst:41
msgid "recursive and multithreaded checking and site crawling"
msgstr ""
#: ../../src/index.rst:42
msgid "output in colored or normal text, HTML, SQL, CSV, XML or a sitemap graph in different formats"
msgstr ""
#: ../../src/index.rst:44
msgid "HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local file links support"
msgstr ""
#: ../../src/index.rst:46
msgid "restriction of link checking with regular expression filters for URLs"
msgstr ""
#: ../../src/index.rst:47
msgid "proxy support"
msgstr ""
#: ../../src/index.rst:48
msgid "username/password authorization for HTTP, FTP and Telnet"
msgstr ""
#: ../../src/index.rst:49
msgid "honors robots.txt exclusion protocol"
msgstr ""
#: ../../src/index.rst:50
msgid "Cookie support"
msgstr ""
#: ../../src/index.rst:51
msgid "HTML5 support"
msgstr ""
#: ../../src/index.rst:52
msgid ":ref:`Plugin support <man/linkchecker:PLUGINS>` allowing custom page checks. Currently available are HTML and CSS syntax checks, Antivirus checks, and more."
msgstr ""
#: ../../src/index.rst:54
msgid "Different interfaces: command line and web interface"
msgstr ""
#: ../../src/index.rst:55
msgid "... and a lot more check options documented in the :doc:`man/linkchecker` manual page."
msgstr ""
#: ../../src/index.rst:59
msgid "Screenshots"
msgstr ""
#: ../../src/index.rst:69
msgid "Commandline interface"
msgstr ""
#: ../../src/index.rst:70
msgid "WSGI web interface"
msgstr ""
#: ../../src/index.rst:73
msgid "Test suite status"
msgstr ""
#: ../../src/index.rst:74
msgid "LinkChecker has extensive unit tests to ensure code quality. `Travis CI <https://travis-ci.com/>`_ is used for continuous build and test integration."
msgstr ""

1672
doc/i18n/gettext/man.pot Normal file

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

View file

@@ -0,0 +1,188 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2000-2014 Bastian Kleineidam
# This file is distributed under the same license as the LinkChecker
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2020.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: LinkChecker \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2020-08-05 19:32+0100\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.8.0\n"
#: ../../src/faq.rst:4
msgid "Frequently Asked Questions"
msgstr ""
#: ../../src/faq.rst:6
msgid ""
"**Q: LinkChecker produced an error, but my web page is okay with "
"Mozilla/IE/Opera/... Is this a bug in LinkChecker?**"
msgstr ""
#: ../../src/faq.rst:9
msgid ""
"A: Please check your web pages first. Are they really okay? Often the "
"major browsers are very forgiving and good at handling HTML of HTTP "
"errors, while LinkChecker complains in most cases of invalid content."
msgstr ""
#: ../../src/faq.rst:14
msgid ""
"Enable the :ref:`man/linkcheckerrc:HtmlSyntaxCheck` plugin, or check if "
"you are using a proxy which produces the error."
msgstr ""
#: ../../src/faq.rst:18
msgid "**Q: I still get an error, but the page is definitely okay.**"
msgstr ""
#: ../../src/faq.rst:20
msgid ""
"A: Some servers deny access to automated tools (also called robots) like "
"LinkChecker. This is not a bug in LinkChecker but a policy of the "
"webmaster running the website you are checking. Look at the site's "
"``/robots.txt`` file, which follows the `robots.txt exclusion standard "
"<http://www.robotstxt.org/robotstxt.html>`_."
msgstr ""
#: ../../src/faq.rst:26
msgid ""
"For identification, LinkChecker adds a User-Agent header like this to "
"each request::"
msgstr ""
#: ../../src/faq.rst:31 ../../src/faq.rst:91
msgid ""
"If you yourself are the webmaster, consider allowing LinkChecker to check"
" your web pages by adding the following to your robots.txt file::"
msgstr ""
#: ../../src/faq.rst:38
msgid "**Q: How can I tell LinkChecker which proxy to use?**"
msgstr ""
#: ../../src/faq.rst:40
msgid ""
"A: LinkChecker works automatically with proxies. In a Unix or Windows "
"environment, set the http_proxy, https_proxy, ftp_proxy environment "
"variables to a URL that identifies the proxy server before starting "
"LinkChecker. For example:"
msgstr ""
#: ../../src/faq.rst:51
msgid ""
"**Q: The link \"mailto:john@company.com?subject=Hello John\" is reported "
"as an error.**"
msgstr ""
#: ../../src/faq.rst:54
msgid ""
"A: You have to quote special characters (e.g. spaces) in the subject "
"field. The correct link should be \"mailto:...?subject=Hello%20John\". "
"Unfortunately, browsers like IE and Netscape do not enforce this."
msgstr ""
#: ../../src/faq.rst:59
msgid "**Q: Does LinkChecker support JavaScript?**"
msgstr ""
#: ../../src/faq.rst:61
msgid ""
"A: No, it never will. If your page is only working with JS, it is better "
"to use a browser testing tool like `Selenium <http://seleniumhq.org/>`_."
msgstr ""
#: ../../src/faq.rst:65
msgid "**Q: Is LinkChecker's cookie feature insecure?**"
msgstr ""
#: ../../src/faq.rst:67
msgid ""
"A: Potentially yes. This depends on what information you specify in the "
"cookie file. The cookie information will be sent to the specified hosts."
msgstr ""
#: ../../src/faq.rst:71
msgid ""
"Also, the following restrictions apply for cookies that LinkChecker "
"receives from the hosts it checks:"
msgstr ""
#: ../../src/faq.rst:74
msgid ""
"Cookies will only be sent back to the originating server (i.e. no third "
"party cookies are allowed)."
msgstr ""
#: ../../src/faq.rst:76
msgid ""
"Cookies are only stored in memory. After LinkChecker finishes, they are "
"lost."
msgstr ""
#: ../../src/faq.rst:78
msgid "The cookie feature is disabled by default."
msgstr ""
#: ../../src/faq.rst:81
msgid ""
"**Q: LinkChecker retrieves a /robots.txt file for every site it checks. "
"What is that about?**"
msgstr ""
#: ../../src/faq.rst:84
msgid ""
"A: LinkChecker follows the `robots.txt exclusion standard "
"<http://www.robotstxt.org/robotstxt.html>`_. To avoid misuse of "
"LinkChecker, you cannot turn this feature off. See the `Web Robot pages "
"<http://www.robotstxt.org/robotstxt.html>`_ and the `Spidering report "
"<http://www.w3.org/Search/9605-Indexing-"
"Workshop/ReportOutcomes/Spidering.txt>`_ for more info."
msgstr ""
#: ../../src/faq.rst:98
msgid ""
"**Q: How do I print unreachable/dead documents of my website with "
"LinkChecker?**"
msgstr ""
#: ../../src/faq.rst:101
msgid ""
"A: No can do. This would require file system access to your web "
"repository and access to your web server configuration."
msgstr ""
#: ../../src/faq.rst:105
msgid "**Q: How do I check HTML/XML/CSS syntax with LinkChecker?**"
msgstr ""
#: ../../src/faq.rst:107
msgid ""
"A: Enable the :ref:`man/linkcheckerrc:HtmlSyntaxCheck` and "
":ref:`man/linkcheckerrc:CssSyntaxCheck` plugins."
msgstr ""
#: ../../src/faq.rst:111
msgid ""
"**Q: I want to have my own logging class. How can I use it in "
"LinkChecker?**"
msgstr ""
#: ../../src/faq.rst:113
msgid ""
"A: A Python API lets you define new logging classes. Define your own "
"logging class as a subclass of *_Logger* or any other logging class in "
"the *log* module. Then call the *add_logger* function in "
"*Config.Configuration* to register your new logger. After that, append "
"a new logger instance to the fileoutput."
msgstr ""

View file

@@ -0,0 +1,155 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2000-2014 Bastian Kleineidam
# This file is distributed under the same license as the LinkChecker
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2020.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: LinkChecker \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2020-08-05 19:32+0100\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.8.0\n"
#: ../../src/index.rst:6
msgid "Check websites for broken links"
msgstr ""
#: ../../src/index.rst:9
msgid "Introduction"
msgstr ""
#: ../../src/index.rst:10
msgid ""
"LinkChecker is a free, `GPL <http://www.gnu.org/licenses/gpl-2.0.html>`_ "
"licensed website validator. LinkChecker checks links in web documents or "
"full websites. It runs on Python 3 systems, requiring Python 3.5 or "
"later."
msgstr ""
#: ../../src/index.rst:15
msgid ""
"Visit the project on `GitHub "
"<https://github.com/linkchecker/linkchecker>`_."
msgstr ""
#: ../../src/index.rst:18
msgid "Installation"
msgstr ""
#: ../../src/index.rst:25
msgid "Basic usage"
msgstr ""
#: ../../src/index.rst:26
msgid ""
"To check a URL like *http://www.example.org/myhomepage/* it is enough to "
"execute:"
msgstr ""
#: ../../src/index.rst:33
msgid ""
"This check will validate recursively all pages starting with "
"*http://www.example.org/myhomepage/*. Additionally, all external links "
"pointing outside of *www.example.org* will be checked but not recursed "
"into."
msgstr ""
#: ../../src/index.rst:39
msgid "Features"
msgstr ""
#: ../../src/index.rst:41
msgid "recursive and multithreaded checking and site crawling"
msgstr ""
#: ../../src/index.rst:42
msgid ""
"output in colored or normal text, HTML, SQL, CSV, XML or a sitemap graph "
"in different formats"
msgstr ""
#: ../../src/index.rst:44
msgid ""
"HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local file links "
"support"
msgstr ""
#: ../../src/index.rst:46
msgid "restriction of link checking with regular expression filters for URLs"
msgstr ""
#: ../../src/index.rst:47
msgid "proxy support"
msgstr ""
#: ../../src/index.rst:48
msgid "username/password authorization for HTTP, FTP and Telnet"
msgstr ""
#: ../../src/index.rst:49
msgid "honors robots.txt exclusion protocol"
msgstr ""
#: ../../src/index.rst:50
msgid "Cookie support"
msgstr ""
#: ../../src/index.rst:51
msgid "HTML5 support"
msgstr ""
#: ../../src/index.rst:52
msgid ""
":ref:`Plugin support <man/linkchecker:PLUGINS>` allowing custom page "
"checks. Currently available are HTML and CSS syntax checks, Antivirus "
"checks, and more."
msgstr ""
#: ../../src/index.rst:54
msgid "Different interfaces: command line and web interface"
msgstr ""
#: ../../src/index.rst:55
msgid ""
"... and a lot more check options documented in the :doc:`man/linkchecker`"
" manual page."
msgstr ""
#: ../../src/index.rst:59
msgid "Screenshots"
msgstr ""
#: ../../src/index.rst:69
msgid "Commandline interface"
msgstr ""
#: ../../src/index.rst:70
msgid "WSGI web interface"
msgstr ""
#: ../../src/index.rst:73
msgid "Test suite status"
msgstr ""
#: ../../src/index.rst:74
msgid ""
"LinkChecker has extensive unit tests to ensure code quality. `Travis CI "
"<https://travis-ci.com/>`_ is used for continuous build and test "
"integration."
msgstr ""
#~ msgid ""
#~ "#.. image:: https://travis-"
#~ "ci.com/linkchecker/linkchecker.png # :alt: Build"
#~ " Status # :target: https://travis-"
#~ "ci.com/linkchecker/linkchecker"
#~ msgstr ""

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

View file

@@ -55,8 +55,8 @@ First, install the required software.
ClamAv from https://www.clamav.net/
7. *Optional, for GNOME proxy setting parsing:*
PyGObject and GIO.
Best installed from your distribution e.g. ``python3-gi``
PyGObject and GIO.
Best installed from your distribution e.g. ``python3-gi``
8. *Optional, to run the WSGI web interface:*
Apache from https://httpd.apache.org/

File diff suppressed because it is too large Load diff

568
doc/man/de/linkchecker.1 Normal file
View file

@@ -0,0 +1,568 @@
.\" Man page generated from reStructuredText.
.
.TH "LINKCHECKER" "1" "August 11, 2020" "" "LinkChecker"
.SH NAME
linkchecker \- Kommandozeilenprogramm zum Prüfen von HTML Dokumenten und Webseiten auf ungültige Verknüpfungen
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNTAX
.sp
\fBlinkchecker\fP [\fIOptionen\fP] [\fIDatei\-oder\-URL\fP]...
.SH BESCHREIBUNG
.sp
LinkChecker beinhaltet
.INDENT 0.0
.IP \(bu 2
rekursives Prüfen und Multithreading
.IP \(bu 2
Ausgabe als farbigen oder normalen Text, HTML, SQL, CSV, XML oder einen Sitemap\-Graphen in verschiedenen Formaten
.IP \(bu 2
Unterstützung von HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet und Verknüpfungen auf lokale Dateien
.IP \(bu 2
Einschränkung der Linküberprüfung mit URL\-Filter
.IP \(bu 2
Proxy\-Unterstützung
.IP \(bu 2
Benutzer/Passwort Authorisierung für HTTP, FTP und Telnet
.IP \(bu 2
Unterstützung des robots.txt Protokolls
.IP \(bu 2
Unterstützung für Cookies
.IP \(bu 2
Unterstützung für HTML5
.IP \(bu 2
HTML\- und CSS\-Syntaxprüfung
.IP \(bu 2
Antivirusprüfung
.IP \(bu 2
ein Kommandozeilenprogramm und ein Web\-Interface
.UNINDENT
.SH BEISPIELE
.sp
Der häufigste Gebrauchsfall prüft die angegebene Domäne rekursiv:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker http://www.example.com/
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Beachten Sie dass dies die komplette Domäne überprüft, welche aus mehreren tausend URLs bestehen kann. Benutzen Sie die Option \fI\%\-r\fP, um die Rekursionstiefe zu beschränken.
.sp
Prüfe keine \fB/secret\fP URLs. Alle anderen Verknüpfungen werden wie üblich geprüft:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker \-\-ignore\-url=/secret mysite.example.com
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Überprüfung einer lokalen HTML Datei unter Unix:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker ../bla.html
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Überprüfung einer lokalen HTML Datei unter Windows:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
C:\e> linkchecker c:\etemp\etest.html
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Sie können den \fBhttp://\fP URL Anteil weglassen wenn die Domäne mit \fBwww.\fP beginnt:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker www.example.com
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Sie können den \fBftp://\fP URL Anteil weglassen wenn die Domäne mit \fBftp.\fP beginnt:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker \-r0 ftp.example.com
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Erzeuge einen Sitemap Graphen und konvertiere ihn mit dem graphviz dot Programm:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker \-odot \-v www.example.com | dot \-Tps > sitemap.ps
.ft P
.fi
.UNINDENT
.UNINDENT
.SH OPTIONEN
.SS Allgemeine Optionen
.INDENT 0.0
.TP
.B \-f FILENAME, \-\-config=FILENAME
Benutze DATEINAME als Konfigurationsdatei. Standardmäßig benutzt LinkChecker ~/.linkchecker/linkcheckerrc.
.UNINDENT
.INDENT 0.0
.TP
.B \-h, \-\-help
Hilfe! Gebe Gebrauchsanweisung für dieses Programm aus.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-stdin
Lese Liste von URLs zum Prüfen von der Standardeingabe, getrennt durch Leerzeichen.
.UNINDENT
.INDENT 0.0
.TP
.B \-t NUMBER, \-\-threads=NUMBER
Generiere nicht mehr als die angegebene Anzahl von Threads. Die Standardanzahl von Threads ist 10. Um Threads zu deaktivieren, geben Sie eine nicht positive Nummer an.
.UNINDENT
.INDENT 0.0
.TP
.B \-V, \-\-version
Gebe die Version aus und beende das Programm.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-list\-plugins
Print available check plugins and exit.
.UNINDENT
.SS Ausgabeoptionen
.INDENT 0.0
.TP
.B \-D STRING, \-\-debug=STRING
Gebe Testmeldungen aus für den angegebenen Logger. Verfügbare Logger sind cmdline, checking, cache, dns, plugin und all. Die Angabe all ist ein Synonym für alle verfügbaren Logger. Diese Option kann mehrmals angegeben werden, um mit mehr als einem Logger zu testen. Um akkurate Ergebnisse zu erzielen, werden Threads deaktiviert.
.UNINDENT
.INDENT 0.0
.TP
.B \-F TYPE[/ENCODING][/FILENAME], \-\-file\-output=TYPE[/ENCODING][/FILENAME]
Ausgabe in eine Datei namens linkchecker\-out.TYP, $HOME/.linkchecker/blacklist bei blacklist Ausgabe, oder DATEINAME falls angegeben. Das ENCODING gibt die Ausgabekodierung an. Der Standard ist das der lokalen Spracheinstellung. Gültige Enkodierungen sind aufgelistet unter \fI\%https://docs.python.org/library/codecs.html#standard\-encodings\fP\&. Der DATEINAME und ENKODIERUNG Teil wird beim Ausgabetyp none ignoriert, ansonsten wird die Datei überschrieben falls sie existiert. Sie können diese Option mehr als einmal verwenden. Gültige Ausgabetypen sind text, html, sql, csv, gml, dot, xml, sitemap, none oder blacklist. Standard ist keine Dateiausgabe. Die unterschiedlichen Ausgabetypen sind weiter unten dokumentiert. Beachten Sie, dass Sie mit der Option \fI\%\-o\fP \fInone\fP jegliche Ausgaben auf der Konsole verhindern können.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-no\-status
Gebe keine Statusmeldungen aus.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-no\-warnings
Gebe keine Warnungen aus. Standard ist die Ausgabe von Warnungen.
.UNINDENT
.INDENT 0.0
.TP
.B \-o TYPE[/ENCODING], \-\-output=TYPE[/ENCODING]
Gib Ausgabetyp als text, html, sql, csv, gml, dot, xml, sitemap, none oder blacklist an. Standard Typ ist text. Die verschiedenen Ausgabetypen sind unten dokumentiert. Das ENCODING gibt die Ausgabekodierung an. Der Standard ist das der lokalen Spracheinstellung. Gültige Enkodierungen sind aufgelistet unter \fI\%https://docs.python.org/library/codecs.html#standard\-encodings\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-q, \-\-quiet
Keine Ausgabe, ein Alias für \fI\%\-o\fP \fInone\fP\&. Dies ist nur in Verbindung mit \fI\%\-F\fP nützlich.
.UNINDENT
.INDENT 0.0
.TP
.B \-v, \-\-verbose
Gebe alle geprüften URLs aus. Standard ist es, nur fehlerhafte URLs und Warnungen auszugeben.
.UNINDENT
.INDENT 0.0
.TP
.B \-W REGEX, \-\-warning\-regex=REGEX
Definieren Sie einen regulären Ausdruck der eine Warnung ausgibt falls er auf den Inhalt einer geprüften URL zutrifft. Dies gilt nur für gültige Seiten deren Inhalt wir bekommen können. Benutzen Sie dies, um nach Seiten zu suchen, welche bestimmte Fehler enthalten, zum Beispiel "Diese Seite ist umgezogen" oder "Oracle Applikationsfehler". Man beachte, dass mehrere Werte in dem regulären Ausdruck kombiniert werden können, zum Beispiel "(Diese Seite ist umgezogen|Oracle Applikationsfehler)". Siehe Abschnitt \fI\%REGULAR EXPRESSIONS\fP für weitere Infos.
.UNINDENT
.SS Optionen zum Prüfen
.INDENT 0.0
.TP
.B \-\-cookiefile=FILENAME
Lese eine Datei mit Cookie\-Daten. Das Cookie Datenformat wird weiter unten erklärt.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-check\-extern
Check external URLs.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-ignore\-url=REGEX
URLs matching the given regular expression will only be syntax checked.
This option can be given multiple times.
See section \fI\%REGULAR EXPRESSIONS\fP for more info.
.UNINDENT
.INDENT 0.0
.TP
.B \-N STRING, \-\-nntp\-server=STRING
Gibt einen NNTP Rechner für news: Links an. Standard ist die Umgebungsvariable \fI\%NNTP_SERVER\fP\&. Falls kein Rechner angegeben ist, wird lediglich auf korrekte Syntax des Links geprüft.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-no\-follow\-url=REGEX
Prüfe URLs die auf den regulären Ausdruck zutreffen, aber führe keine Rekursion durch. Diese Option kann mehrmals angegeben werden. Siehe Abschnitt \fI\%REGULAR EXPRESSIONS\fP für weitere Infos.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-no\-robots
Check URLs regardless of any robots.txt files.
.UNINDENT
.INDENT 0.0
.TP
.B \-p, \-\-password
Liest ein Passwort von der Kommandozeile und verwende es für HTTP und FTP Autorisierung. Für FTP ist das Standardpasswort anonymous@. Für HTTP gibt es kein Standardpasswort. Siehe auch \fI\%\-u\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-r NUMBER, \-\-recursion\-level=NUMBER
Prüfe rekursiv alle URLs bis zu der angegebenen Tiefe. Eine negative Tiefe bewirkt unendliche Rekursion. Standard Tiefe ist unendlich.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-timeout=NUMBER
Setze den Timeout für TCP\-Verbindungen in Sekunden. Der Standard Timeout ist 60 Sekunden.
.UNINDENT
.INDENT 0.0
.TP
.B \-u STRING, \-\-user=STRING
Verwende den angegebenen Benutzernamen für HTTP und FTP Autorisierung. Für FTP ist der Standardname anonymous. Für HTTP gibt es keinen Standardnamen. Siehe auch \fI\%\-p\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-user\-agent=STRING
Gibt den User\-Agent an, der zu HTTP\-Servern geschickt wird, z.B. "Mozilla/4.0". Der Standard ist "LinkChecker/X.Y", wobei X.Y die aktuelle Version von LinkChecker ist.
.UNINDENT
.SH KONFIGURATIONSDATEIEN
.sp
Konfigurationsdateien können alle obigen Optionen enthalten. Sie können zudem Optionen enthalten, welche nicht auf der Kommandozeile gesetzt werden können. Siehe \fBlinkcheckerrc(5)\fP für mehr Informationen.
.SH AUSGABETYPEN
.sp
Beachten Sie, dass standardmäßig nur Fehler und Warnungen protokolliert werden. Sie sollten die \fI\%\-\-verbose\fP Option benutzen, um eine komplette URL Liste zu erhalten, besonders bei Ausgabe eines Sitemap\-Graphen.
.INDENT 0.0
.TP
\fBtext\fP
Standard Textausgabe in "Schlüssel: Wert"\-Form.
.TP
\fBhtml\fP
Gebe URLs in "Schlüssel: Wert"\-Form als HTML formatiert aus. Besitzt zudem Verknüpfungen auf die referenzierten Seiten. Ungültige URLs haben Verknüpfungen zur HTML und CSS Syntaxprüfung angehängt.
.TP
\fBcsv\fP
Gebe Prüfresultat in CSV\-Format aus mit einer URL pro Zeile.
.TP
\fBgml\fP
Gebe Vater\-Kind Beziehungen zwischen verknüpften URLs als GML Graphen aus.
.TP
\fBdot\fP
Gebe Vater\-Kind Beziehungen zwischen verknüpften URLs als DOT Graphen aus.
.TP
\fBgxml\fP
Gebe Prüfresultat als GraphXML\-Datei aus.
.TP
\fBxml\fP
Gebe Prüfresultat als maschinenlesbare XML\-Datei aus.
.TP
\fBsitemap\fP
Protokolliere Prüfergebnisse als XML Sitemap dessen Format unter \fI\%https://www.sitemaps.org/protocol.html\fP dokumentiert ist.
.TP
\fBsql\fP
Gebe Prüfresultat als SQL Skript mit INSERT Befehlen aus. Ein Beispielskript, um die initiale SQL Tabelle zu erstellen ist unter create.sql zu finden.
.TP
\fBblacklist\fP
Für Cronjobs geeignet. Gibt das Prüfergebnis in eine Datei \fB~/.linkchecker/blacklist\fP aus, welche nur Einträge mit fehlerhaften URLs und die Anzahl der Fehlversuche enthält.
.TP
\fBnone\fP
Gibt nichts aus. Für Debugging oder Prüfen des Rückgabewerts geeignet.
.UNINDENT
.SH REGULÄRE AUSDRÜCKE
.sp
LinkChecker akzeptiert Pythons reguläre Ausdrücke. Siehe \fI\%https://docs.python.org/howto/regex.html\fP für eine Einführung. Eine Ergänzung ist, dass ein regulärer Ausdruck negiert wird falls er mit einem Ausrufezeichen beginnt.
.SH COOKIE-DATEIEN
.sp
Eine Cookie\-Datei enthält Standard HTTP\-Header (RFC 2616) mit den folgenden möglichen Namen:
.INDENT 0.0
.TP
\fBHost\fP (erforderlich)
Setzt die Domäne für die die Cookies gültig sind.
.TP
\fBPath\fP (optional)
Gibt den Pfad für den die Cookies gültig sind; Standardpfad ist \fB/\fP\&.
.TP
\fBSet\-cookie\fP (erforderlich)
Setzt den Cookie Name/Wert. Kann mehrmals angegeben werden.
.UNINDENT
.sp
Mehrere Einträge sind durch eine Leerzeile zu trennen. Das untige Beispiel sendet zwei Cookies zu allen URLs die mit \fBhttp://example.org/hello/\fP beginnen, und eins zu allen URLs die mit \fBhttps://example.org\fP beginnen:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
Host: example.com
Path: /hello
Set\-cookie: ID="smee"
Set\-cookie: spam="egg"
.ft P
.fi
.UNINDENT
.UNINDENT
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
Host: example.org
Set\-cookie: baggage="elitist"; comment="hologram"
.ft P
.fi
.UNINDENT
.UNINDENT
.SH PROXY UNTERSTÜTZUNG
.sp
Um einen Proxy unter Unix oder Windows zu benutzen, setzen Sie die \fI\%http_proxy\fP, \fBhttps_proxy\fP oder \fI\%ftp_proxy\fP Umgebungsvariablen auf die Proxy URL. Die URL sollte die Form \fBhttp://\fP[\fIuser\fP\fB:\fP\fIpass\fP\fB@\fP]\fIhost\fP[\fB:\fP\fIport\fP] besitzen. LinkChecker erkennt auch die Proxy\-Einstellungen des Internet Explorers auf einem Windows\-System, und GNOME oder KDE auf Linux Systemen. Auf einem Mac benutzen Sie die Internet Konfiguration. Sie können eine komma\-separierte Liste von Domainnamen in der \fI\%no_proxy\fP Umgebungsvariable setzen, um alle Proxies für diese Domainnamen zu ignorieren.
.sp
Einen HTTP\-Proxy unter Unix anzugeben sieht beispielsweise so aus:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ export http_proxy="http://proxy.example.com:8080"
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Proxy\-Authentifizierung wird ebenfalls unterstützt:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ export http_proxy="http://user1:mypass@proxy.example.org:8081"
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Setzen eines Proxies unter der Windows Befehlszeile:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
C:\e> set http_proxy=http://proxy.example.com:8080
.ft P
.fi
.UNINDENT
.UNINDENT
.SH DURCHGEFÜHRTE PRÜFUNGEN
.sp
Alle URLs müssen einen ersten Syntaxtest bestehen. Kleine Kodierungsfehler ergeben eine Warnung, jeder andere ungültige Syntaxfehler ist ein Fehler. Nach dem Bestehen des Syntaxtests wird die URL in die Schlange zum Verbindungstest gestellt. Alle Verbindungstests sind weiter unten beschrieben.
.INDENT 0.0
.TP
HTTP Verknüpfungen (\fBhttp:\fP, \fBhttps:\fP)
Nach Verbinden zu dem gegebenen HTTP\-Server wird der eingegebene Pfad oder Query angefordert. Alle Umleitungen werden verfolgt, und falls ein Benutzer/Passwort angegeben wurde werden diese falls notwendig als Authorisierung benutzt. Alle finalen HTTP Statuscodes, die nicht dem Muster 2xx entsprechen, werden als Fehler ausgegeben.
.sp
Der Inhalt von HTML\-Seiten wird rekursiv geprüft.
.TP
Lokale Dateien (\fBfile:\fP)
Eine reguläre, lesbare Datei die geöffnet werden kann ist gültig. Ein lesbares Verzeichnis ist ebenfalls gültig. Alle anderen Dateien, zum Beispiel Gerätedateien, unlesbare oder nicht existente Dateien ergeben einen Fehler.
.sp
HTML\- oder andere untersuchbare Dateiinhalte werden rekursiv geprüft.
.TP
Mail\-Links (\fBmailto:\fP)
Ein \fI\%mailto:\-Link\fP ergibt eine Liste von E\-Mail\-Adressen. Falls eine Adresse fehlerhaft ist, wird die ganze Liste als fehlerhaft angesehen. Für jede E\-Mail\-Adresse werden die folgenden Dinge geprüft:
.INDENT 7.0
.IP 1. 3
Check the address syntax, both the parts before and after the
@ sign.
.IP 2. 3
Look up the MX DNS records. If we found no MX record, print an
error.
.IP 3. 3
Check if one of the mail hosts accept an SMTP connection. Check
hosts with higher priority first. If no host accepts SMTP, we
print a warning.
.IP 4. 3
Try to verify the address with the VRFY command. If we got an
answer, print the verified address as an info.
.UNINDENT
.TP
FTP\-Links (\fBftp:\fP)
Für FTP\-Links wird Folgendes geprüft:
.INDENT 7.0
.IP 1. 3
Eine Verbindung zum angegebenen Rechner wird aufgebaut
.IP 2. 3
Versuche, sich mit dem gegebenen Nutzer und Passwort anzumelden. Der Standardbenutzer ist \fBanonymous\fP, das Standardpasswort ist \fBanonymous@\fP\&.
.IP 3. 3
Versuche, in das angegebene Verzeichnis zu wechseln
.IP 4. 3
Liste die Dateien im Verzeichnis auf mit dem NLST\-Befehl
.UNINDENT
.TP
Telnet links (\fBtelnet:\fP)
Versuche, zu dem angegebenen Telnetrechner zu verbinden; falls Benutzer/Passwort angegeben sind, wird versucht, sich anzumelden.
.TP
NNTP links (\fBnews:\fP, \fBsnews:\fP, \fBnntp\fP)
Versuche, eine Verbindung zu dem angegebenen NNTP\-Rechner aufzubauen. Falls eine Nachrichtengruppe oder ein bestimmter Artikel angegeben ist, wird versucht, diese Gruppe oder diesen Artikel vom Rechner anzufragen.
.TP
Nicht unterstützte Links (\fBjavascript:\fP, etc.)
Ein nicht unterstützter Link wird nur eine Warnung ausgeben. Weitere Prüfungen werden nicht durchgeführt.
.sp
Die komplette Liste von erkannten, aber nicht unterstützten Links ist in der Quelldatei \fI\%linkcheck/checker/unknownurl.py\fP\&. Die bekanntesten davon dürften JavaScript\-Links sein.
.UNINDENT
.SH PLUGINS
.sp
There are two plugin types: connection and content plugins. Connection
plugins are run after a successful connection to the URL host. Content
plugins are run if the URL type has content (mailto: URLs have no
content, for example) and if the check is not forbidden (e.g. by HTTP
robots.txt).
Use the option \fI\%\-\-list\-plugins\fP for a list of plugins and their
documentation. All plugins are enabled via the \fBlinkcheckerrc(5)\fP
configuration file.
.SH REKURSION
.sp
Bevor eine URL rekursiv geprüft wird, hat diese mehrere Bedingungen zu erfüllen. Diese werden in folgender Reihenfolge geprüft:
.INDENT 0.0
.IP 1. 3
Eine URL muss gültig sein.
.IP 2. 3
Der URL\-Inhalt muss analysierbar sein. Dies beinhaltet zur Zeit HTML\-Dateien, Opera Lesezeichen und Verzeichnisse. Falls ein Dateityp nicht erkannt wird (zum Beispiel, weil er keine bekannte HTML\-Dateierweiterung besitzt und der Inhalt nicht nach HTML aussieht), wird der Inhalt als nicht analysierbar angesehen.
.IP 3. 3
Der URL\-Inhalt muss ladbar sein. Dies ist normalerweise der Fall, mit Ausnahme von mailto: oder unbekannten URL\-Typen.
.IP 4. 3
Die maximale Rekursionstiefe darf nicht überschritten werden. Diese wird mit der Option \fI\%\-\-recursion\-level\fP konfiguriert und ist standardmäßig nicht limitiert.
.IP 5. 3
Die URL darf nicht in der Liste von ignorierten URLs sein. Die ignorierten URLs werden mit der Option \fI\%\-\-ignore\-url\fP konfiguriert.
.IP 6. 3
Das Robots Exclusion Protocol muss es erlauben, dass Verknüpfungen in der URL rekursiv verfolgt werden können. Dies wird geprüft, indem in den HTML Kopfdaten nach der "nofollow"\-Direktive gesucht wird.
.UNINDENT
.sp
Beachten Sie, dass die Verzeichnisrekursion alle Dateien in diesem Verzeichnis liest, nicht nur eine Untermenge wie bspw. \fBindex.htm\fP\&.
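.sp
Ein Aufruf, der die Rekursionstiefe begrenzt und bestimmte URLs nur auf Syntax prüft, könnte zum Beispiel so aussehen (Beispieldomain):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker \-\-recursion\-level=2 \-\-ignore\-url=^mailto: http://www.example.com/
.ft P
.fi
.UNINDENT
.UNINDENT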
.SH BEMERKUNGEN
.sp
URLs von der Kommandozeile, die mit \fBftp.\fP beginnen, werden wie \fBftp://ftp.\fP behandelt, URLs, die mit \fBwww.\fP beginnen, wie \fBhttp://www.\fP\&. Sie können auch lokale Dateien angeben. Falls sich Ihr System automatisch mit dem Internet verbindet (z.B. mit diald), wird es dies tun, wenn Sie Links prüfen, die nicht auf Ihren lokalen Rechner verweisen. Benutzen Sie die Option \fI\%\-\-ignore\-url\fP, um dies zu verhindern.
.sp
Javascript Links werden nicht unterstützt.
.sp
Wenn Ihr System keine Threads unterstützt, deaktiviert LinkChecker diese automatisch.
.sp
Sie können mehrere Benutzer/Passwort Paare in einer Konfigurationsdatei angeben.
.sp
Beim Prüfen von \fBnews:\fP Links muss der angegebene NNTP Rechner nicht unbedingt derselbe wie der des Benutzers sein.
.SH UMGEBUNG
.INDENT 0.0
.TP
.B NNTP_SERVER
gibt den Standard\-NNTP\-Server an
.UNINDENT
.INDENT 0.0
.TP
.B http_proxy
gibt den Standard\-HTTP\-Proxy an
.UNINDENT
.INDENT 0.0
.TP
.B ftp_proxy
gibt den Standard\-FTP\-Proxy an
.UNINDENT
.INDENT 0.0
.TP
.B no_proxy
kommaseparierte Liste von Domains, die nicht über einen Proxy\-Server kontaktiert werden
.UNINDENT
.INDENT 0.0
.TP
.B LC_MESSAGES, LANG, LANGUAGE
gibt die Ausgabesprache an
.UNINDENT
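.sp
Die Umgebungsvariablen werden wie üblich in der Shell gesetzt, zum Beispiel (Beispielwerte):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ export NNTP_SERVER=news.example.com
$ export http_proxy=http://proxy.example.com:8080/
$ linkchecker http://www.example.com/
.ft P
.fi
.UNINDENT
.UNINDENT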
.SH RÜCKGABEWERT
.sp
Der Rückgabewert ist 2, falls
.INDENT 0.0
.IP \(bu 2
ein Programmfehler aufgetreten ist.
.UNINDENT
.sp
Der Rückgabewert ist 1, falls
.INDENT 0.0
.IP \(bu 2
ungültige Verknüpfungen gefunden wurden oder
.IP \(bu 2
Warnungen gefunden wurden und Warnungen aktiviert sind
.UNINDENT
.sp
Sonst ist der Rückgabewert Null.
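.sp
Der Rückgabewert lässt sich zum Beispiel in einem Shell\-Skript auswerten (Skizze mit Beispieldomain):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker \-q http://www.example.com/
$ echo "Rückgabewert: $?"
.ft P
.fi
.UNINDENT
.UNINDENT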
.SH LIMITIERUNGEN
.sp
LinkChecker benutzt Hauptspeicher für jede zu prüfende URL, die in der Warteschlange steht. Mit tausenden solcher URLs kann die Menge des benutzten Hauptspeichers sehr groß werden. Dies könnte das Programm oder sogar das gesamte System verlangsamen.
.SH DATEIEN
.sp
\fB~/.linkchecker/linkcheckerrc\fP \- Standardkonfigurationsdatei
.sp
\fB~/.linkchecker/blacklist\fP \- Standard Dateiname der blacklist Logger Ausgabe
.sp
\fBlinkchecker\-out.\fP\fITYP\fP \- Standard Dateiname der Logausgabe
.SH SIEHE AUCH
.sp
\fBlinkcheckerrc(5)\fP
.sp
\fI\%https://docs.python.org/library/codecs.html#standard\-encodings\fP \- gültige Ausgabe\-Enkodierungen
.sp
\fI\%https://docs.python.org/howto/regex.html\fP \- Dokumentation zu regulären Ausdrücken
.SH AUTHOR
Bastian Kleineidam <bastian.kleineidam@web.de>
.SH COPYRIGHT
2000-2014 Bastian Kleineidam
.\" Generated by docutils manpage writer.
.
doc/man/de/linkcheckerrc.5
.\" Man page generated from reStructuredText.
.
.TH "LINKCHECKERRC" "5" "August 11, 2020" "" "LinkChecker"
.SH NAME
linkcheckerrc \- Konfigurationsdatei für LinkChecker
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH BESCHREIBUNG
.sp
\fBlinkcheckerrc\fP ist die Konfigurationsdatei für LinkChecker. Die Datei ist in einem INI\-Format geschrieben. Die Standarddatei ist \fB~/.linkchecker/linkcheckerrc\fP unter Unix\-, \fB%HOMEPATH%\elinkchecker\elinkcheckerrc\fP unter Windows\-Systemen.
.SH EIGENSCHAFTEN
.SS checking
.INDENT 0.0
.TP
\fBcookiefile=\fP\fIDateiname\fP
Lese eine Datei mit Cookie\-Daten. Das Cookie Datenformat wird in \fBlinkchecker(1)\fP erklärt. Kommandozeilenoption: \fB\-\-cookiefile\fP
.TP
\fBdebugmemory=\fP[\fB0\fP|\fB1\fP]
Write memory allocation statistics to a file on exit, requires \fI\%meliae\fP\&.
The default is not to write the file.
Command line option: none
.TP
\fBlocalwebroot=\fP\fISTRING\fP
Beim Prüfen von absoluten URLs innerhalb lokaler Dateien wird das angegebene Wurzelverzeichnis als Basis\-URL benutzt. Beachten Sie, dass das angegebene Verzeichnis in URL\-Syntax vorliegen muss, d.h. es muss einen normalen statt eines umgekehrten Schrägstrichs zum Aneinanderfügen von Verzeichnissen benutzen. Außerdem muss das angegebene Verzeichnis mit einem Schrägstrich enden. Kommandozeilenoption: none
.TP
\fBnntpserver=\fP\fISTRING\fP
Gibt einen NNTP\-Rechner für \fBnews:\fP Links an. Standard ist die Umgebungsvariable \fBNNTP_SERVER\fP\&. Falls kein Rechner angegeben ist, wird lediglich auf korrekte Syntax des Links geprüft. Kommandozeilenoption: \fB\-\-nntp\-server\fP
.TP
\fBrecursionlevel=\fP\fINUMMER\fP
Prüfe rekursiv alle URLs bis zu der angegebenen Tiefe. Eine negative Tiefe bewirkt unendliche Rekursion. Standard Tiefe ist unendlich. Kommandozeilenoption: \fB\-\-recursion\-level\fP
.TP
\fBthreads=\fP\fINUMMER\fP
Generiere nicht mehr als die angegebene Anzahl von Threads. Die Standardanzahl von Threads ist 10. Um Threads zu deaktivieren, geben Sie eine nicht positive Nummer an. Kommandozeilenoption: \fB\-\-threads\fP
.TP
\fBtimeout=\fP\fINUMMER\fP
Setze den Timeout für TCP\-Verbindungen in Sekunden. Der Standard Timeout ist 60 Sekunden. Kommandozeilenoption: \fB\-\-timeout\fP
.TP
\fBaborttimeout=\fP\fINUMMER\fP
Time to wait for checks to finish after the user aborts the first
time (with Ctrl\-C or the abort button). The default abort timeout is
300 seconds.
Command line option: none
.TP
\fBuseragent=\fP\fISTRING\fP
Gibt den User\-Agent an, der zu HTTP\-Servern geschickt wird, z.B. "Mozilla/4.0". Der Standard ist "LinkChecker/X.Y", wobei X.Y die aktuelle Version von LinkChecker ist. Kommandozeilenoption: \fB\-\-user\-agent\fP
.TP
\fBsslverify=\fP[\fB0\fP|\fB1\fP|\fIfilename\fP]
Falls der Wert Null ist, werden SSL\-Zertifikate nicht überprüft. Falls er auf Eins gesetzt wird (der Standard), werden SSL\-Zertifikate mit der mitgelieferten CA\-Zertifikatsdatei geprüft. Falls ein Dateiname angegeben ist, wird dieser zur Prüfung verwendet. Kommandozeilenoption: none
.TP
\fBmaxrunseconds=\fP\fINUMMER\fP
Hört nach der angegebenen Anzahl von Sekunden auf, neue URLs zu prüfen. Dies ist dasselbe, als wenn der Benutzer nach der angegebenen Anzahl von Sekunden abbricht (durch Drücken von Strg\-C). Kommandozeilenoption: none
.TP
\fBmaxfilesizedownload=\fP\fINUMBER\fP
Files larger than NUMBER bytes will be ignored without downloading
anything, if accessed over HTTP and an accurate Content\-Length header
was returned.
No more than this amount of a file will be downloaded.
The default is 5242880 (5 MB).
Command line option: none
.TP
\fBmaxfilesizeparse=\fP\fINUMBER\fP
Files larger than NUMBER bytes will not be parsed for links.
The default is 1048576 (1 MB).
Command line option: none
.TP
\fBmaxnumurls=\fP\fINUMMER\fP
Maximale Anzahl von URLs die geprüft werden. Neue URLs werden nicht angenommen nachdem die angegebene Anzahl von URLs geprüft wurde. Kommandozeilenoption: none
.TP
\fBmaxrequestspersecond=\fP\fINUMMER\fP
Limit the maximum number of requests per second to one host.
The default is 10.
Command line option: none
.TP
\fBrobotstxt=\fP[\fB0\fP|\fB1\fP]
When using http, fetch robots.txt, and confirm whether each URL should
be accessed before checking.
The default is to use robots.txt files.
Command line option: \fB\-\-no\-robots\fP
.TP
\fBallowedschemes=\fP\fINAME\fP[\fB,\fP\fINAME\fP\&...]
Allowed URL schemes as comma\-separated list.
Command line option: none
.UNINDENT
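.sp
Ein Ausschnitt der Sektion \fBchecking\fP mit einigen der obigen Optionen könnte zum Beispiel so aussehen (Beispielwerte):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[checking]
threads=5
timeout=30
recursionlevel=2
maxrequestspersecond=5
.ft P
.fi
.UNINDENT
.UNINDENT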
.SS filtering
.INDENT 0.0
.TP
\fBignore=\fP\fIREGEX\fP (MULTILINE)
Prüfe lediglich die Syntax von URLs, welche dem angegebenen regulären Ausdruck entsprechen. Kommandozeilenoption: \fB\-\-ignore\-url\fP
.TP
\fBignorewarnings=\fP\fINAME\fP[\fB,\fP\fINAME\fP\&...]
Ignoriere die kommagetrennte Liste von Warnungen. Siehe \fI\%WARNINGS\fP für die Liste von erkannten Warnungen. Kommandozeilenoption: none
.TP
\fBinternlinks=\fP\fIREGEX\fP
Regulärer Ausdruck, um mehr URLs als interne Verknüpfungen hinzuzufügen. Standard ist dass URLs der Kommandozeile als intern gelten. Kommandozeilenoption: none
.TP
\fBnofollow=\fP\fIREGEX\fP (MULTILINE)
Prüfe URLs die auf den regulären Ausdruck zutreffen, aber führe keine Rekursion durch. Kommandozeilenoption: \fB\-\-no\-follow\-url\fP
.TP
\fBcheckextern=\fP[\fB0\fP|\fB1\fP]
Check external links. Default is to check internal links only.
Command line option: \fB\-\-check\-extern\fP
.UNINDENT
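.sp
Beispiel für die Sektion \fBfiltering\fP (hypothetische Muster); mehrzeilige Werte müssen eingerückt werden, siehe \fI\%MULTILINE\fP:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[filtering]
ignore=
  ^mailto:
nofollow=
  ^https?://www\e.example\e.com/archiv/
checkextern=1
.ft P
.fi
.UNINDENT
.UNINDENT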
.SS authentication
.INDENT 0.0
.TP
\fBentry=\fP\fIREGEX\fP \fIBENUTZER\fP [\fIPASSWORT\fP] (MULTILINE)
Provide individual username/password pairs for different links. In
addition to a single login page specified with \fBloginurl\fP, multiple
FTP, HTTP (Basic Authentication) and telnet links are supported.
Entries are a triple (URL regex, username, password) or a tuple (URL
regex, username), where the entries are separated by whitespace.
The password is optional; if missing, it has to be entered at the
command line.
If the regular expression matches the checked URL, the given
username/password pair is used for authentication. The command line
options \fB\-u\fP and \fB\-p\fP match every link and therefore override
the entries given here. The first match wins.
Command line option: \fB\-u\fP, \fB\-p\fP
.TP
\fBloginurl=\fP\fIURL\fP
The URL of a login page to be visited before link checking. The page
is expected to contain an HTML form to collect credentials and
submit them to the address in its action attribute using an HTTP
POST request. The name attributes of the input elements of the form
and the values to be submitted need to be available (see \fBentry\fP
for an explanation of username and password values).
.TP
\fBloginuserfield=\fP\fISTRING\fP
Der Name für das Benutzer CGI\-Feld. Der Standardname ist \fBlogin\fP\&.
.TP
\fBloginpasswordfield=\fP\fISTRING\fP
Der Name für das Passwort CGI\-Feld. Der Standardname ist \fBpassword\fP\&.
.TP
\fBloginextrafields=\fP\fINAME\fP\fB:\fP\fIWERT\fP (MULTILINE)
Optionally the name attributes of any additional input elements and
the values to populate them with. Note that these are submitted
without checking whether matching input elements exist in the HTML
form.
.UNINDENT
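.sp
Beispiel für die Sektion \fBauthentication\fP (hypothetische URL und Zugangsdaten):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[authentication]
entry=
  ^https?://www\e.example\e.com/intern/ benutzer passwort
loginurl=http://www.example.com/login/
loginuserfield=login
loginpasswordfield=password
.ft P
.fi
.UNINDENT
.UNINDENT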
.SS output
.INDENT 0.0
.TP
\fBdebug=\fP\fISTRING\fP[\fB,\fP\fISTRING\fP\&...]
Gebe Testmeldungen aus für den angegebenen Logger. Verfügbare Logger sind \fBcmdline\fP, \fBchecking\fP, \fBcache\fP, \fBdns\fP, \fBthread\fP, \fBplugins\fP und \fBall\fP\&. Die Angabe \fBall\fP ist ein Synonym für alle verfügbaren Logger. Kommandozeilenoption: \fB\-\-debug\fP
.TP
\fBfileoutput=\fP\fITYPE\fP[\fB,\fP\fITYPE\fP\&...]
Ausgabe in Datei \fBlinkchecker\-out.\fP\fITYP\fP, \fB$HOME/.linkchecker/blacklist\fP für \fBblacklist\fP Ausgabe. Gültige Ausgabearten sind \fBtext\fP, \fBhtml\fP, \fBsql\fP, \fBcsv\fP, \fBgml\fP, \fBdot\fP, \fBxml\fP, \fBnone\fP oder \fBblacklist\fP\&. Standard ist keine Dateiausgabe. Die verschiedenen Ausgabearten sind unten dokumentiert. Beachten Sie, dass Sie alle Konsolenausgaben mit \fBoutput=none\fP unterdrücken können. Kommandozeilenoption: \fB\-\-file\-output\fP
.TP
\fBlog=\fP\fITYPE\fP[\fB/\fP\fIENCODING\fP]
Gib Ausgabetyp als \fBtext\fP, \fBhtml\fP, \fBsql\fP, \fBcsv\fP, \fBgml\fP, \fBdot\fP, \fBxml\fP, \fBnone\fP oder \fBblacklist\fP an. Standard\-Typ ist \fBtext\fP\&. Die verschiedenen Ausgabetypen sind unten dokumentiert. Das \fIENCODING\fP gibt die Ausgabekodierung an. Der Standard ist die Kodierung der lokalen Spracheinstellung. Gültige Enkodierungen sind aufgelistet unter \fI\%https://docs.python.org/library/codecs.html#standard\-encodings\fP\&. Kommandozeilenoption: \fB\-\-output\fP
.TP
\fBquiet=\fP[\fB0\fP|\fB1\fP]
Falls gesetzt, erfolgt keine Ausgabe. Ein Alias für \fBlog=none\fP\&. Dies ist nur in Verbindung mit \fBfileoutput\fP nützlich. Kommandozeilenoption: \fB\-\-quiet\fP
.TP
\fBstatus=\fP[\fB0\fP|\fB1\fP]
Kontrolle der Statusmeldungen. Standard ist 1. Kommandozeilenoption: \fB\-\-no\-status\fP
.TP
\fBverbose=\fP[\fB0\fP|\fB1\fP]
Falls gesetzt, gebe alle geprüften URLs einmal aus. Standard ist es, nur fehlerhafte URLs und Warnungen auszugeben. Kommandozeilenoption: \fB\-\-verbose\fP
.TP
\fBwarnings=\fP[\fB0\fP|\fB1\fP]
Kontrolliert die Ausgabe von Warnungen. Standard ist die Ausgabe von Warnungen. Kommandozeilenoption: \fB\-\-no\-warnings\fP
.UNINDENT
.SS text
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
Gebe Dateiname für Textausgabe an. Standard Dateiname ist \fBlinkchecker\-out.txt\fP\&. Kommandozeilenoption: \fB\-\-file\-output\fP
.TP
\fBparts=\fP\fISTRING\fP
Kommagetrennte Liste von Teilen, die ausgegeben werden sollen. Siehe \fI\%LOGGER PARTS\fP weiter unten. Kommandozeilenoption: none
.TP
\fBencoding=\fP\fISTRING\fP
Gültige Enkodierungen sind aufgelistet unter \fI\%https://docs.python.org/library/codecs.html#standard\-encodings\fP\&. Die Standardenkodierung ist \fBiso\-8859\-15\fP\&.
.TP
.B \fIcolor*\fP
Farbwerte für die verschiedenen Ausgabeteile. Syntax ist \fIcolor\fP oder \fItype\fP\fB;\fP\fIcolor\fP\&. Der \fItype\fP kann \fBbold\fP, \fBlight\fP, \fBblink\fP oder \fBinvert\fP sein. Die \fIcolor\fP kann \fBdefault\fP, \fBblack\fP, \fBred\fP, \fBgreen\fP, \fByellow\fP, \fBblue\fP, \fBpurple\fP, \fBcyan\fP, \fBwhite\fP, \fBBlack\fP, \fBRed\fP, \fBGreen\fP, \fBYellow\fP, \fBBlue\fP, \fBPurple\fP, \fBCyan\fP oder \fBWhite\fP sein. Kommandozeilenoption: none
.TP
\fBcolorparent=\fP\fISTRING\fP
Setze Farbe des Vaters. Standard ist \fBwhite\fP\&.
.TP
\fBcolorurl=\fP\fISTRING\fP
Setze URL Farbe. Standard ist \fBdefault\fP\&.
.TP
\fBcolorname=\fP\fISTRING\fP
Setze Namensfarbe. Standard ist \fBdefault\fP\&.
.TP
\fBcolorreal=\fP\fISTRING\fP
Setze Farbe für tatsächliche URL. Standard ist \fBcyan\fP\&.
.TP
\fBcolorbase=\fP\fISTRING\fP
Setzt Basisurl Farbe. Standard ist \fBpurple\fP\&.
.TP
\fBcolorvalid=\fP\fISTRING\fP
Setze gültige Farbe. Standard ist \fBbold;green\fP\&.
.TP
\fBcolorinvalid=\fP\fISTRING\fP
Setze ungültige Farbe. Standard ist \fBbold;red\fP\&.
.TP
\fBcolorinfo=\fP\fISTRING\fP
Setzt Informationsfarbe. Standard ist \fBdefault\fP\&.
.TP
\fBcolorwarning=\fP\fISTRING\fP
Setze Warnfarbe. Standard ist \fBbold;yellow\fP\&.
.TP
\fBcolordltime=\fP\fISTRING\fP
Setze Downloadzeitfarbe. Standard ist \fBdefault\fP\&.
.TP
\fBcolorreset=\fP\fISTRING\fP
Setze Reset Farbe. Standard ist \fBdefault\fP\&.
.UNINDENT
.SS gml
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBparts=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBencoding=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.UNINDENT
.SS dot
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBparts=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBencoding=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.UNINDENT
.SS csv
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBparts=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBencoding=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBseparator=\fP\fICHAR\fP
Das CSV Trennzeichen. Standard ist Komma (\fB,\fP).
.TP
\fBquotechar=\fP\fICHAR\fP
Setze CSV Quotezeichen. Standard ist das doppelte Anführungszeichen (\fB"\fP).
.UNINDENT
.SS sql
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBparts=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBencoding=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBdbname=\fP\fISTRING\fP
Setze Datenbankname zum Speichern. Standard ist \fBlinksdb\fP\&.
.TP
\fBseparator=\fP\fICHAR\fP
Setze SQL Kommandotrennzeichen. Standard ist ein Strichpunkt (\fB;\fP).
.UNINDENT
.SS html
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBparts=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBencoding=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBcolorbackground=\fP\fICOLOR\fP
Setze HTML Hintergrundfarbe. Standard ist \fB#fff7e5\fP\&.
.TP
\fBcolorurl=\fP
Setze HTML URL Farbe. Standard ist \fB#dcd5cf\fP\&.
.TP
\fBcolorborder=\fP
Setze HTML Rahmenfarbe. Standard ist \fB#000000\fP\&.
.TP
\fBcolorlink=\fP
Setze HTML Verknüpfungsfarbe. Standard ist \fB#191c83\fP\&.
.TP
\fBcolorwarning=\fP
Setze HTML Warnfarbe. Standard ist \fB#e0954e\fP\&.
.TP
\fBcolorerror=\fP
Setze HTML Fehlerfarbe. Standard ist \fB#db4930\fP\&.
.TP
\fBcolorok=\fP
Setze HTML Gültigkeitsfarbe. Standard ist \fB#3ba557\fP\&.
.UNINDENT
.SS blacklist
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBencoding=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.UNINDENT
.SS xml
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBparts=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBencoding=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.UNINDENT
.SS gxml
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBparts=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBencoding=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.UNINDENT
.SS sitemap
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBparts=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBencoding=\fP\fISTRING\fP
Siehe \fI\%[text]\fP Sektion weiter oben.
.TP
\fBpriority=\fP\fINUMMER\fP
Eine Nummer zwischen 0.0 und 1.0, welche die Priorität festlegt. Die Standardpriorität für die erste URL ist 1.0, für alle Kind\-URLs ist sie 0.5.
.TP
\fBfrequency=\fP[\fBalways\fP|\fBhourly\fP|\fBdaily\fP|\fBweekly\fP|\fBmonthly\fP|\fByearly\fP|\fBnever\fP]
Die Häufigkeit mit der Seiten sich ändern.
.UNINDENT
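.sp
Beispiel für die Sektion \fBsitemap\fP (Beispielwerte):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[sitemap]
priority=0.7
frequency=daily
.ft P
.fi
.UNINDENT
.UNINDENT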
.SH AUSGABE PARTS
.INDENT 0.0
.TP
\fBall\fP
for all parts
.TP
\fBid\fP
a unique ID for each log entry
.TP
\fBrealurl\fP
the full url link
.TP
\fBresult\fP
valid or invalid, with messages
.TP
\fBextern\fP
1 or 0, only reported in some logger types
.TP
\fBbase\fP
base href=...
.TP
\fBname\fP
<a href=...>name</a> and <img alt="name">
.TP
\fBparenturl\fP
if any
.TP
\fBinfo\fP
some additional info, e.g. FTP welcome messages
.TP
\fBwarning\fP
warnings
.TP
\fBdltime\fP
download time
.TP
\fBchecktime\fP
check time
.TP
\fBurl\fP
the original url name, can be relative
.TP
\fBintro\fP
the blurb at the beginning, "starting at ..."
.TP
\fBoutro\fP
the blurb at the end, "found x errors ..."
.UNINDENT
.SH MULTILINE
.sp
Einige Optionen können mehrere Zeilen lang sein. Jede Zeile muss dafür eingerückt werden. Zeilen die mit einer Raute (\fB#\fP) beginnen werden ignoriert, müssen aber eingerückt sein.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
ignore=
lconline
bookmark
# a comment
^mailto:
.ft P
.fi
.UNINDENT
.UNINDENT
.SH BEISPIEL
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[output]
log=html
[checking]
threads=5
[filtering]
ignorewarnings=http\-moved\-permanent
.ft P
.fi
.UNINDENT
.UNINDENT
.SH PLUGINS
.sp
All plugins have a separate section. If the section appears in the
configuration file the plugin is enabled. Some plugins read extra
options in their section.
.SS AnchorCheck
.sp
Checks validity of HTML anchors.
.SS LocationInfo
.sp
Adds the country and, if possible, the city name of the URL host as info.
Needs GeoIP or pygeoip and a local country or city lookup DB installed.
.SS RegexCheck
.sp
Definieren Sie einen regulären Ausdruck der eine Warnung ausgibt falls er auf den Inhalt einer geprüften URL zutrifft. Dies gilt nur für gültige Seiten deren Inhalt wir bekommen können.
.INDENT 0.0
.TP
\fBwarningregex=\fP\fIREGEX\fP
Use this to check for pages that contain some form of error message,
for example "This page has moved" or "Oracle Application error".
\fIREGEX\fP should be unquoted.
.sp
Man beachte, dass mehrere Werte in dem regulären Ausdruck kombiniert werden können, zum Beispiel "(Diese Seite ist umgezogen|Oracle Applikationsfehler)".
.UNINDENT
.SS SslCertificateCheck
.sp
Check SSL certificate expiration date. Only internal https: links will
be checked. A domain will only be checked once to avoid duplicate
warnings.
.INDENT 0.0
.TP
\fBsslcertwarndays=\fP\fINUMMER\fP
Configures the expiration warning time in days.
.UNINDENT
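.sp
Das Plugin wird durch seine Sektion in der Konfigurationsdatei aktiviert, zum Beispiel mit einer Warnzeit von 30 Tagen (Beispielwert):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[SslCertificateCheck]
sslcertwarndays=30
.ft P
.fi
.UNINDENT
.UNINDENT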
.SS HtmlSyntaxCheck
.sp
Prüfe Syntax von HTML URLs mit dem W3C Online Validator. Siehe \fI\%https://validator.w3.org/docs/api.html\fP\&.
.SS HttpHeaderInfo
.sp
Print HTTP headers in URL info.
.INDENT 0.0
.TP
\fBprefixes=\fP\fIprefix1\fP[\fB,\fP\fIprefix2\fP]...
List of comma\-separated header prefixes, for example to display all
HTTP headers that start with "X\-".
.UNINDENT
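.sp
Beispiel für die Sektion \fBHttpHeaderInfo\fP (Beispielpräfixe):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[HttpHeaderInfo]
prefixes=Server,X\-
.ft P
.fi
.UNINDENT
.UNINDENT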
.SS CssSyntaxCheck
.sp
Prüfe Syntax von CSS URLs mit dem W3C Online Validator. Siehe \fI\%https://jigsaw.w3.org/css\-validator/manual.html#expert\fP\&.
.SS VirusCheck
.sp
Checks the page content for virus infections with clamav. A local clamav
daemon must be installed.
.INDENT 0.0
.TP
\fBclamavconf=\fP\fIDateiname\fP
Dateiname von \fBclamd.conf\fP Konfigurationsdatei.
.UNINDENT
.SS PdfParser
.sp
Parse PDF files for URLs to check. Needs the \fI\%pdfminer\fP Python package
installed.
.SS WordParser
.sp
Parse Word files for URLs to check. Needs the \fI\%pywin32\fP Python
extension installed.
.SH WARNUNGEN
.sp
Die folgenden Warnungen werden vom Konfigurationseintrag \(aqignorewarnings\(aq erkannt:
.INDENT 0.0
.TP
\fBfile\-missing\-slash\fP
Der file: URL fehlt ein abschließender Schrägstrich.
.TP
\fBfile\-system\-path\fP
Der file: Pfad ist nicht derselbe wie der Systempfad.
.TP
\fBftp\-missing\-slash\fP
Der ftp: URL fehlt ein abschließender Schrägstrich.
.TP
\fBhttp\-cookie\-store\-error\fP
Ein Fehler trat auf während des Speicherns eines Cookies.
.TP
\fBhttp\-empty\-content\fP
Die URL besitzt keinen Inhalt.
.TP
\fBmail\-no\-mx\-host\fP
Der MX Mail\-Rechner konnte nicht gefunden werden.
.TP
\fBnntp\-no\-newsgroup\fP
Die NNTP Nachrichtengruppe konnte nicht gefunden werden.
.TP
\fBnntp\-no\-server\fP
Es wurde kein NNTP Server gefunden.
.TP
\fBurl\-content\-size\-zero\fP
Die URL\-Inhaltsgrößenangabe ist Null.
.TP
\fBurl\-content\-too\-large\fP
Der URL Inhalt ist zu groß.
.TP
\fBurl\-effective\-url\fP
Die effektive URL unterscheidet sich vom Original.
.TP
\fBurl\-error\-getting\-content\fP
Konnte den Inhalt der URL nicht bekommen.
.TP
\fBurl\-obfuscated\-ip\fP
Die IP\-Adresse ist verschleiert.
.TP
\fBurl\-whitespace\fP
Die URL enthält Leerzeichen am Anfang oder Ende.
.UNINDENT
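.sp
Einzelne Warnungen werden in der Sektion \fBfiltering\fP der Konfigurationsdatei unterdrückt, zum Beispiel:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[filtering]
ignorewarnings=url\-whitespace,url\-content\-size\-zero
.ft P
.fi
.UNINDENT
.UNINDENT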
.SH SIEHE AUCH
.sp
\fBlinkchecker(1)\fP
.SH AUTHOR
Bastian Kleineidam <bastian.kleineidam@web.de>
.SH COPYRIGHT
2000-2014 Bastian Kleineidam
.\" Generated by docutils manpage writer.
.
doc/man/en/linkchecker.1
.\" Man page generated from reStructuredText.
.
.TH "LINKCHECKER" "1" "August 11, 2020" "" "LinkChecker"
.SH NAME
linkchecker \- command line client to check HTML documents and websites for broken links
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH SYNOPSIS
.sp
\fBlinkchecker\fP [\fIoptions\fP] [\fIfile\-or\-url\fP]...
.SH DESCRIPTION
.sp
LinkChecker features
.INDENT 0.0
.IP \(bu 2
recursive and multithreaded checking
.IP \(bu 2
output in colored or normal text, HTML, SQL, CSV, XML or a sitemap
graph in different formats
.IP \(bu 2
support for HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and
local file links
.IP \(bu 2
restriction of link checking with URL filters
.IP \(bu 2
proxy support
.IP \(bu 2
username/password authorization for HTTP, FTP and Telnet
.IP \(bu 2
support for robots.txt exclusion protocol
.IP \(bu 2
support for Cookies
.IP \(bu 2
support for HTML5
.IP \(bu 2
HTML and CSS syntax check
.IP \(bu 2
Antivirus check
.IP \(bu 2
a command line and web interface
.UNINDENT
.SH EXAMPLES
.sp
The most common use checks the given domain recursively:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker http://www.example.com/
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Beware that this checks the whole site which can have thousands of
URLs. Use the \fI\%\-r\fP option to restrict the recursion depth.
.sp
Don\(aqt check URLs with \fB/secret\fP in their name. All other links are
checked as usual:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker \-\-ignore\-url=/secret mysite.example.com
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Checking a local HTML file on Unix:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker ../bla.html
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Checking a local HTML file on Windows:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
C:\e> linkchecker c:\etemp\etest.html
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
You can skip the \fBhttp://\fP url part if the domain starts with
\fBwww.\fP:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker www.example.com
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
You can skip the \fBftp://\fP url part if the domain starts with \fBftp.\fP:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker \-r0 ftp.example.com
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Generate a sitemap graph and convert it with the graphviz dot utility:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker \-odot \-v www.example.com | dot \-Tps > sitemap.ps
.ft P
.fi
.UNINDENT
.UNINDENT
.SH OPTIONS
.SS General options
.INDENT 0.0
.TP
.B \-f FILENAME, \-\-config=FILENAME
Use FILENAME as configuration file. By default LinkChecker uses
~/.linkchecker/linkcheckerrc.
.UNINDENT
.INDENT 0.0
.TP
.B \-h, \-\-help
Help me! Print usage information for this program.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-stdin
Read list of white\-space separated URLs to check from stdin.
.UNINDENT
.INDENT 0.0
.TP
.B \-t NUMBER, \-\-threads=NUMBER
Generate no more than the given number of threads. Default number of
threads is 10. To disable threading specify a non\-positive number.
.UNINDENT
.INDENT 0.0
.TP
.B \-V, \-\-version
Print version and exit.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-list\-plugins
Print available check plugins and exit.
.UNINDENT
.SS Output options
.INDENT 0.0
.TP
.B \-D STRING, \-\-debug=STRING
Print debugging output for the given logger. Available loggers are
cmdline, checking, cache, dns, plugin and
all. Specifying all is an alias for specifying all available
loggers. The option can be given multiple times to debug with more
than one logger. For accurate results, threading will be disabled
during debug runs.
.UNINDENT
.INDENT 0.0
.TP
.B \-F TYPE[/ENCODING][/FILENAME], \-\-file\-output=TYPE[/ENCODING][/FILENAME]
Output to a file linkchecker\-out.TYPE,
$HOME/.linkchecker/blacklist for blacklist output, or
FILENAME if specified. The ENCODING specifies the output
encoding, the default is that of your locale. Valid encodings are
listed at
\fI\%https://docs.python.org/library/codecs.html#standard\-encodings\fP\&.
The FILENAME and ENCODING parts of the none output type are
ignored; otherwise, if the file already exists, it will be overwritten.
You can specify this option more than once. Valid file output TYPEs
are text, html, sql, csv, gml, dot, xml,
sitemap, none or blacklist. Default is no file output.
The various output types are documented below. Note that you can
suppress all console output with the option \fI\%\-o\fP \fInone\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-no\-status
Do not print check status messages.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-no\-warnings
Don\(aqt log warnings. Default is to log warnings.
.UNINDENT
.INDENT 0.0
.TP
.B \-o TYPE[/ENCODING], \-\-output=TYPE[/ENCODING]
Specify output type as text, html, sql, csv,
gml, dot, xml, sitemap, none or blacklist.
Default type is text. The various output types are documented
below.
The ENCODING specifies the output encoding, the default is that of
your locale. Valid encodings are listed at
\fI\%https://docs.python.org/library/codecs.html#standard\-encodings\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-q, \-\-quiet
Quiet operation, an alias for \fI\%\-o\fP \fInone\fP\&. This is only useful with
\fI\%\-F\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-v, \-\-verbose
Log all checked URLs. Default is to log only errors and warnings.
.UNINDENT
.INDENT 0.0
.TP
.B \-W REGEX, \-\-warning\-regex=REGEX
Define a regular expression which prints a warning if it matches any
content of the checked link. This applies only to valid pages, so we
can get their content.
Use this to check for pages that contain some form of error, for
example "This page has moved" or "Oracle Application error".
Note that multiple values can be combined in the regular expression,
for example "(This page has moved|Oracle Application error)".
See section \fI\%REGULAR EXPRESSIONS\fP for more info.
.UNINDENT
.SS Checking options
.INDENT 0.0
.TP
.B \-\-cookiefile=FILENAME
Read a file with initial cookie data. The cookie data format is
explained below.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-check\-extern
Check external URLs.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-ignore\-url=REGEX
URLs matching the given regular expression will only be syntax checked.
This option can be given multiple times.
See section \fI\%REGULAR EXPRESSIONS\fP for more info.
.UNINDENT
.INDENT 0.0
.TP
.B \-N STRING, \-\-nntp\-server=STRING
Specify an NNTP server for news: links. Default is the
environment variable \fI\%NNTP_SERVER\fP\&. If no host is given, only the
syntax of the link is checked.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-no\-follow\-url=REGEX
Check but do not recurse into URLs matching the given regular
expression.
This option can be given multiple times.
See section \fI\%REGULAR EXPRESSIONS\fP for more info.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-no\-robots
Check URLs regardless of any robots.txt files.
.UNINDENT
.INDENT 0.0
.TP
.B \-p, \-\-password
Read a password from console and use it for HTTP and FTP
authorization. For FTP the default password is anonymous@. For
HTTP there is no default password. See also \fI\%\-u\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-r NUMBER, \-\-recursion\-level=NUMBER
Check recursively all links up to given depth. A negative depth will
enable infinite recursion. Default depth is infinite.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-timeout=NUMBER
Set the timeout for connection attempts in seconds. The default
timeout is 60 seconds.
.UNINDENT
.INDENT 0.0
.TP
.B \-u STRING, \-\-user=STRING
Try the given username for HTTP and FTP authorization. For FTP the
default username is anonymous. For HTTP there is no default
username. See also \fI\%\-p\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B \-\-user\-agent=STRING
Specify the User\-Agent string to send to the HTTP server, for
example "Mozilla/4.0". The default is "LinkChecker/X.Y" where X.Y is
the current version of LinkChecker.
.UNINDENT
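.sp
For example, a hypothetical invocation combining several of the checking
options above (the URL is a placeholder):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker \-\-check\-extern \-\-timeout=30 \-\-user\-agent="Mozilla/4.0" \e
      https://example.com/
.ft P
.fi
.UNINDENT
.UNINDENT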
.SH CONFIGURATION FILES
.sp
Configuration files can specify all options above. They can also specify
some options that cannot be set on the command line. See
\fBlinkcheckerrc(5)\fP for more info.
.SH OUTPUT TYPES
.sp
Note that by default only errors and warnings are logged. You should use
the option \fI\%\-\-verbose\fP to get the complete URL list, especially when
outputting a sitemap graph format.
.INDENT 0.0
.TP
\fBtext\fP
Standard text logger, logging URLs in keyword: argument fashion.
.TP
\fBhtml\fP
Log URLs in keyword: argument fashion, formatted as HTML.
Additionally has links to the referenced pages. Invalid URLs have
HTML and CSS syntax check links appended.
.TP
\fBcsv\fP
Log check result in CSV format with one URL per line.
.TP
\fBgml\fP
Log parent\-child relations between linked URLs as a GML sitemap
graph.
.TP
\fBdot\fP
Log parent\-child relations between linked URLs as a DOT sitemap
graph.
.TP
\fBgxml\fP
Log check result as a GraphXML sitemap graph.
.TP
\fBxml\fP
Log check result as machine\-readable XML.
.TP
\fBsitemap\fP
Log check result as an XML sitemap whose protocol is documented at
\fI\%https://www.sitemaps.org/protocol.html\fP\&.
.TP
\fBsql\fP
Log check result as SQL script with INSERT commands. An example
script to create the initial SQL table is included as create.sql.
.TP
\fBblacklist\fP
Suitable for cron jobs. Logs the check result into a file
\fB~/.linkchecker/blacklist\fP which only contains entries with
invalid URLs and the number of times they have failed.
.TP
\fBnone\fP
Logs nothing. Suitable for debugging or checking the exit code.
.UNINDENT
.SH REGULAR EXPRESSIONS
.sp
LinkChecker accepts Python regular expressions. See
\fI\%https://docs.python.org/howto/regex.html\fP for an introduction.
In addition, a leading exclamation mark negates the regular
expression.
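.sp
For example, since a leading exclamation mark negates an expression, the
following hypothetical command syntax\-checks every URL that is not below
\fBhttps://example.com/\fP:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker \-\-ignore\-url="!^https://example\e.com/" https://example.com/
.ft P
.fi
.UNINDENT
.UNINDENT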
.SH COOKIE FILES
.sp
A cookie file contains standard HTTP header (RFC 2616) data with the
following possible names:
.INDENT 0.0
.TP
\fBHost\fP (required)
Sets the domain the cookies are valid for.
.TP
\fBPath\fP (optional)
Gives the path the cookies are valid for; default path is \fB/\fP\&.
.TP
\fBSet\-cookie\fP (required)
Set cookie name/value. Can be given more than once.
.UNINDENT
.sp
Multiple entries are separated by a blank line. The example below will
send two cookies to all URLs starting with \fBhttp://example.com/hello/\fP
and one to all URLs starting with \fBhttps://example.org/\fP:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
Host: example.com
Path: /hello
Set\-cookie: ID="smee"
Set\-cookie: spam="egg"
.ft P
.fi
.UNINDENT
.UNINDENT
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
Host: example.org
Set\-cookie: baggage="elitist"; comment="hologram"
.ft P
.fi
.UNINDENT
.UNINDENT
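.sp
Assuming the two entries above are stored in a file named
\fBcookies.txt\fP (a placeholder name), they can be passed to the
checker like this:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker \-\-cookiefile=cookies.txt http://example.com/hello/
.ft P
.fi
.UNINDENT
.UNINDENT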
.SH PROXY SUPPORT
.sp
To use a proxy on Unix or Windows set the \fI\%http_proxy\fP, \fBhttps_proxy\fP or
\fI\%ftp_proxy\fP environment variables to the proxy URL. The URL should be of
the form
\fBhttp://\fP[\fIuser\fP\fB:\fP\fIpass\fP\fB@\fP]\fIhost\fP[\fB:\fP\fIport\fP].
LinkChecker also detects manual proxy settings of Internet Explorer
under Windows systems, and GNOME or KDE on Linux systems. On a Mac use
the Internet Config to select a proxy.
You can also set a comma\-separated domain list in the \fI\%no_proxy\fP
environment variable to ignore any proxy settings for these domains.
.sp
Setting an HTTP proxy on Unix for example looks like this:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ export http_proxy="http://proxy.example.com:8080"
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Proxy authentication is also supported:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ export http_proxy="http://user1:mypass@proxy.example.org:8081"
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Setting a proxy on the Windows command prompt:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
C:\e> set http_proxy=http://proxy.example.com:8080
.ft P
.fi
.UNINDENT
.UNINDENT
.SH PERFORMED CHECKS
.sp
All URLs have to pass a preliminary syntax test. Minor quoting mistakes
will issue a warning, all other invalid syntax issues are errors. After
the syntax check passes, the URL is queued for connection checking. All
connection check types are described below.
.INDENT 0.0
.TP
HTTP links (\fBhttp:\fP, \fBhttps:\fP)
After connecting to the given HTTP server the given path or query is
requested. All redirections are followed, and if user/password is
given it will be used as authorization when necessary. All final
HTTP status codes other than 2xx are errors.
.sp
HTML page contents are checked for recursion.
.TP
Local files (\fBfile:\fP)
A regular, readable file that can be opened is valid. A readable
directory is also valid. All other files, for example device files,
unreadable or non\-existing files are errors.
.sp
HTML or other parseable file contents are checked for recursion.
.TP
Mail links (\fBmailto:\fP)
A mailto: link eventually resolves to a list of email addresses.
If one address fails, the whole list will fail. For each mail
address we check the following things:
.INDENT 7.0
.IP 1. 3
Check the address syntax, both the parts before and after the
@ sign.
.IP 2. 3
Look up the MX DNS records. If we find no MX record, print an
error.
.IP 3. 3
Check if one of the mail hosts accepts an SMTP connection. Check
hosts with higher priority first. If no host accepts SMTP, we
print a warning.
.IP 4. 3
Try to verify the address with the VRFY command. If we get an
answer, print the verified address as an info.
.UNINDENT
.TP
FTP links (\fBftp:\fP)
For FTP links we do:
.INDENT 7.0
.IP 1. 3
connect to the specified host
.IP 2. 3
try to login with the given user and password. The default user
is \fBanonymous\fP, the default password is \fBanonymous@\fP\&.
.IP 3. 3
try to change to the given directory
.IP 4. 3
list the file with the NLST command
.UNINDENT
.TP
Telnet links (\fBtelnet:\fP)
We try to connect and if user/password are given, login to the given
telnet server.
.TP
NNTP links (\fBnews:\fP, \fBsnews:\fP, \fBnntp:\fP)
We try to connect to the given NNTP server. If a news group or
article is specified, try to request it from the server.
.TP
Unsupported links (\fBjavascript:\fP, etc.)
An unsupported link will only print a warning. No further checking
will be made.
.sp
The complete list of recognized, but unsupported links can be found
in the
\fI\%linkcheck/checker/unknownurl.py\fP
source file. The most prominent of them should be JavaScript links.
.UNINDENT
.SH PLUGINS
.sp
There are two plugin types: connection and content plugins. Connection
plugins are run after a successful connection to the URL host. Content
plugins are run if the URL type has content (mailto: URLs have no
content for example) and if the check is not forbidden (e.g. by a
robots.txt file).
Use the option \fI\%\-\-list\-plugins\fP for a list of plugins and their
documentation. All plugins are enabled via the \fBlinkcheckerrc(5)\fP
configuration file.
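.sp
For example, a plugin such as \fBAnchorCheck\fP is enabled simply by
adding its section to the configuration file:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[AnchorCheck]
.ft P
.fi
.UNINDENT
.UNINDENT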
.SH RECURSION
.sp
Before descending recursively into a URL, it has to fulfill several
conditions. They are checked in this order:
.INDENT 0.0
.IP 1. 3
A URL must be valid.
.IP 2. 3
A URL must be parseable. This currently includes HTML files, Opera
bookmarks files, and directories. If a file type cannot be determined
(for example it does not have a common HTML file extension, and the
content does not look like HTML), it is assumed to be non\-parseable.
.IP 3. 3
The URL content must be retrievable. This is usually the case except
for example mailto: or unknown URL types.
.IP 4. 3
The maximum recursion level must not be exceeded. It is configured
with the \fI\%\-\-recursion\-level\fP option and is unlimited per default.
.IP 5. 3
It must not match the ignored URL list. This is controlled with the
\fI\%\-\-ignore\-url\fP option.
.IP 6. 3
The Robots Exclusion Protocol must allow links in the URL to be
followed recursively. This is checked by searching for a "nofollow"
directive in the HTML header data.
.UNINDENT
.sp
Note that the directory recursion reads all files in that directory, not
just a subset like \fBindex.htm\fP\&.
.SH NOTES
.sp
URLs on the commandline starting with \fBftp.\fP are treated like
\fBftp://ftp.\fP, URLs starting with \fBwww.\fP are treated like
\fBhttp://www.\fP\&. You can also give local files as arguments.
If you have your system configured to automatically establish a
connection to the internet (e.g. with diald), it will connect when
checking links not pointing to your local host. Use the \fI\%\-\-ignore\-url\fP
option to prevent this.
.sp
Javascript links are not supported.
.sp
If your platform does not support threading, LinkChecker disables it
automatically.
.sp
You can supply multiple user/password pairs in a configuration file.
.sp
When checking \fBnews:\fP links the given NNTP host doesn\(aqt need to be the
same as the host of the user browsing your pages.
.SH ENVIRONMENT
.INDENT 0.0
.TP
.B NNTP_SERVER
specifies default NNTP server
.UNINDENT
.INDENT 0.0
.TP
.B http_proxy
specifies default HTTP proxy server
.UNINDENT
.INDENT 0.0
.TP
.B ftp_proxy
specifies default FTP proxy server
.UNINDENT
.INDENT 0.0
.TP
.B no_proxy
comma\-separated list of domains to not contact over a proxy server
.UNINDENT
.INDENT 0.0
.TP
.B LC_MESSAGES, LANG, LANGUAGE
specify output language
.UNINDENT
.SH RETURN VALUE
.sp
The return value is 2 when
.INDENT 0.0
.IP \(bu 2
a program error occurred.
.UNINDENT
.sp
The return value is 1 when
.INDENT 0.0
.IP \(bu 2
invalid links were found or
.IP \(bu 2
link warnings were found and warnings are enabled
.UNINDENT
.sp
Else the return value is zero.
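.sp
A shell sketch that acts on the return value (the URL is a
placeholder):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ linkchecker https://example.com/ || echo "check failed with status $?"
.ft P
.fi
.UNINDENT
.UNINDENT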
.SH LIMITATIONS
.sp
LinkChecker consumes memory for each queued URL to check. With thousands
of queued URLs the amount of consumed memory can become quite large.
This might slow down the program or even the whole system.
.SH FILES
.sp
\fB~/.linkchecker/linkcheckerrc\fP \- default configuration file
.sp
\fB~/.linkchecker/blacklist\fP \- default blacklist logger output filename
.sp
\fBlinkchecker\-out.\fP\fITYPE\fP \- default logger file output name
.SH SEE ALSO
.sp
\fBlinkcheckerrc(5)\fP
.sp
\fI\%https://docs.python.org/library/codecs.html#standard\-encodings\fP \- valid
output encodings
.sp
\fI\%https://docs.python.org/howto/regex.html\fP \- regular expression
documentation
.SH AUTHOR
Bastian Kleineidam <bastian.kleineidam@web.de>
.SH COPYRIGHT
2000-2014 Bastian Kleineidam
.\" Generated by docutils manpage writer.
.

doc/man/en/linkcheckerrc.5 Normal file

@ -0,0 +1,667 @@
.\" Man page generated from reStructuredText.
.
.TH "LINKCHECKERRC" "5" "August 11, 2020" "" "LinkChecker"
.SH NAME
linkcheckerrc \- configuration file for LinkChecker
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH DESCRIPTION
.sp
\fBlinkcheckerrc\fP is the configuration file for LinkChecker. The file is
written in an INI\-style format.
The default file location is \fB~/.linkchecker/linkcheckerrc\fP on Unix,
\fB%HOMEPATH%\e.linkchecker\elinkcheckerrc\fP on Windows systems.
.SH SETTINGS
.SS checking
.INDENT 0.0
.TP
\fBcookiefile=\fP\fIfilename\fP
Read a file with initial cookie data. The cookie data format is
explained in \fBlinkchecker(1)\fP\&.
Command line option: \fB\-\-cookiefile\fP
.TP
\fBdebugmemory=\fP[\fB0\fP|\fB1\fP]
Write memory allocation statistics to a file on exit, requires \fI\%meliae\fP\&.
The default is not to write the file.
Command line option: none
.TP
\fBlocalwebroot=\fP\fISTRING\fP
When checking absolute URLs inside local files, the given root
directory is used as base URL.
Note that the given directory must be in URL syntax, i.e. it must
use a slash instead of a backslash to join path components, and it
must end with a slash.
Command line option: none
.TP
\fBnntpserver=\fP\fISTRING\fP
Specify an NNTP server for \fBnews:\fP links. Default is the
environment variable \fBNNTP_SERVER\fP\&. If no host is given, only the
syntax of the link is checked.
Command line option: \fB\-\-nntp\-server\fP
.TP
\fBrecursionlevel=\fP\fINUMBER\fP
Recursively check all links up to the given depth. A negative depth
enables infinite recursion. Default depth is infinite.
Command line option: \fB\-\-recursion\-level\fP
.TP
\fBthreads=\fP\fINUMBER\fP
Generate no more than the given number of threads. Default number of
threads is 10. To disable threading specify a non\-positive number.
Command line option: \fB\-\-threads\fP
.TP
\fBtimeout=\fP\fINUMBER\fP
Set the timeout for connection attempts in seconds. The default
timeout is 60 seconds.
Command line option: \fB\-\-timeout\fP
.TP
\fBaborttimeout=\fP\fINUMBER\fP
Time to wait for checks to finish after the user aborts the first
time (with Ctrl\-C or the abort button). The default abort timeout is
300 seconds.
Command line option: none
.TP
\fBuseragent=\fP\fISTRING\fP
Specify the User\-Agent string to send to the HTTP server, for
example "Mozilla/4.0". The default is "LinkChecker/X.Y" where X.Y is
the current version of LinkChecker.
Command line option: \fB\-\-user\-agent\fP
.TP
\fBsslverify=\fP[\fB0\fP|\fB1\fP|\fIfilename\fP]
If set to zero, disables SSL certificate checking. If set to one
(the default), enables SSL certificate checking. If a filename is
specified, it will be used as the CA certificate file.
Command line option: none
.TP
\fBmaxrunseconds=\fP\fINUMBER\fP
Stop checking new URLs after the given number of seconds. Same as if
the user stops (by hitting Ctrl\-C) after the given number of
seconds.
The default is not to stop until all URLs are checked.
Command line option: none
.TP
\fBmaxfilesizedownload=\fP\fINUMBER\fP
Files larger than NUMBER bytes will be ignored; if accessed over
HTTP and an accurate Content\-Length header is returned, nothing is
downloaded. No more than this amount of a file will be downloaded.
The default is 5242880 (5 MB).
Command line option: none
.TP
\fBmaxfilesizeparse=\fP\fINUMBER\fP
Files larger than NUMBER bytes will not be parsed for links.
The default is 1048576 (1 MB).
Command line option: none
.TP
\fBmaxnumurls=\fP\fINUMBER\fP
Maximum number of URLs to check. New URLs will not be queued after
the given number of URLs is checked.
The default is to queue and check all URLs.
Command line option: none
.TP
\fBmaxrequestspersecond=\fP\fINUMBER\fP
Limit the maximum number of requests per second to one host.
The default is 10.
Command line option: none
.TP
\fBrobotstxt=\fP[\fB0\fP|\fB1\fP]
When using HTTP, fetch robots.txt and confirm whether each URL may
be accessed before checking it.
The default is to use robots.txt files.
Command line option: \fB\-\-no\-robots\fP
.TP
\fBallowedschemes=\fP\fINAME\fP[\fB,\fP\fINAME\fP\&...]
Allowed URL schemes as comma\-separated list.
Command line option: none
.UNINDENT
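.sp
A hypothetical \fBchecking\fP section combining some of the settings
above:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[checking]
recursionlevel=2
timeout=30
maxrequestspersecond=5
.ft P
.fi
.UNINDENT
.UNINDENT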
.SS filtering
.INDENT 0.0
.TP
\fBignore=\fP\fIREGEX\fP (MULTILINE)
Only check syntax of URLs matching the given regular expressions.
Command line option: \fB\-\-ignore\-url\fP
.TP
\fBignorewarnings=\fP\fINAME\fP[\fB,\fP\fINAME\fP\&...]
Ignore the comma\-separated list of warnings. See \fI\%WARNINGS\fP for
the list of supported warnings.
Command line option: none
.TP
\fBinternlinks=\fP\fIREGEX\fP
Regular expression to add more URLs recognized as internal links.
Default is that URLs given on the command line are internal.
Command line option: none
.TP
\fBnofollow=\fP\fIREGEX\fP (MULTILINE)
Check but do not recurse into URLs matching the given regular
expressions.
Command line option: \fB\-\-no\-follow\-url\fP
.TP
\fBcheckextern=\fP[\fB0\fP|\fB1\fP]
Check external links. Default is to check internal links only.
Command line option: \fB\-\-check\-extern\fP
.UNINDENT
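.sp
A hypothetical \fBfiltering\fP section; the expressions are
placeholders:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[filtering]
ignore=
  ^mailto:
nofollow=
  ^https://example\e.com/archive/
checkextern=1
.ft P
.fi
.UNINDENT
.UNINDENT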
.SS authentication
.INDENT 0.0
.TP
\fBentry=\fP\fIREGEX\fP \fIUSER\fP [\fIPASS\fP] (MULTILINE)
Provide individual username/password pairs for different links. In
addition to a single login page specified with \fBloginurl\fP multiple
FTP, HTTP (Basic Authentication) and telnet links are supported.
Entries are a triple (URL regex, username, password) or a tuple (URL
regex, username), where the entries are separated by whitespace.
The password is optional and if missing it has to be entered at the
commandline.
If the regular expression matches the checked URL, the given
username/password pair is used for authentication. The command line
options \fB\-u\fP and \fB\-p\fP match every link and therefore override
the entries given here. The first match wins.
Command line option: \fB\-u\fP, \fB\-p\fP
.TP
\fBloginurl=\fP\fIURL\fP
The URL of a login page to be visited before link checking. The page
is expected to contain an HTML form to collect credentials and
submit them to the address in its action attribute using an HTTP
POST request. The name attributes of the input elements of the form
and the values to be submitted need to be available (see \fBentry\fP
for an explanation of username and password values).
.TP
\fBloginuserfield=\fP\fISTRING\fP
The name attribute of the username input element. Default: \fBlogin\fP\&.
.TP
\fBloginpasswordfield=\fP\fISTRING\fP
The name attribute of the password input element. Default: \fBpassword\fP\&.
.TP
\fBloginextrafields=\fP\fINAME\fP\fB:\fP\fIVALUE\fP (MULTILINE)
Optionally the name attributes of any additional input elements and
the values to populate them with. Note that these are submitted
without checking whether matching input elements exist in the HTML
form.
.UNINDENT
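.sp
A hypothetical \fBauthentication\fP section; all names and values are
placeholders:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[authentication]
entry=
  ^https://example\e.com/ admin mypass
loginurl=https://example.com/login
loginuserfield=username
loginpasswordfield=password
.ft P
.fi
.UNINDENT
.UNINDENT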
.SS output
.INDENT 0.0
.TP
\fBdebug=\fP\fISTRING\fP[\fB,\fP\fISTRING\fP\&...]
Print debugging output for the given modules. Available debug
modules are \fBcmdline\fP, \fBchecking\fP, \fBcache\fP, \fBdns\fP,
\fBthread\fP, \fBplugins\fP and \fBall\fP\&. Specifying \fBall\fP is an alias
for specifying all available loggers.
Command line option: \fB\-\-debug\fP
.TP
\fBfileoutput=\fP\fITYPE\fP[\fB,\fP\fITYPE\fP\&...]
Output to a file \fBlinkchecker\-out.\fP\fITYPE\fP, or
\fB$HOME/.linkchecker/blacklist\fP for \fBblacklist\fP output.
Valid file output types are \fBtext\fP, \fBhtml\fP, \fBsql\fP, \fBcsv\fP,
\fBgml\fP, \fBdot\fP, \fBxml\fP, \fBnone\fP or \fBblacklist\fP\&. Default is no
file output. The various output types are documented below. Note
that you can suppress all console output with \fBoutput=none\fP\&.
Command line option: \fB\-\-file\-output\fP
.TP
\fBlog=\fP\fITYPE\fP[\fB/\fP\fIENCODING\fP]
Specify output type as \fBtext\fP, \fBhtml\fP, \fBsql\fP, \fBcsv\fP,
\fBgml\fP, \fBdot\fP, \fBxml\fP, \fBnone\fP or \fBblacklist\fP\&. Default type
is \fBtext\fP\&. The various output types are documented below.
The \fIENCODING\fP specifies the output encoding, the default is that of
your locale. Valid encodings are listed at
\fI\%https://docs.python.org/library/codecs.html#standard\-encodings\fP\&.
Command line option: \fB\-\-output\fP
.TP
\fBquiet=\fP[\fB0\fP|\fB1\fP]
If set, operate quietly. An alias for \fBlog=none\fP\&. This is only
useful with \fBfileoutput\fP\&.
Command line option: \fB\-\-verbose\fP
.TP
\fBstatus=\fP[\fB0\fP|\fB1\fP]
Control printing check status messages. Default is 1.
Command line option: \fB\-\-no\-status\fP
.TP
\fBverbose=\fP[\fB0\fP|\fB1\fP]
If set log all checked URLs once. Default is to log only errors and
warnings.
Command line option: \fB\-\-verbose\fP
.TP
\fBwarnings=\fP[\fB0\fP|\fB1\fP]
If set log warnings. Default is to log warnings.
Command line option: \fB\-\-no\-warnings\fP
.UNINDENT
.SS text
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
Specify output filename for text logging. Default filename is
\fBlinkchecker\-out.txt\fP\&.
Command line option: \fB\-\-file\-output\fP
.TP
\fBparts=\fP\fISTRING\fP
Comma\-separated list of parts that have to be logged. See \fI\%LOGGER PARTS\fP
below.
Command line option: none
.TP
\fBencoding=\fP\fISTRING\fP
Valid encodings are listed in
\fI\%https://docs.python.org/library/codecs.html#standard\-encodings\fP\&.
Default encoding is \fBiso\-8859\-15\fP\&.
.TP
.B \fIcolor*\fP
Color settings for the various log parts, syntax is \fIcolor\fP or
\fItype\fP\fB;\fP\fIcolor\fP\&. The \fItype\fP can be \fBbold\fP, \fBlight\fP,
\fBblink\fP, \fBinvert\fP\&. The \fIcolor\fP can be \fBdefault\fP, \fBblack\fP,
\fBred\fP, \fBgreen\fP, \fByellow\fP, \fBblue\fP, \fBpurple\fP, \fBcyan\fP,
\fBwhite\fP, \fBBlack\fP, \fBRed\fP, \fBGreen\fP, \fBYellow\fP, \fBBlue\fP,
\fBPurple\fP, \fBCyan\fP or \fBWhite\fP\&.
Command line option: none
.TP
\fBcolorparent=\fP\fISTRING\fP
Set parent color. Default is \fBwhite\fP\&.
.TP
\fBcolorurl=\fP\fISTRING\fP
Set URL color. Default is \fBdefault\fP\&.
.TP
\fBcolorname=\fP\fISTRING\fP
Set name color. Default is \fBdefault\fP\&.
.TP
\fBcolorreal=\fP\fISTRING\fP
Set real URL color. Default is \fBcyan\fP\&.
.TP
\fBcolorbase=\fP\fISTRING\fP
Set base URL color. Default is \fBpurple\fP\&.
.TP
\fBcolorvalid=\fP\fISTRING\fP
Set valid color. Default is \fBbold;green\fP\&.
.TP
\fBcolorinvalid=\fP\fISTRING\fP
Set invalid color. Default is \fBbold;red\fP\&.
.TP
\fBcolorinfo=\fP\fISTRING\fP
Set info color. Default is \fBdefault\fP\&.
.TP
\fBcolorwarning=\fP\fISTRING\fP
Set warning color. Default is \fBbold;yellow\fP\&.
.TP
\fBcolordltime=\fP\fISTRING\fP
Set download time color. Default is \fBdefault\fP\&.
.TP
\fBcolorreset=\fP\fISTRING\fP
Set reset color. Default is \fBdefault\fP\&.
.UNINDENT
.SS gml
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBparts=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBencoding=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.UNINDENT
.SS dot
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBparts=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBencoding=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.UNINDENT
.SS csv
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBparts=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBencoding=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBseparator=\fP\fICHAR\fP
Set CSV separator. Default is a comma (\fB,\fP).
.TP
\fBquotechar=\fP\fICHAR\fP
Set CSV quote character. Default is a double quote (\fB"\fP).
.UNINDENT
.SS sql
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBparts=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBencoding=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBdbname=\fP\fISTRING\fP
Set database name to store into. Default is \fBlinksdb\fP\&.
.TP
\fBseparator=\fP\fICHAR\fP
Set SQL command separator character. Default is a semicolon (\fB;\fP).
.UNINDENT
.SS html
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBparts=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBencoding=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBcolorbackground=\fP\fICOLOR\fP
Set HTML background color. Default is \fB#fff7e5\fP\&.
.TP
\fBcolorurl=\fP
Set HTML URL color. Default is \fB#dcd5cf\fP\&.
.TP
\fBcolorborder=\fP
Set HTML border color. Default is \fB#000000\fP\&.
.TP
\fBcolorlink=\fP
Set HTML link color. Default is \fB#191c83\fP\&.
.TP
\fBcolorwarning=\fP
Set HTML warning color. Default is \fB#e0954e\fP\&.
.TP
\fBcolorerror=\fP
Set HTML error color. Default is \fB#db4930\fP\&.
.TP
\fBcolorok=\fP
Set HTML valid color. Default is \fB#3ba557\fP\&.
.UNINDENT
.SS blacklist
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBencoding=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.UNINDENT
.SS xml
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBparts=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBencoding=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.UNINDENT
.SS gxml
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBparts=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBencoding=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.UNINDENT
.SS sitemap
.INDENT 0.0
.TP
\fBfilename=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBparts=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBencoding=\fP\fISTRING\fP
See \fI\%[text]\fP section above.
.TP
\fBpriority=\fP\fIFLOAT\fP
A number between 0.0 and 1.0 determining the priority. The default
priority for the first URL is 1.0, for all child URLs 0.5.
.TP
\fBfrequency=\fP[\fBalways\fP|\fBhourly\fP|\fBdaily\fP|\fBweekly\fP|\fBmonthly\fP|\fByearly\fP|\fBnever\fP]
How frequently pages are expected to change.
.UNINDENT
.SH LOGGER PARTS
.INDENT 0.0
.TP
\fBall\fP
for all parts
.TP
\fBid\fP
a unique ID for each logentry
.TP
\fBrealurl\fP
the full url link
.TP
\fBresult\fP
valid or invalid, with messages
.TP
\fBextern\fP
1 or 0, only in some logger types reported
.TP
\fBbase\fP
base href=...
.TP
\fBname\fP
<a href=...>name</a> and <img alt="name">
.TP
\fBparenturl\fP
if any
.TP
\fBinfo\fP
some additional info, e.g. FTP welcome messages
.TP
\fBwarning\fP
warnings
.TP
\fBdltime\fP
download time
.TP
\fBchecktime\fP
check time
.TP
\fBurl\fP
the original url name, can be relative
.TP
\fBintro\fP
the blurb at the beginning, "starting at ..."
.TP
\fBoutro\fP
the blurb at the end, "found x errors ..."
.UNINDENT
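.sp
For example, to restrict text output to a few of the parts listed
above:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[text]
parts=realurl,result,warning
.ft P
.fi
.UNINDENT
.UNINDENT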
.SH MULTILINE
.sp
Some option values can span multiple lines. Each line has to be indented
for that to work. Lines starting with a hash (\fB#\fP) will be ignored,
though they must still be indented.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
ignore=
lconline
bookmark
# a comment
^mailto:
.ft P
.fi
.UNINDENT
.UNINDENT
.SH EXAMPLE
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[output]
log=html
[checking]
threads=5
[filtering]
ignorewarnings=http\-moved\-permanent
.ft P
.fi
.UNINDENT
.UNINDENT
.SH PLUGINS
.sp
All plugins have a separate section. If the section appears in the
configuration file the plugin is enabled. Some plugins read extra
options in their section.
.SS AnchorCheck
.sp
Checks validity of HTML anchors.
.SS LocationInfo
.sp
Adds the country and, if possible, the city name of the URL host as
info. Needs GeoIP or pygeoip and a local country or city lookup DB
installed.
.SS RegexCheck
.sp
Define a regular expression that prints a warning if it matches any
content of the checked link. This applies only to valid pages whose
content can be retrieved.
.INDENT 0.0
.TP
\fBwarningregex=\fP\fIREGEX\fP
Use this to check for pages that contain some form of error message,
for example "This page has moved" or "Oracle Application error".
\fIREGEX\fP should be unquoted.
.sp
Note that multiple values can be combined in the regular expression,
for example "(This page has moved|Oracle Application error)".
.UNINDENT
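.sp
For example, enabling the plugin with the combined expression mentioned
above:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[RegexCheck]
warningregex=(This page has moved|Oracle Application error)
.ft P
.fi
.UNINDENT
.UNINDENT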
.SS SslCertificateCheck
.sp
Check SSL certificate expiration date. Only internal https: links will
be checked. A domain will only be checked once to avoid duplicate
warnings.
.INDENT 0.0
.TP
\fBsslcertwarndays=\fP\fINUMBER\fP
Configures the expiration warning time in days.
.UNINDENT
.SS HtmlSyntaxCheck
.sp
Check the syntax of HTML pages with the online W3C HTML validator. See
\fI\%https://validator.w3.org/docs/api.html\fP\&.
.SS HttpHeaderInfo
.sp
Print HTTP headers in URL info.
.INDENT 0.0
.TP
\fBprefixes=\fP\fIprefix1\fP[\fB,\fP\fIprefix2\fP]...
Comma\-separated list of header prefixes, for example \fBX\-\fP to
display all HTTP headers that start with "X\-".
.UNINDENT
.SS CssSyntaxCheck
.sp
Check the syntax of HTML pages with the online W3C CSS validator. See
\fI\%https://jigsaw.w3.org/css\-validator/manual.html#expert\fP\&.
.SS VirusCheck
.sp
Checks the page content for virus infections with clamav. A local clamav
daemon must be installed.
.INDENT 0.0
.TP
\fBclamavconf=\fP\fIfilename\fP
Filename of \fBclamd.conf\fP config file.
.UNINDENT
.SS PdfParser
.sp
Parse PDF files for URLs to check. Needs the \fI\%pdfminer\fP Python package
installed.
.SS WordParser
.sp
Parse Word files for URLs to check. Needs the \fI\%pywin32\fP Python
extension installed.
.SH WARNINGS
.sp
The following warnings are recognized in the \(aqignorewarnings\(aq config
file entry:
.INDENT 0.0
.TP
\fBfile\-missing\-slash\fP
The file: URL is missing a trailing slash.
.TP
\fBfile\-system\-path\fP
The file: path is not the same as the system specific path.
.TP
\fBftp\-missing\-slash\fP
The ftp: URL is missing a trailing slash.
.TP
\fBhttp\-cookie\-store\-error\fP
An error occurred while storing a cookie.
.TP
\fBhttp\-empty\-content\fP
The URL had no content.
.TP
\fBmail\-no\-mx\-host\fP
The mail MX host could not be found.
.TP
\fBnntp\-no\-newsgroup\fP
The NNTP newsgroup could not be found.
.TP
\fBnntp\-no\-server\fP
No NNTP server was found.
.TP
\fBurl\-content\-size\-zero\fP
The URL content size is zero.
.TP
\fBurl\-content\-too\-large\fP
The URL content size is too large.
.TP
\fBurl\-effective\-url\fP
The effective URL is different from the original.
.TP
\fBurl\-error\-getting\-content\fP
Could not get the content of the URL.
.TP
\fBurl\-obfuscated\-ip\fP
The IP is obfuscated.
.TP
\fBurl\-whitespace\fP
The URL contains leading or trailing whitespace.
.UNINDENT
.SH SEE ALSO
.sp
\fBlinkchecker(1)\fP
.SH AUTHOR
Bastian Kleineidam <bastian.kleineidam@web.de>
.SH COPYRIGHT
2000-2014 Bastian Kleineidam
.\" Generated by docutils manpage writer.
.


@ -1,4 +0,0 @@
[po4a_langs] de
[po4a_paths] linkchecker.doc.pot $lang:$lang.po
[type: man] en/linkchecker.1 $lang:$lang/linkchecker.1
[type: man] en/linkcheckerrc.5 $lang:$lang/linkcheckerrc.5

doc/src/Makefile Normal file

@ -0,0 +1,31 @@
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SPHINXINTL ?= sphinx-intl
SOURCEDIR = .
BUILDDIR = _build
LANGUAGE = en
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

gettext:
	@$(SPHINXBUILD) -b gettext "$(SOURCEDIR)" -d "$(BUILDDIR)/i18n/doctrees" ../i18n/gettext $(SPHINXOPTS) $(O)

html:
	@$(SPHINXBUILD) -b html "$(SOURCEDIR)" -d "$(BUILDDIR)/doctrees" ../html $(SPHINXOPTS) $(O)

man:
	@$(SPHINXBUILD) -b man "$(SOURCEDIR)" -d "$(BUILDDIR)/doctrees" ../man/$(LANGUAGE) $(SPHINXOPTS) $(O)

locale: gettext
	@$(SPHINXINTL) update -p ../i18n/gettext -l de

.PHONY: help gettext html locale man Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)


@ -0,0 +1,9 @@
{% extends "!layout.html" %}
{% block menu %}
{{ super() }}
<a href="{{ pathto('genindex') }}">Index</a>
<hr>
<a href="https://github.com/linkchecker/linkchecker/blob/master/doc/changelog.txt">Change Log</a>
<a href="https://github.com/linkchecker/linkchecker/issues/">Issue Tracker</a>
{% endblock %}

doc/src/code/index.rst Normal file

@ -0,0 +1,99 @@
:github_url: https://github.com/linkchecker/linkchecker/blob/master/doc/src/code/index.rst

Code
====

LinkChecker comprises the linkchecker executable and the linkcheck package.

.. autosummary::
   :recursive:
   :toctree: linkcheck

   linkcheck

.. rubric:: Running

linkchecker provides the command-line arguments and reads a list of URLs from
standard input, reads configuration files, drops privileges if run as root,
initialises the chosen logger and collects an optional password.

Uses :meth:`linkcheck.director.get_aggregate` to obtain an *aggregate* object
:class:`linkcheck.director.aggregator.Aggregate`
that includes :class:`linkcheck.cache.urlqueue.UrlQueue`,
:class:`linkcheck.plugins.PluginManager` and
:class:`linkcheck.cache.results.ResultCache` objects.

Adds URLs in the form of *url_data* objects to the aggregate's *urlqueue* with
:meth:`linkcheck.cmdline.aggregate_url` which uses
:meth:`linkcheck.checker.get_url_from` to return a *url_data* object that is an
instance of one of the :mod:`linkcheck.checker` classes derived from
:class:`linkcheck.checker.urlbase.UrlBase`, according to the URL scheme.
.. graphviz::
:alt: linkcheck.checker classes
digraph "linkcheck.checker classes" {
charset="utf-8"
rankdir=BT
"1" [label="DnsUrl", shape="record", href="../code/linkcheck/linkcheck.checker.dnsurl.html", target="_blank"];
"2" [label="FileUrl", shape="record", href="../code/linkcheck/linkcheck.checker.fileurl.html", target="_blank"];
"3" [label="FtpUrl", shape="record", href="../code/linkcheck/linkcheck.checker.ftpurl.html", target="_blank"];
"4" [label="HttpUrl", shape="record", href="../code/linkcheck/linkcheck.checker.httpurl.html", target="_blank"];
"5" [label="IgnoreUrl", shape="record", href="../code/linkcheck/linkcheck.checker.ignoreurl.html", target="_blank"];
"6" [label="InternPatternUrl", shape="record", href="../code/linkcheck/linkcheck.checker.internpaturl.html", target="_blank"];
"7" [label="ItmsServicesUrl", shape="record", href="../code/linkcheck/linkcheck.checker.itmsservicesurl.html", target="_blank"];
"8" [label="MailtoUrl", shape="record", href="../code/linkcheck/linkcheck.checker.mailtourl.html", target="_blank"];
"9" [label="NntpUrl", shape="record", href="../code/linkcheck/linkcheck.checker.nntpurl.html", target="_blank"];
"10" [label="ProxySupport", shape="record", href="../code/linkcheck/linkcheck.checker.proxysupport.html", target="_blank"];
"11" [label="TelnetUrl", shape="record", href="../code/linkcheck/linkcheck.checker.telneturl.html", target="_blank"];
"12" [label="UnknownUrl", shape="record", href="../code/linkcheck/linkcheck.checker.unknownurl.html", target="_blank"];
"13" [label="UrlBase", shape="record", href="../code/linkcheck/linkcheck.checker.urlbase.html", target="_blank"];
"1" -> "13" [arrowhead="empty", arrowtail="none"];
"2" -> "13" [arrowhead="empty", arrowtail="none"];
"3" -> "6" [arrowhead="empty", arrowtail="none"];
"3" -> "10" [arrowhead="empty", arrowtail="none"];
"4" -> "6" [arrowhead="empty", arrowtail="none"];
"4" -> "10" [arrowhead="empty", arrowtail="none"];
"5" -> "12" [arrowhead="empty", arrowtail="none"];
"6" -> "13" [arrowhead="empty", arrowtail="none"];
"7" -> "13" [arrowhead="empty", arrowtail="none"];
"8" -> "13" [arrowhead="empty", arrowtail="none"];
"9" -> "13" [arrowhead="empty", arrowtail="none"];
"11" -> "13" [arrowhead="empty", arrowtail="none"];
"12" -> "13" [arrowhead="empty", arrowtail="none"];
}
Optionally initialises profiling.
Starts the checking with :meth:`linkcheck.director.check_urls`, passing the *aggregate*.
Finally it counts any errors and exits with the appropriate code.
.. rubric:: Checking & Parsing
This involves two activities:
- Checking a link is valid
- Parsing the document the link points to for new links
:meth:`linkcheck.director.check_urls` authenticates with a login form if one is configured
via :meth:`linkcheck.director.aggregator.Aggregate.visit_loginurl`, starts logging
with :meth:`linkcheck.director.aggregator.Aggregate.logger.start_log_output`
and calls :meth:`linkcheck.director.aggregator.Aggregate.start_threads` which instantiates a
:class:`linkcheck.director.checker.Checker` object with the urlqueue if there is at
least one thread configured, else it calls
:meth:`linkcheck.director.checker.check_urls` which loops through the entries in the *urlqueue*.
Either way :meth:`linkcheck.director.checker.check_url` tests to see if *url_data* already has a result and
whether the cache already has a result for that key.
If not it calls *url_data.check()*,
which calls *url_data.check_content()* that runs content plugins and returns *do_parse*
according to *url_data.do_check_content* and :meth:`linkcheck.checker.urlbase.UrlBase.allows_recursion` which
includes :meth:`linkcheck.checker.urlbase.UrlBase.allows_simple_recursion` that is monitoring the recursion level
(with :attr:`linkcheck.checker.urlbase.UrlBase.recursion_level`).
If *do_parse* is True, passes the *url_data* object to :meth:`linkcheck.parser.parse_url` to call a
`linkcheck.parser.parse_` method according to the document type
e.g. :meth:`linkcheck.parser.parse_html` for HTML which calls :meth:`linkcheck.htmlutil.linkparse.find_links`
passing *url_data.get_soup()* and *url_data.add_url*.
`url_data.add_url` puts the new *url_data* object on the *urlqueue*.
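The interplay between the *urlqueue* and the result cache described above can be sketched with a toy model; the class and function names below are illustrative stand-ins, not the real :mod:`linkcheck` API:

```python
from collections import deque

class ResultCache:
    """Toy stand-in for a shared result cache keyed by URL."""
    def __init__(self):
        self._results = {}

    def get_result(self, key):
        return self._results.get(key)

    def add_result(self, key, result):
        self._results[key] = result

def check_urls(urlqueue, cache, check):
    """Check queued URLs, skipping keys that already have a cached result."""
    checked = []
    while urlqueue:
        key = urlqueue.popleft()
        if cache.get_result(key) is None:
            cache.add_result(key, check(key))
            checked.append(key)
    return checked

queue = deque([
    "http://example.com/",
    "http://example.com/a",
    "http://example.com/",  # duplicate entry: filtered out by the cache
])
cache = ResultCache()
done = check_urls(queue, cache, check=lambda key: "valid")
print(done)  # ['http://example.com/', 'http://example.com/a']
```

In the real code the cache key is computed per *url_data* object and checking happens on worker threads, but the dedup decision follows the same shape.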

View file

@ -0,0 +1 @@
.. include:: ../../CODE_OF_CONDUCT.rst

85
doc/src/conf.py Normal file
View file

@ -0,0 +1,85 @@
import os
import sys
sys.path.insert(0, os.path.abspath('../..'))
# -- Project information -----------------------------------------------------
project = 'LinkChecker'
copyright = '2000-2014 Bastian Kleineidam'
# version = '10'
# -- General configuration ---------------------------------------------------
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.autosectionlabel',
'sphinx.ext.autosummary',
'sphinx.ext.extlinks',
'sphinx.ext.graphviz',
'sphinx.ext.viewcode',
'sphinx_epytext',
'sphinx_rtd_theme',
]
locale_dirs = ['../i18n/locales']
templates_path = ['_templates']
today_fmt = '%B %d, %Y'
# -- Options for HTML output -------------------------------------------------
html_favicon = 'images/favicon.ico'
html_logo = 'images/logo128x128.png'
html_theme = 'sphinx_rtd_theme'
html_theme_options = {
'collapse_navigation': False
}
# only use :manpage: within man pages
manpages_url = '{page}.html'
# -- Options for man output -------------------------------------------------
man_pages = [
(
'man/linkchecker', 'linkchecker',
'Kommandozeilenprogramm zum Prüfen von HTML Dokumenten und '
'Webseiten auf ungültige Verknüpfungen'
if tags.has('de') else
'command line client to check HTML documents and websites for broken links',
['Bastian Kleineidam <bastian.kleineidam@web.de>'], 1),
(
'man/linkcheckerrc', 'linkcheckerrc',
'Konfigurationsdatei für LinkChecker'
if tags.has('de') else
'configuration file for LinkChecker',
['Bastian Kleineidam <bastian.kleineidam@web.de>'], 5),
]
# -- Extension configuration -------------------------------------------------
autoclass_content = 'both'
autodoc_default_options = {
'members': True,
'undoc-members': True,
'show-inheritance': True,
}
autodoc_member_order = 'groupwise'
autosectionlabel_prefix_document = True
extlinks = {'pypi': ('https://pypi.org/project/%s/', '')}
graphviz_output_format = 'svg'
# -- Mock --------------------------------------------------------------------
import linkcheck.logger
linkcheck.logger.blacklist.BlacklistLogger.LoggerArgs = {
'filename': '~/.linkchecker/blacklist'}

1
doc/src/contributing.rst Normal file
View file

@ -0,0 +1 @@
.. include:: ../../CONTRIBUTING.rst

View file

@ -1,41 +1,48 @@
title: "Frequently asked questions"
---
**Q: LinkChecker produced an error, but my web page is ok with
:github_url: https://github.com/linkchecker/linkchecker/blob/master/doc/src/faq.rst
Frequently Asked Questions
==========================
**Q: LinkChecker produced an error, but my web page is okay with
Mozilla/IE/Opera/... Is this a bug in LinkChecker?**
A: Please check your web pages first. Are they really ok?
A: Please check your web pages first. Are they really okay?
Often the major browsers are very forgiving and good at handling HTML
or HTTP errors, while LinkChecker complains in most cases about invalid
content.
Enable the HtmlSyntaxCheck plugin, or check if you are using a proxy
which produces the error.
Enable the :ref:`man/linkcheckerrc:HtmlSyntaxCheck` plugin,
or check if you are using a proxy which produces the error.
**Q: I still get an error, but the page is definitely ok.**
**Q: I still get an error, but the page is definitely okay.**
A: Some servers deny access to automated tools (also called robots)
like LinkChecker. This is not a bug in LinkChecker but rather a
policy by the webmaster running the website you are checking. Look in
the ``/robots.txt`` file which follows the
[robots.txt exclusion standard](http://www.robotstxt.org/robotstxt.html).
`robots.txt exclusion standard <http://www.robotstxt.org/robotstxt.html>`_.
For identification LinkChecker adds to each request a User-Agent header
like this:
like this::
Mozilla/5.0 (compatible; LinkChecker/9.4; +https://linkchecker.github.io/linkchecker/)
If you yourself are the webmaster, consider allowing LinkChecker to
check your web pages by adding the following to your robots.txt file:
check your web pages by adding the following to your robots.txt file::
User-Agent: LinkChecker
Allow: /
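The effect of such a rule can be checked with Python's standard :mod:`urllib.robotparser`; the robots.txt text below is an example, not taken from a real site:

```python
# Illustrate how a robots.txt Allow rule for LinkChecker is interpreted,
# using only the standard library.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-Agent: LinkChecker
Allow: /

User-Agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# LinkChecker's user agent matches the specific record and is allowed:
print(rp.can_fetch("LinkChecker/10.0", "/private/page.html"))  # True
# Other robots fall back to the wildcard record and are blocked:
print(rp.can_fetch("SomeOtherBot/1.0", "/private/page.html"))  # False
```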
**Q: How can I tell LinkChecker which proxy to use?**
A: LinkChecker works automatically with proxies. In a Unix or Windows
environment, set the http_proxy, https_proxy, ftp_proxy environment
variables to a URL that identifies the proxy server before starting
LinkChecker. For example
LinkChecker. For example:
.. code-block:: console
$ http_proxy="http://www.example.com:3128"
$ export http_proxy
@ -52,7 +59,7 @@ Unfortunately browsers like IE and Netscape do not enforce this.
**Q: Has LinkChecker JavaScript support?**
A: No, it never will. If your page is only working with JS, it is
better to use a browser testing tool like [Selenium](http://seleniumhq.org/).
better to use a browser testing tool like `Selenium <http://seleniumhq.org/>`_.
**Q: Is LinkChecker's cookie feature insecure?**
@ -64,7 +71,7 @@ hosts.
Also, the following restrictions apply for cookies that LinkChecker
receives from the hosts it checks:
- Cookies will only be sent back to the originating server (ie. no
- Cookies will only be sent back to the originating server (i.e. no
third party cookies are allowed).
- Cookies are only stored in memory. After LinkChecker finishes, they
are lost.
@ -75,14 +82,14 @@ receives from the hosts it check:
checks. What is that about?**
A: LinkChecker follows the
[robots.txt exclusion standard](http://www.robotstxt.org/robotstxt.html).
`robots.txt exclusion standard <http://www.robotstxt.org/robotstxt.html>`_.
To avoid misuse of LinkChecker, you cannot turn this feature off.
See the [Web Robot pages](http://www.robotstxt.org/robotstxt.html) and the
[Spidering report](http://www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/Spidering.txt)
See the `Web Robot pages <http://www.robotstxt.org/robotstxt.html>`_ and the
`Spidering report <http://www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/Spidering.txt>`_
for more info.
If you yourself are the webmaster, consider allowing LinkChecker to
check your web pages by adding the following to your robots.txt file:
check your web pages by adding the following to your robots.txt file::
User-Agent: LinkChecker
Allow: /
@ -97,27 +104,28 @@ repository and access to your web server configuration.
**Q: How do I check HTML/XML/CSS syntax with LinkChecker?**
A: Enable the HtmlSyntaxCheck and CssSyntaxCheck plugins.
A: Enable the :ref:`man/linkcheckerrc:HtmlSyntaxCheck` and
:ref:`man/linkcheckerrc:CssSyntaxCheck` plugins.
**Q: I want to have my own logging class. How can I use it in LinkChecker?**
A: A Python API lets you define new logging classes.
Define your own logging class as a subclass of _Logger or any other
logging class in the log module.
Then call the add_logger function in Config.Configuration to register
Define your own logging class as a subclass of *_Logger* or any other
logging class in the *log* module.
Then call the *add_logger* function in *Config.Configuration* to register
your new Logger.
After this append a new Logging instance to the fileoutput.
```python
import linkcheck
class MyLogger(linkcheck.logger._Logger):
LoggerName = 'mylog'
LoggerArgs = {'fileoutput': log_format, 'filename': 'foo.txt'}
.. code-block:: python
# ...
import linkcheck
class MyLogger(linkcheck.logger._Logger):
LoggerName = 'mylog'
LoggerArgs = {'fileoutput': log_format, 'filename': 'foo.txt'}
cfg = linkcheck.configuration.Configuration()
cfg.logger_add(MyLogger)
cfg['fileoutput'].append(cfg.logger_new(MyLogger.LoggerName))
```
# ...
cfg = linkcheck.configuration.Configuration()
cfg.logger_add(MyLogger)
cfg['fileoutput'].append(cfg.logger_new(MyLogger.LoggerName))

View file

Before

Width:  |  Height:  |  Size: 3.6 KiB

After

Width:  |  Height:  |  Size: 3.6 KiB

View file

Before

Width:  |  Height:  |  Size: 26 KiB

After

Width:  |  Height:  |  Size: 26 KiB

View file

Before

Width:  |  Height:  |  Size: 12 KiB

After

Width:  |  Height:  |  Size: 12 KiB

View file

Before

Width:  |  Height:  |  Size: 41 KiB

After

Width:  |  Height:  |  Size: 41 KiB

94
doc/src/index.rst Normal file
View file

@ -0,0 +1,94 @@
:github_url: https://github.com/linkchecker/linkchecker/blob/master/doc/src/index.rst
.. title:: LinkChecker
Check websites for broken links
===============================
Introduction
-------------
LinkChecker is a free, `GPL <http://www.gnu.org/licenses/gpl-2.0.html>`_
licensed website validator.
LinkChecker checks links in web documents or full websites.
It runs on Python 3 systems, requiring Python 3.5 or later.
Visit the project on `GitHub <https://github.com/linkchecker/linkchecker>`_.
Installation
------------
.. code-block:: console
$ pip3 install git+https://github.com/linkchecker/linkchecker.git
See the :doc:`installation document <install>` for more information.
Basic usage
------------
To check a URL like *http://www.example.org/myhomepage/* it is enough to
execute:
.. code-block:: console
$ linkchecker http://www.example.org/myhomepage/
This check will validate recursively all pages starting with
*http://www.example.org/myhomepage/*. Additionally, all external links
pointing outside of *www.example.org* will be checked but not recursed
into.
Features
---------
- recursive and multithreaded checking and site crawling
- output in colored or normal text, HTML, SQL, CSV, XML or a sitemap
graph in different formats
- HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local file
links support
- restriction of link checking with regular expression filters for URLs
- proxy support
- username/password authorization for HTTP, FTP and Telnet
- honors robots.txt exclusion protocol
- Cookie support
- HTML5 support
- :ref:`Plugin support <man/linkchecker:PLUGINS>` allowing custom page checks. Currently available are
HTML and CSS syntax checks, Antivirus checks, and more.
- Different interfaces: command line and web interface
- ... and a lot more check options documented in the
:doc:`man/linkchecker` manual page.
Screenshots
------------
.. list-table::
* - .. image:: images/shot1.png
:scale: 20%
- .. image:: images/shot3.png
:scale: 20%
* - Commandline interface
- WSGI web interface
Test suite status
------------------
LinkChecker has extensive unit tests to ensure code quality.
`Travis CI <https://travis-ci.com/>`_ is used for continuous build
and test integration.
.. image:: https://travis-ci.com/linkchecker/linkchecker.png
:alt: Build Status
:target: https://travis-ci.com/linkchecker/linkchecker
.. toctree::
:hidden:
faq
install
upgrading
man/linkchecker
man/linkcheckerrc
contributing
code_of_conduct
code/index

7
doc/src/install.rst Normal file
View file

@ -0,0 +1,7 @@
Installation
============
If you are upgrading from an older version of LinkChecker you should
also read the upgrading documentation in :doc:`upgrading`.
.. include:: ../install.txt
:start-line: 5

539
doc/src/man/linkchecker.rst Normal file
View file

@ -0,0 +1,539 @@
:github_url: https://github.com/linkchecker/linkchecker/blob/master/doc/src/man/linkchecker.rst
linkchecker
===========
SYNOPSIS
--------
**linkchecker** [*options*] [*file-or-url*]...
DESCRIPTION
-----------
LinkChecker features
- recursive and multithreaded checking
- output in colored or normal text, HTML, SQL, CSV, XML or a sitemap
graph in different formats
- support for HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and
local file links
- restriction of link checking with URL filters
- proxy support
- username/password authorization for HTTP, FTP and Telnet
- support for robots.txt exclusion protocol
- support for Cookies
- support for HTML5
- HTML and CSS syntax check
- Antivirus check
- a command line and web interface
EXAMPLES
--------
The most common use checks the given domain recursively:
.. code-block:: console
$ linkchecker http://www.example.com/
Beware that this checks the whole site which can have thousands of
URLs. Use the :option:`-r` option to restrict the recursion depth.
Don't check URLs with **/secret** in their name. All other links are
checked as usual:
.. code-block:: console
$ linkchecker --ignore-url=/secret mysite.example.com
Checking a local HTML file on Unix:
.. code-block:: console
$ linkchecker ../bla.html
Checking a local HTML file on Windows:
.. code-block:: doscon
C:\> linkchecker c:\temp\test.html
You can skip the **http://** url part if the domain starts with
**www.**:
.. code-block:: console
$ linkchecker www.example.com
You can skip the **ftp://** url part if the domain starts with **ftp.**:
.. code-block:: console
$ linkchecker -r0 ftp.example.com
Generate a sitemap graph and convert it with the graphviz dot utility:
.. code-block:: console
$ linkchecker -odot -v www.example.com | dot -Tps > sitemap.ps
OPTIONS
-------
General options
^^^^^^^^^^^^^^^
.. option:: -f FILENAME, --config=FILENAME
Use FILENAME as configuration file. By default LinkChecker uses
~/.linkchecker/linkcheckerrc.
.. option:: -h, --help
Help me! Print usage information for this program.
.. option:: --stdin
Read list of white-space separated URLs to check from stdin.
.. option:: -t NUMBER, --threads=NUMBER
Generate no more than the given number of threads. Default number of
threads is 10. To disable threading specify a non-positive number.
.. option:: -V, --version
Print version and exit.
.. option:: --list-plugins
Print available check plugins and exit.
Output options
^^^^^^^^^^^^^^
.. option:: -D STRING, --debug=STRING
Print debugging output for the given logger. Available loggers are
cmdline, checking, cache, dns, plugin and
all. Specifying all is an alias for specifying all available
loggers. The option can be given multiple times to debug with more
than one logger. For accurate results, threading will be disabled
during debug runs.
.. option:: -F TYPE[/ENCODING][/FILENAME], --file-output=TYPE[/ENCODING][/FILENAME]
Output to a file linkchecker-out.TYPE,
$HOME/.linkchecker/blacklist for blacklist output, or
FILENAME if specified. The ENCODING specifies the output
encoding, the default is that of your locale. Valid encodings are
listed at
https://docs.python.org/library/codecs.html#standard-encodings.
For the none output type the FILENAME and ENCODING parts will
be ignored; otherwise, if the file already exists, it will be overwritten.
You can specify this option more than once. Valid file output TYPEs
are text, html, sql, csv, gml, dot, xml,
sitemap, none or blacklist. Default is no file output.
The various output types are documented below. Note that you can
suppress all console output with the option :option:`-o` *none*.
.. option:: --no-status
Do not print check status messages.
.. option:: --no-warnings
Don't log warnings. Default is to log warnings.
.. option:: -o TYPE[/ENCODING], --output=TYPE[/ENCODING]
Specify output type as text, html, sql, csv,
gml, dot, xml, sitemap, none or blacklist.
Default type is text. The various output types are documented
below.
The ENCODING specifies the output encoding, the default is that of
your locale. Valid encodings are listed at
https://docs.python.org/library/codecs.html#standard-encodings.
.. option:: -q, --quiet
Quiet operation, an alias for :option:`-o` *none*. This is only useful with
:option:`-F`.
.. option:: -v, --verbose
Log all checked URLs. Default is to log only errors and warnings.
.. option:: -W REGEX, --warning-regex=REGEX
Define a regular expression which prints a warning if it matches any
content of the checked link. This applies only to valid pages, so we
can get their content.
Use this to check for pages that contain some form of error, for
example "This page has moved" or "Oracle Application error".
Note that multiple values can be combined in the regular expression,
for example "(This page has moved|Oracle Application error)".
See section `REGULAR EXPRESSIONS`_ for more info.
Checking options
^^^^^^^^^^^^^^^^
.. option:: --cookiefile=FILENAME
Read a file with initial cookie data. The cookie data format is
explained below.
.. option:: --check-extern
Check external URLs.
.. option:: --ignore-url=REGEX
URLs matching the given regular expression will only be syntax checked.
This option can be given multiple times.
See section `REGULAR EXPRESSIONS`_ for more info.
.. option:: -N STRING, --nntp-server=STRING
Specify an NNTP server for news: links. Default is the
environment variable :envvar:`NNTP_SERVER`. If no host is given, only the
syntax of the link is checked.
.. option:: --no-follow-url=REGEX
Check but do not recurse into URLs matching the given regular
expression.
This option can be given multiple times.
See section `REGULAR EXPRESSIONS`_ for more info.
.. option:: --no-robots
Check URLs regardless of any robots.txt files.
.. option:: -p, --password
Read a password from console and use it for HTTP and FTP
authorization. For FTP the default password is anonymous@. For
HTTP there is no default password. See also :option:`-u`.
.. option:: -r NUMBER, --recursion-level=NUMBER
Check recursively all links up to given depth. A negative depth will
enable infinite recursion. Default depth is infinite.
.. option:: --timeout=NUMBER
Set the timeout for connection attempts in seconds. The default
timeout is 60 seconds.
.. option:: -u STRING, --user=STRING
Try the given username for HTTP and FTP authorization. For FTP the
default username is anonymous. For HTTP there is no default
username. See also :option:`-p`.
.. option:: --user-agent=STRING
Specify the User-Agent string to send to the HTTP server, for
example "Mozilla/4.0". The default is "LinkChecker/X.Y" where X.Y is
the current version of LinkChecker.
CONFIGURATION FILES
-------------------
Configuration files can specify all options above. They can also specify
some options that cannot be set on the command line. See
:manpage:`linkcheckerrc(5)` for more info.
OUTPUT TYPES
------------
Note that by default only errors and warnings are logged. You should use
the option :option:`--verbose` to get the complete URL list, especially when
outputting a sitemap graph format.
**text**
Standard text logger, logging URLs in keyword: argument fashion.
**html**
Log URLs in keyword: argument fashion, formatted as HTML.
Additionally has links to the referenced pages. Invalid URLs have
HTML and CSS syntax check links appended.
**csv**
Log check result in CSV format with one URL per line.
**gml**
Log parent-child relations between linked URLs as a GML sitemap
graph.
**dot**
Log parent-child relations between linked URLs as a DOT sitemap
graph.
**gxml**
Log check result as a GraphXML sitemap graph.
**xml**
Log check result as machine-readable XML.
**sitemap**
Log check result as an XML sitemap whose protocol is documented at
https://www.sitemaps.org/protocol.html.
**sql**
Log check result as SQL script with INSERT commands. An example
script to create the initial SQL table is included as create.sql.
**blacklist**
Suitable for cron jobs. Logs the check result into a file
**~/.linkchecker/blacklist** which only contains entries with
invalid URLs and the number of times they have failed.
**none**
Logs nothing. Suitable for debugging or checking the exit code.
REGULAR EXPRESSIONS
-------------------
LinkChecker accepts Python regular expressions. See
https://docs.python.org/howto/regex.html for an introduction.
An addition is that a leading exclamation mark negates the regular
expression.
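The negation rule can be sketched in a few lines; the `matches_filter` helper below is illustrative, not LinkChecker's own code:

```python
# A leading "!" inverts the regular expression match, as described above.
import re

def matches_filter(pattern: str, url: str) -> bool:
    """Return True if url matches pattern, honouring a leading "!" negation."""
    if pattern.startswith("!"):
        return not re.search(pattern[1:], url)
    return bool(re.search(pattern, url))

print(matches_filter("/secret", "http://example.com/secret/page"))   # True
print(matches_filter("!/secret", "http://example.com/secret/page"))  # False
print(matches_filter("!/secret", "http://example.com/public/"))      # True
```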
COOKIE FILES
------------
A cookie file contains standard HTTP header (RFC 2616) data with the
following possible names:
**Host** (required)
Sets the domain the cookies are valid for.
**Path** (optional)
Gives the path the cookies are valid for; default path is **/**.
**Set-cookie** (required)
Set cookie name/value. Can be given more than once.
Multiple entries are separated by a blank line. The example below will
send two cookies to all URLs starting with **http://example.com/hello/**
and one to all URLs starting with **https://example.org/**:
::
Host: example.com
Path: /hello
Set-cookie: ID="smee"
Set-cookie: spam="egg"
::
Host: example.org
Set-cookie: baggage="elitist"; comment="hologram"
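Since the format is just blank-line separated header blocks, it is easy to parse; the sketch below is illustrative and not LinkChecker's own parser:

```python
# Parse the cookie file format shown above: blank-line separated blocks of
# RFC 2616 style headers, where Set-cookie may repeat within a block.
def parse_cookiefile(text):
    entries = []
    for block in text.strip().split("\n\n"):
        headers = {"Set-cookie": []}
        for line in block.splitlines():
            name, _, value = line.partition(":")
            name, value = name.strip(), value.strip()
            if name.lower() == "set-cookie":
                headers["Set-cookie"].append(value)
            else:
                headers[name.capitalize()] = value
        entries.append(headers)
    return entries

example = """Host: example.com
Path: /hello
Set-cookie: ID="smee"
Set-cookie: spam="egg"

Host: example.org
Set-cookie: baggage="elitist"; comment="hologram"
"""
entries = parse_cookiefile(example)
print(entries[0]["Host"])               # example.com
print(len(entries[0]["Set-cookie"]))    # 2
print(entries[1].get("Path", "/"))      # / (the default path)
```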
PROXY SUPPORT
-------------
To use a proxy on Unix or Windows set the :envvar:`http_proxy`, :envvar:`https_proxy` or
:envvar:`ftp_proxy` environment variables to the proxy URL. The URL should be of
the form
**http://**\ [*user*\ **:**\ *pass*\ **@**]\ *host*\ [**:**\ *port*].
LinkChecker also detects manual proxy settings of Internet Explorer
under Windows systems, and GNOME or KDE on Linux systems. On a Mac use
the Internet Config to select a proxy.
You can also set a comma-separated domain list in the :envvar:`no_proxy`
environment variables to ignore any proxy settings for these domains.
Setting a HTTP proxy on Unix for example looks like this:
.. code-block:: console
$ export http_proxy="http://proxy.example.com:8080"
Proxy authentication is also supported:
.. code-block:: console
$ export http_proxy="http://user1:mypass@proxy.example.org:8081"
Setting a proxy on the Windows command prompt:
.. code-block:: doscon
C:\> set http_proxy=http://proxy.example.com:8080
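Python's standard library honours the same environment variables, which makes it easy to verify what a proxy-aware tool will see; the proxy URL and domain below are example values:

```python
# Demonstrate the http_proxy / no_proxy environment variables using only the
# standard library.
import os
import urllib.request

os.environ["http_proxy"] = "http://user1:mypass@proxy.example.org:8081"
os.environ["no_proxy"] = "example.com"

# urllib reads the same variables:
print(urllib.request.getproxies()["http"])
# Hosts matching a no_proxy entry bypass the proxy:
print(bool(urllib.request.proxy_bypass("www.example.com")))  # True
```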
PERFORMED CHECKS
----------------
All URLs have to pass a preliminary syntax test. Minor quoting mistakes
will issue a warning, all other invalid syntax issues are errors. After
the syntax check passes, the URL is queued for connection checking. All
connection check types are described below.
HTTP links (**http:**, **https:**)
After connecting to the given HTTP server the given path or query is
requested. All redirections are followed, and if user/password is
given it will be used as authorization when necessary. All final
HTTP status codes other than 2xx are errors.
HTML page contents are checked for recursion.
Local files (**file:**)
A regular, readable file that can be opened is valid. A readable
directory is also valid. All other files, for example device files,
unreadable or non-existing files are errors.
HTML or other parseable file contents are checked for recursion.
Mail links (**mailto:**)
A mailto: link eventually resolves to a list of email addresses.
If one address fails, the whole list will fail. For each mail
address we check the following things:
1. Check the address syntax, both the parts before and after the
@ sign.
2. Look up the MX DNS records. If we found no MX record, print an
error.
3. Check if one of the mail hosts accepts an SMTP connection. Check
hosts with higher priority first. If no host accepts SMTP, we
print a warning.
4. Try to verify the address with the VRFY command. If we got an
answer, print the verified address as an info.
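The syntax part of step 1 can be sketched with the standard library; the `plausible_address` helper is illustrative, not LinkChecker's, and the DNS and SMTP probes of steps 2 to 4 need network access, so they are omitted:

```python
# Minimal syntax check for the parts before and after the @ sign.
from email.utils import parseaddr

def plausible_address(addr: str) -> bool:
    """Roughly validate a mail address: non-empty local part, dotted domain."""
    _, email = parseaddr(addr)
    local, sep, domain = email.rpartition("@")
    return bool(sep and local and "." in domain)

print(plausible_address("user@example.com"))       # True
print(plausible_address("Joe <joe@example.org>"))  # True
print(plausible_address("not-an-address"))         # False
```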
FTP links (**ftp:**)
For FTP links we do:
1. connect to the specified host
2. try to login with the given user and password. The default user
is **anonymous**, the default password is **anonymous@**.
3. try to change to the given directory
4. list the file with the NLST command
Telnet links (**telnet:**)
We try to connect and if user/password are given, login to the given
telnet server.
NNTP links (**news:**, **snews:**, **nntp:**)
We try to connect to the given NNTP server. If a news group or
article is specified, try to request it from the server.
Unsupported links (**javascript:**, etc.)
An unsupported link will only print a warning. No further checking
will be made.
The complete list of recognized, but unsupported links can be found
in the
`linkcheck/checker/unknownurl.py <https://github.com/linkchecker/linkchecker/blob/master/linkcheck/checker/unknownurl.py>`__
source file. The most prominent of them should be JavaScript links.
PLUGINS
-------
There are two plugin types: connection and content plugins. Connection
plugins are run after a successful connection to the URL host. Content
plugins are run if the URL type has content (mailto: URLs have no
content for example) and if the check is not forbidden (i.e. by HTTP
robots.txt).
Use the option :option:`--list-plugins` for a list of plugins and their
documentation. All plugins are enabled via the :manpage:`linkcheckerrc(5)`
configuration file.
RECURSION
---------
Before descending recursively into a URL, it has to fulfill several
conditions. They are checked in this order:
1. A URL must be valid.
2. A URL must be parseable. This currently includes HTML files, Opera
bookmarks files, and directories. If a file type cannot be determined
(for example it does not have a common HTML file extension, and the
content does not look like HTML), it is assumed to be non-parseable.
3. The URL content must be retrievable. This is usually the case except
for example mailto: or unknown URL types.
4. The maximum recursion level must not be exceeded. It is configured
with the :option:`--recursion-level` option and is unlimited by default.
5. It must not match the ignored URL list. This is controlled with the
:option:`--ignore-url` option.
6. The Robots Exclusion Protocol must allow links in the URL to be
followed recursively. This is checked by searching for a "nofollow"
directive in the HTML header data.
Note that the directory recursion reads all files in that directory, not
just a subset like **index.htm**.
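Condition 6 can be illustrated with a minimal standard-library sketch; the parser class below is ours, not LinkChecker's, and for brevity it scans the whole document rather than only the header:

```python
# Detect a "nofollow" robots directive in an HTML meta tag.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.follow = True

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            if "nofollow" in attrs.get("content", "").lower():
                self.follow = False

parser = RobotsMetaParser()
parser.feed('<html><head><meta name="robots" content="noindex, nofollow">'
            '</head><body></body></html>')
print(parser.follow)  # False: links on this page must not be followed
```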
NOTES
-----
URLs on the commandline starting with **ftp.** are treated like
**ftp://ftp.**, URLs starting with **www.** are treated like
**http://www.**. You can also give local files as arguments.
If you have your system configured to automatically establish a
connection to the internet (e.g. with diald), it will connect when
checking links not pointing to your local host. Use the :option:`--ignore-url`
option to prevent this.
Javascript links are not supported.
If your platform does not support threading, LinkChecker disables it
automatically.
You can supply multiple user/password pairs in a configuration file.
When checking **news:** links the given NNTP host doesn't need to be the
same as the host of the user browsing your pages.
ENVIRONMENT
-----------
.. envvar:: NNTP_SERVER
specifies default NNTP server
.. envvar:: http_proxy
specifies default HTTP proxy server
.. envvar:: ftp_proxy
specifies default FTP proxy server
.. envvar:: no_proxy
comma-separated list of domains to not contact over a proxy server
.. envvar:: LC_MESSAGES, LANG, LANGUAGE
specify output language
RETURN VALUE
------------
The return value is 2 when
- a program error occurred.
The return value is 1 when
- invalid links were found or
- link warnings were found and warnings are enabled
Else the return value is zero.
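A wrapper script, for example one run from cron, can act on these return values; the `report` helper below is an illustrative sketch, not part of LinkChecker:

```shell
#!/bin/sh
# Map LinkChecker's documented exit codes to human-readable messages.
report() {
    case "$1" in
        0) echo "ok: no broken links" ;;
        1) echo "broken links or warnings found" ;;
        2) echo "program error" ;;
        *) echo "unexpected exit status: $1" ;;
    esac
}

# In a real job you would run e.g.:
#   linkchecker -Fblacklist http://www.example.com/
#   report "$?"
report 0
report 1
report 2
```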
LIMITATIONS
-----------
LinkChecker consumes memory for each queued URL to check. With thousands
of queued URLs the amount of consumed memory can become quite large.
This might slow down the program or even the whole system.
FILES
-----
**~/.linkchecker/linkcheckerrc** - default configuration file
**~/.linkchecker/blacklist** - default blacklist logger output filename
**linkchecker-out.**\ *TYPE* - default logger file output name
SEE ALSO
--------
:manpage:`linkcheckerrc(5)`
https://docs.python.org/library/codecs.html#standard-encodings - valid
output encodings
https://docs.python.org/howto/regex.html - regular expression
documentation

View file

@ -0,0 +1,544 @@
:github_url: https://github.com/linkchecker/linkchecker/blob/master/doc/src/man/linkcheckerrc.rst
linkcheckerrc
=============
DESCRIPTION
-----------
**linkcheckerrc** is the configuration file for LinkChecker. The file is
written in an INI-style format.
The default file location is **~/.linkchecker/linkcheckerrc** on Unix,
**%HOMEPATH%\\.linkchecker\\linkcheckerrc** on Windows systems.
SETTINGS
--------
checking
^^^^^^^^
**cookiefile=**\ *filename*
Read a file with initial cookie data. The cookie data format is
explained in :manpage:`linkchecker(1)`.
Command line option: :option:`--cookiefile`
**debugmemory=**\ [**0**\ \|\ **1**]
Write memory allocation statistics to a file on exit, requires :pypi:`meliae`.
The default is not to write the file.
Command line option: none
**localwebroot=**\ *STRING*
When checking absolute URLs inside local files, the given root
directory is used as base URL.
Note that the given directory must have URL syntax, so it must use a
slash to join directories instead of a backslash. And the given
directory must end with a slash.
Command line option: none
**nntpserver=**\ *STRING*
Specify an NNTP server for **news:** links. Default is the
environment variable :envvar:`NNTP_SERVER`. If no host is given, only the
syntax of the link is checked.
Command line option: :option:`--nntp-server`
**recursionlevel=**\ *NUMBER*
Check recursively all links up to given depth. A negative depth will
enable infinite recursion. Default depth is infinite.
Command line option: :option:`--recursion-level`
**threads=**\ *NUMBER*
Generate no more than the given number of threads. Default number of
threads is 10. To disable threading specify a non-positive number.
Command line option: :option:`--threads`
**timeout=**\ *NUMBER*
Set the timeout for connection attempts in seconds. The default
timeout is 60 seconds.
Command line option: :option:`--timeout`
**aborttimeout=**\ *NUMBER*
Time to wait for checks to finish after the user aborts the first
time (with Ctrl-C or the abort button). The default abort timeout is
300 seconds.
Command line option: none
**useragent=**\ *STRING*
Specify the User-Agent string to send to the HTTP server, for
example "Mozilla/4.0". The default is "LinkChecker/X.Y" where X.Y is
the current version of LinkChecker.
Command line option: :option:`--user-agent`
**sslverify=**\ [**0**\ \|\ **1**\ \|\ *filename*]
If set to zero disables SSL certificate checking. If set to one (the
default) enables SSL certificate checking with the provided CA
certificate file. If a filename is specified, it will be used as the
certificate file.
Command line option: none
**maxrunseconds=**\ *NUMBER*
Stop checking new URLs after the given number of seconds. Same as if
the user stops (by hitting Ctrl-C) after the given number of
seconds.
The default is not to stop until all URLs are checked.
Command line option: none
**maxfilesizedownload=**\ *NUMBER*
Files larger than NUMBER bytes will be ignored; if they are accessed
over HTTP and an accurate Content-Length header is returned, nothing
is downloaded at all. No more than this amount of a file will be downloaded.
The default is 5242880 (5 MB).
Command line option: none
**maxfilesizeparse=**\ *NUMBER*
Files larger than NUMBER bytes will not be parsed for links.
The default is 1048576 (1 MB).
Command line option: none
**maxnumurls=**\ *NUMBER*
Maximum number of URLs to check. New URLs will not be queued after
the given number of URLs is checked.
The default is to queue and check all URLs.
Command line option: none
**maxrequestspersecond=**\ *NUMBER*
Limit the maximum number of requests per second to one host.
The default is 10.
Command line option: none
**robotstxt=**\ [**0**\ \|\ **1**]
When using HTTP, fetch robots.txt and confirm whether each URL may
be accessed before checking it.
The default is to use robots.txt files.
Command line option: :option:`--no-robots`
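The check itself follows the standard robots exclusion rules, which Python's standard library can evaluate. A minimal sketch of that decision, with a made-up robots.txt and URLs:

```python
# Sketch of the robots.txt decision made when robotstxt=1.
# The robots.txt content and URLs are hypothetical examples.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Allowed: not under /private/
print(parser.can_fetch("LinkChecker/10.0", "https://example.com/index.html"))  # True
# Disallowed by the rule above
print(parser.can_fetch("LinkChecker/10.0", "https://example.com/private/x"))   # False
```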
**allowedschemes=**\ *NAME*\ [**,**\ *NAME*...]
Allowed URL schemes as comma-separated list.
Command line option: none
filtering
^^^^^^^^^
**ignore=**\ *REGEX* (MULTILINE)
Only check syntax of URLs matching the given regular expressions.
Command line option: :option:`--ignore-url`
**ignorewarnings=**\ *NAME*\ [**,**\ *NAME*...]
Ignore the comma-separated list of warnings. See `WARNINGS`_ for
the list of supported warnings.
Command line option: none
**internlinks=**\ *REGEX*
Regular expression to add more URLs recognized as internal links.
Default is that URLs given on the command line are internal.
Command line option: none
**nofollow=**\ *REGEX* (MULTILINE)
Check but do not recurse into URLs matching the given regular
expressions.
Command line option: :option:`--no-follow-url`
**checkextern=**\ [**0**\ \|\ **1**]
Check external links. Default is to check internal links only.
Command line option: :option:`--check-extern`
authentication
^^^^^^^^^^^^^^
**entry=**\ *REGEX* *USER* [*PASS*] (MULTILINE)
Provide individual username/password pairs for different links. In
addition to a single login page specified with **loginurl**, multiple
FTP, HTTP (Basic Authentication) and telnet links are supported.
Entries are a triple (URL regex, username, password) or a tuple (URL
regex, username), where the entries are separated by whitespace.
The password is optional; if missing, it has to be entered on the
command line.
If the regular expression matches the checked URL, the given
username/password pair is used for authentication. The command line
options :option:`-u` and :option:`-p` match every link and therefore override
the entries given here. The first match wins.
Command line option: :option:`-u`, :option:`-p`
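A hypothetical **entry** configuration (the hosts and credentials below are made-up examples); note that each continuation line must be indented::

    [authentication]
    entry=
      ^https://intranet\.example\.com/ admin secret
      ^ftp://ftp\.example\.com/ anonymous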
**loginurl=**\ *URL*
The URL of a login page to be visited before link checking. The page
is expected to contain an HTML form to collect credentials and
submit them to the address in its action attribute using an HTTP
POST request. The name attributes of the input elements of the form
and the values to be submitted need to be available (see **entry**
for an explanation of username and password values).
**loginuserfield=**\ *STRING*
The name attribute of the username input element. Default: **login**.
**loginpasswordfield=**\ *STRING*
The name attribute of the password input element. Default: **password**.
**loginextrafields=**\ *NAME*\ **:**\ *VALUE* (MULTILINE)
Optionally the name attributes of any additional input elements and
the values to populate them with. Note that these are submitted
without checking whether matching input elements exist in the HTML
form.
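A sketch of a form-login configuration, combining the options above; the URL, field names and credentials are hypothetical and must match your login form::

    [authentication]
    loginurl=https://www.example.com/login
    loginuserfield=username
    loginpasswordfield=password
    loginextrafields=
      csrf_token:1234
    entry=
      ^https://www\.example\.com/ myuser mypass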
output
^^^^^^
**debug=**\ *STRING*\ [**,**\ *STRING*...]
Print debugging output for the given modules. Available debug
modules are **cmdline**, **checking**, **cache**, **dns**,
**thread**, **plugins** and **all**. Specifying **all** is an alias
for specifying all available loggers.
Command line option: :option:`--debug`
**fileoutput=**\ *TYPE*\ [**,**\ *TYPE*...]
Output to a file **linkchecker-out.**\ *TYPE*, or
**$HOME/.linkchecker/blacklist** for **blacklist** output.
Valid file output types are **text**, **html**, **sql**, **csv**,
**gml**, **dot**, **xml**, **none** or **blacklist**. Default is no
file output. The various output types are documented below. Note
that you can suppress all console output with **output=none**.
Command line option: :option:`--file-output`
**log=**\ *TYPE*\ [**/**\ *ENCODING*]
Specify output type as **text**, **html**, **sql**, **csv**,
**gml**, **dot**, **xml**, **none** or **blacklist**. Default type
is **text**. The various output types are documented below.
The *ENCODING* specifies the output encoding, the default is that of
your locale. Valid encodings are listed at
https://docs.python.org/library/codecs.html#standard-encodings.
Command line option: :option:`--output`
**quiet=**\ [**0**\ \|\ **1**]
If set, operate quietly. An alias for **log=none**. This is only
useful with **fileoutput**.
Command line option: :option:`--verbose`
**status=**\ [**0**\ \|\ **1**]
Control printing check status messages. Default is 1.
Command line option: :option:`--no-status`
**verbose=**\ [**0**\ \|\ **1**]
If set log all checked URLs once. Default is to log only errors and
warnings.
Command line option: :option:`--verbose`
**warnings=**\ [**0**\ \|\ **1**]
If set log warnings. Default is to log warnings.
Command line option: :option:`--no-warnings`
text
^^^^
**filename=**\ *STRING*
Specify output filename for text logging. Default filename is
**linkchecker-out.txt**.
Command line option: :option:`--file-output`
**parts=**\ *STRING*
Comma-separated list of parts that have to be logged. See `LOGGER PARTS`_
below.
Command line option: none
**encoding=**\ *STRING*
Valid encodings are listed in
https://docs.python.org/library/codecs.html#standard-encodings.
Default encoding is **iso-8859-15**.
*color\**
Color settings for the various log parts, syntax is *color* or
*type*\ **;**\ *color*. The *type* can be **bold**, **light**,
**blink**, **invert**. The *color* can be **default**, **black**,
**red**, **green**, **yellow**, **blue**, **purple**, **cyan**,
**white**, **Black**, **Red**, **Green**, **Yellow**, **Blue**,
**Purple**, **Cyan** or **White**.
Command line option: none
**colorparent=**\ *STRING*
Set parent color. Default is **white**.
**colorurl=**\ *STRING*
Set URL color. Default is **default**.
**colorname=**\ *STRING*
Set name color. Default is **default**.
**colorreal=**\ *STRING*
Set real URL color. Default is **cyan**.
**colorbase=**\ *STRING*
Set base URL color. Default is **purple**.
**colorvalid=**\ *STRING*
Set valid color. Default is **bold;green**.
**colorinvalid=**\ *STRING*
Set invalid color. Default is **bold;red**.
**colorinfo=**\ *STRING*
Set info color. Default is **default**.
**colorwarning=**\ *STRING*
Set warning color. Default is **bold;yellow**.
**colordltime=**\ *STRING*
Set download time color. Default is **default**.
**colorreset=**\ *STRING*
Set reset color. Default is **default**.
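For example, to print parent URLs in bold blue and invalid results in inverted red (any *type*\ **;**\ *color* combination listed above may be used)::

    [text]
    colorparent=bold;blue
    colorinvalid=invert;red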
gml
^^^
**filename=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**parts=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**encoding=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
dot
^^^
**filename=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**parts=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**encoding=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
csv
^^^
**filename=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**parts=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**encoding=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**separator=**\ *CHAR*
Set CSV separator. Default is a comma (**,**).
**quotechar=**\ *CHAR*
Set CSV quote character. Default is a double quote (**"**).
sql
^^^
**filename=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**parts=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**encoding=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**dbname=**\ *STRING*
Set database name to store into. Default is **linksdb**.
**separator=**\ *CHAR*
Set SQL command separator character. Default is a semicolon (**;**).
html
^^^^
**filename=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**parts=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**encoding=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**colorbackground=**\ *COLOR*
Set HTML background color. Default is **#fff7e5**.
**colorurl=**
Set HTML URL color. Default is **#dcd5cf**.
**colorborder=**
Set HTML border color. Default is **#000000**.
**colorlink=**
Set HTML link color. Default is **#191c83**.
**colorwarning=**
Set HTML warning color. Default is **#e0954e**.
**colorerror=**
Set HTML error color. Default is **#db4930**.
**colorok=**
Set HTML valid color. Default is **#3ba557**.
blacklist
^^^^^^^^^
**filename=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**encoding=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
xml
^^^
**filename=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**parts=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**encoding=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
gxml
^^^^
**filename=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**parts=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**encoding=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
sitemap
^^^^^^^
**filename=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**parts=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**encoding=**\ *STRING*
See :ref:`[text] <man/linkcheckerrc:text>` section above.
**priority=**\ *FLOAT*
A number between 0.0 and 1.0 determining the priority. The default
priority for the first URL is 1.0, for all child URLs 0.5.
**frequency=**\ [**always**\ \|\ **hourly**\ \|\ **daily**\ \|\ **weekly**\ \|\ **monthly**\ \|\ **yearly**\ \|\ **never**]
How frequently pages are changing.
LOGGER PARTS
------------
**all**
for all parts
**id**
a unique ID for each log entry
**realurl**
the full, expanded URL
**result**
valid or invalid, with messages
**extern**
1 or 0, reported only by some logger types
**base**
base href=...
**name**
<a href=...>name</a> and <img alt="name">
**parenturl**
if any
**info**
some additional info, e.g. FTP welcome messages
**warning**
warnings
**dltime**
download time
**checktime**
check time
**url**
the original URL, which can be relative
**intro**
the blurb at the beginning, "starting at ..."
**outro**
the blurb at the end, "found x errors ..."
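For example, to limit text output to a subset of these parts, the **parts** option described above takes a comma-separated list of part names::

    [text]
    parts=id,url,result,checktime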
MULTILINE
---------
Some option values can span multiple lines. Each line has to be indented
for that to work. Lines starting with a hash (**#**) will be ignored,
though they must still be indented.
::
ignore=
lconline
bookmark
# a comment
^mailto:
EXAMPLE
-------
::
[output]
log=html
[checking]
threads=5
[filtering]
ignorewarnings=http-moved-permanent
PLUGINS
-------
All plugins have a separate section. If the section appears in the
configuration file the plugin is enabled. Some plugins read extra
options in their section.
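For example, adding the plugin sections is enough to enable them; options inside a section (described below) are optional::

    [AnchorCheck]

    [SslCertificateCheck]
    sslcertwarndays=14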
AnchorCheck
^^^^^^^^^^^
Checks validity of HTML anchors.
LocationInfo
^^^^^^^^^^^^
Adds the country and if possible city name of the URL host as info.
Needs GeoIP or pygeoip and a local country or city lookup DB installed.
RegexCheck
^^^^^^^^^^
Define a regular expression which prints a warning if it matches any
content of the checked link. This applies only to valid pages, so we can
get their content.
**warningregex=**\ *REGEX*
Use this to check for pages that contain some form of error message,
for example "This page has moved" or "Oracle Application error".
*REGEX* should be unquoted.
Note that multiple values can be combined in the regular expression,
for example "(This page has moved\|Oracle Application error)".
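Such a configuration could look like this (the pattern is an illustrative example)::

    [RegexCheck]
    warningregex=(This page has moved|Oracle Application error)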
SslCertificateCheck
^^^^^^^^^^^^^^^^^^^
Check SSL certificate expiration date. Only internal https: links will
be checked. A domain will only be checked once to avoid duplicate
warnings.
**sslcertwarndays=**\ *NUMBER*
Configures the expiration warning time in days.
HtmlSyntaxCheck
^^^^^^^^^^^^^^^
Check the syntax of HTML pages with the online W3C HTML validator. See
https://validator.w3.org/docs/api.html.
HttpHeaderInfo
^^^^^^^^^^^^^^
Print HTTP headers in URL info.
**prefixes=**\ *prefix1*\ [,*prefix2*]...
Comma-separated list of header prefixes, for example "X-" to
display all HTTP headers that start with "X-".
CssSyntaxCheck
^^^^^^^^^^^^^^
Check the syntax of CSS stylesheets with the online W3C CSS validator. See
https://jigsaw.w3.org/css-validator/manual.html#expert.
VirusCheck
^^^^^^^^^^
Checks the page content for virus infections with clamav. A local clamav
daemon must be installed.
**clamavconf=**\ *filename*
Filename of **clamd.conf** config file.
PdfParser
^^^^^^^^^
Parse PDF files for URLs to check. Needs the :pypi:`pdfminer` Python package
installed.
WordParser
^^^^^^^^^^
Parse Word files for URLs to check. Needs the :pypi:`pywin32` Python
extension installed.
WARNINGS
--------
The following warnings are recognized in the 'ignorewarnings' config
file entry:
**file-missing-slash**
The file: URL is missing a trailing slash.
**file-system-path**
The file: path is not the same as the system specific path.
**ftp-missing-slash**
The ftp: URL is missing a trailing slash.
**http-cookie-store-error**
An error occurred while storing a cookie.
**http-empty-content**
The URL had no content.
**mail-no-mx-host**
The mail MX host could not be found.
**nntp-no-newsgroup**
The NNTP newsgroup could not be found.
**nntp-no-server**
No NNTP server was found.
**url-content-size-zero**
The URL content size is zero.
**url-content-too-large**
The URL content size is too large.
**url-effective-url**
The effective URL is different from the original.
**url-error-getting-content**
Could not get the content of the URL.
**url-obfuscated-ip**
The IP is obfuscated.
**url-whitespace**
The URL contains leading or trailing whitespace.
SEE ALSO
--------
:manpage:`linkchecker(1)`

1
doc/src/upgrading.rst Normal file
View file

@ -0,0 +1 @@
.. include:: ../upgrading.txt

View file

@ -16,6 +16,24 @@ is equivalent to:
Man Page Translations
---------------------
po4a is used to generate linkchecker.doc.pot, .po files and translated man pages.
Sphinx is used to generate .pot and .po (with sphinx-intl) files in i18n/
and man pages in man/.
``linkchecker/doc $ make po4a``
Create man.pot file in i18n/gettext/:
``linkchecker/doc $ make -C src gettext``
Create man.po file in i18n/locales/:
``linkchecker/doc/src $ sphinx-intl update -p ../i18n/gettext -l de``
These two steps can be performed with:
``linkchecker/doc $ make locale``
Create man pages:
``linkchecker/doc $ make man``
After updating the source files, all steps need to be repeated. If only
the translations in the .po file have changed, only the last step is needed.

View file

@ -1,7 +1,7 @@
Upgrading
=========
Migrating from 9.x to 10.0
-------------------------
--------------------------
Python 3.5 or newer is required.
The Python Beautiful Soup package is now required. A C compiler is not needed

View file

@ -1,27 +0,0 @@
HOMEPAGE:=$(HOME)/public_html/linkchecker-webpage.git
WOK:=$(HOME)/projects/wok.git/wok
ICOICONS := media/images/logo16x16.png media/images/logo32x32.png
ALLICONS := $(ICOICONS) media/images/logo48x48.png media/images/logo64x64.png media/images/logo128x128.png
OXYGEN := $(HOME)/src/oxygen-gitsvn
all:
logo%.png: $(OXYGEN)/%/categories/applications-development-web.png
cp $< $@
media/favicon.ico: $(ICOICONS)
png2ico $@ $(ICOICONS)
gen: $(ALLICONS) media/favicon.ico
$(WOK) -v --debug
serve:
xdg-open http://localhost:8080
$(WOK) --server localhost:8080
upload:
cd $(HOMEPAGE) && git add . && git commit -m "Updated" && git push
release: gen upload
.PHONY: all gen serve upload release

View file

@ -1,5 +0,0 @@
version: "9.3"
name: "LinkChecker"
lname: "linkchecker"
maintainer: "Bastian Kleineidam"
author: "Bastian Kleineidam"

View file

@ -1,64 +0,0 @@
title: Check websites for broken links
---
Introduction
-------------
LinkChecker is a free, [GPL](http://www.gnu.org/licenses/gpl-2.0.html)
licensed website validator.
LinkChecker checks links in web documents or full websites.
It runs on Python 3 systems, requiring Python 3.5 or later.
Features
---------
- recursive and multithreaded checking and site crawling
- output in colored or normal text, HTML, SQL, CSV, XML or a sitemap
graph in different formats
- HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local file
links support
- restriction of link checking with regular expression filters for URLs
- proxy support
- username/password authorization for HTTP and FTP and Telnet
- honors robots.txt exclusion protocol
- Cookie support
- HTML5 support
- [Plugin support](plugins.html)
allowing custom page checks. Currently available are
HTML and CSS syntax checks, Antivirus checks, and more.
- Different interfaces: command line and web interface
- ... and a lot more check options documented in the
[manual page](man1/linkchecker.1.html).
Screenshots
------------
[![CLI screenshot](images/shot1_thumb.jpg)](images/shot1.png) | [![CGI screenshot](images/shot3_thumb.jpg)](images/shot3.png)
--------------------------------------------------------------|--------------------------------------------------------------
Commandline interface | CGI web interface
Basic usage
------------
To check a URL like `http://www.example.org/myhomepage/` it is enough to
enter `http://www.example.org/myhomepage/` in the web interface, or execute
`linkchecker http://www.example.org/myhomepage/` on the command line.
This check will validate recursively all pages starting with
`http://www.example.org/myhomepage/`. Additionally, all external links
pointing outside of `www.example.org` will be checked but not recursed
into.
Other linkcheckers
-------------------
If this software does not fit your requirements, you can check out
[other free linkcheckers](other.html).
Test suite status
------------------
Linkchecker has extensive unit tests to ensure code quality.
[Travis CI](https://travis-ci.com/) is used for continuous build
and test integration.
[![Build Status](https://travis-ci.com/linkchecker/linkchecker.png)](https://travis-ci.com/linkchecker/linkchecker)

View file

@ -1,13 +0,0 @@
title: "Other free link checkers"
---
All of the programs below have an
[Open Source license](http://www.opensource.org/licenses/) like LinkChecker.
Programs without an Open Source license are not listed, as well as
programs which have not been updated for more than 10 years.
- [gURLChecker](http://labs.libre-entreprise.org/projects/gurlchecker/)
written in C, last updated February 22, 2011 (version 0.13.1)
- [W3C Link Checker](http://validator.w3.org/checklink/)
is an online service, but source is available, last updated 2011
- [webcheck](http://arthurdejong.org/webcheck/)
written in Python, last updated September 11, 2010 (version 1.10.4)

View file

@ -1,11 +0,0 @@
title: Plugin support
---
Plugin documentation
=====================
Standard plugins
=================
Custom plugins
===============

View file

@ -1,39 +0,0 @@
# Hook routines for the wok static site generator.
# Note that mediacompress is a local module.
import os
def compress_javascript(config, output_path):
"""Minify JS files."""
try:
from mediacompress import compress_js_files
except ImportError:
pass
else:
compress_js_files(output_path, excludes=("*.min.js",))
def compress_css(config, output_path):
"""Minify CSS files."""
try:
from mediacompress import compress_css_files
except ImportError:
pass
else:
compress_css_files(output_path)
def chmod(config):
"""Set correct file permissions."""
output_dir = config["output_dir"]
for dirpath, dirnames, filenames in os.walk(output_dir):
for dirname in dirnames:
os.chmod(os.path.join(dirpath, dirname), 0o755)
for filename in filenames:
os.chmod(os.path.join(dirpath, filename), 0o644)
hooks = {
"site.output.post": [compress_javascript, compress_css],
"site.done": [chmod],
}

View file

@ -1,226 +0,0 @@
html, body, div, span, applet, object, iframe,
h1, h2, h3, h4, h5, h6, p, blockquote, pre,
a, abbr, acronym, address, big, cite, code,
del, dfn, em, img, ins, kbd, q, s, samp,
small, strike, strong, sub, sup, tt, var,
b, u, i, center,
dl, dt, dd, ol, ul, li,
fieldset, form, label, legend,
table, caption, tbody, tfoot, thead, tr, th, td,
article, aside, canvas, details, embed,
figure, figcaption, footer, header, hgroup,
menu, nav, output, ruby, section, summary,
time, mark, audio, video {
margin: 0;
padding: 0;
border: 0;
font-size: 100%;
font: inherit;
vertical-align: baseline;
}
/* HTML5 display-role reset for older browsers */
article, aside, details, figcaption, figure,
footer, header, hgroup, menu, nav, section {
display: block;
}
body {
line-height: 1;
}
ol, ul {
list-style: none;
}
blockquote, q {
quotes: none;
}
blockquote:before, blockquote:after,
q:before, q:after {
content: '';
content: none;
}
table {
border-collapse: collapse;
border-spacing: 0;
}
body {
font-size: 13px;
line-height: 1.5;
font-family: 'Helvetica Neue', Helvetica, Arial, serif;
color: #000;
}
a {
color: #d5000d;
font-weight: bold;
}
header {
padding-top: 35px;
padding-bottom: 10px;
}
header h1 {
font-weight: bold;
letter-spacing: -1px;
font-size: 48px;
color: #303030;
line-height: 1.2;
}
header h2 {
letter-spacing: -1px;
font-size: 24px;
color: #aaa;
font-weight: normal;
line-height: 1.3;
}
#downloads {
display: none;
}
#main_content {
padding-top: 20px;
}
code, pre {
font-family: Monaco, "Bitstream Vera Sans Mono", "Lucida Console", Terminal;
color: #222;
margin-bottom: 30px;
font-size: 12px;
}
code {
padding: 0 3px;
}
pre {
border: solid 1px #ddd;
padding: 20px;
overflow: auto;
}
pre code {
padding: 0;
}
ul, ol, dl {
margin-bottom: 20px;
}
/* COMMON STYLES */
table {
width: 100%;
border: 1px solid #ebebeb;
}
th {
font-weight: 500;
}
td {
border: 1px solid #ebebeb;
text-align: center;
font-weight: 300;
}
form {
background: #f2f2f2;
padding: 20px;
}
/* GENERAL ELEMENT TYPE STYLES */
h1 {
font-size: 2.8em;
}
h2 {
font-size: 22px;
font-weight: bold;
color: #303030;
margin-bottom: 8px;
}
h3 {
color: #d5000d;
font-size: 18px;
font-weight: bold;
margin-bottom: 8px;
}
h4 {
font-size: 16px;
color: #303030;
font-weight: bold;
}
h5 {
font-size: 1em;
color: #303030;
}
h6 {
font-size: .8em;
color: #303030;
}
p {
font-weight: 300;
margin-bottom: 20px;
}
a {
text-decoration: none;
}
p a {
font-weight: 400;
}
blockquote {
font-size: 1.6em;
border-left: 10px solid #e9e9e9;
margin-bottom: 20px;
padding: 0 0 0 30px;
}
ul li {
list-style: disc inside;
padding-left: 20px;
}
ol li {
list-style: decimal inside;
padding-left: 3px;
}
dl dd {
font-style: italic;
font-weight: 100;
}
footer {
margin-top: 40px;
padding-top: 20px;
padding-bottom: 30px;
font-size: 13px;
color: #aaa;
}
footer a {
color: #666;
}
/* MISC */
.clearfix:after {
clear: both;
content: '.';
display: block;
visibility: hidden;
height: 0;
}
.clearfix {display: inline-block;}
* html .clearfix {height: 1%;}
.clearfix {display: block;}

View file

@ -1,69 +0,0 @@
.codehilite { background: #ffffff; }
.codehilite .c { color: #999988; font-style: italic } /* Comment */
.codehilite .err { color: #a61717; background-color: #e3d2d2 } /* Error */
.codehilite .k { font-weight: bold } /* Keyword */
.codehilite .o { font-weight: bold } /* Operator */
.codehilite .cm { color: #999988; font-style: italic } /* Comment.Multiline */
.codehilite .cp { color: #999999; font-weight: bold } /* Comment.Preproc */
.codehilite .c1 { color: #999988; font-style: italic } /* Comment.Single */
.codehilite .cs { color: #999999; font-weight: bold; font-style: italic } /* Comment.Special */
.codehilite .gd { color: #000000; background-color: #ffdddd } /* Generic.Deleted */
.codehilite .gd .x { color: #000000; background-color: #ffaaaa } /* Generic.Deleted.Specific */
.codehilite .ge { font-style: italic } /* Generic.Emph */
.codehilite .gr { color: #aa0000 } /* Generic.Error */
.codehilite .gh { color: #999999 } /* Generic.Heading */
.codehilite .gi { color: #000000; background-color: #ddffdd } /* Generic.Inserted */
.codehilite .gi .x { color: #000000; background-color: #aaffaa } /* Generic.Inserted.Specific */
.codehilite .go { color: #888888 } /* Generic.Output */
.codehilite .gp { color: #555555 } /* Generic.Prompt */
.codehilite .gs { font-weight: bold } /* Generic.Strong */
.codehilite .gu { color: #800080; font-weight: bold; } /* Generic.Subheading */
.codehilite .gt { color: #aa0000 } /* Generic.Traceback */
.codehilite .kc { font-weight: bold } /* Keyword.Constant */
.codehilite .kd { font-weight: bold } /* Keyword.Declaration */
.codehilite .kn { font-weight: bold } /* Keyword.Namespace */
.codehilite .kp { font-weight: bold } /* Keyword.Pseudo */
.codehilite .kr { font-weight: bold } /* Keyword.Reserved */
.codehilite .kt { color: #445588; font-weight: bold } /* Keyword.Type */
.codehilite .m { color: #009999 } /* Literal.Number */
.codehilite .s { color: #d14 } /* Literal.String */
.codehilite .na { color: #008080 } /* Name.Attribute */
.codehilite .nb { color: #0086B3 } /* Name.Builtin */
.codehilite .nc { color: #445588; font-weight: bold } /* Name.Class */
.codehilite .no { color: #008080 } /* Name.Constant */
.codehilite .ni { color: #800080 } /* Name.Entity */
.codehilite .ne { color: #990000; font-weight: bold } /* Name.Exception */
.codehilite .nf { color: #990000; font-weight: bold } /* Name.Function */
.codehilite .nn { color: #555555 } /* Name.Namespace */
.codehilite .nt { color: #000080 } /* Name.Tag */
.codehilite .nv { color: #008080 } /* Name.Variable */
.codehilite .ow { font-weight: bold } /* Operator.Word */
.codehilite .w { color: #bbbbbb } /* Text.Whitespace */
.codehilite .mf { color: #009999 } /* Literal.Number.Float */
.codehilite .mh { color: #009999 } /* Literal.Number.Hex */
.codehilite .mi { color: #009999 } /* Literal.Number.Integer */
.codehilite .mo { color: #009999 } /* Literal.Number.Oct */
.codehilite .sb { color: #d14 } /* Literal.String.Backtick */
.codehilite .sc { color: #d14 } /* Literal.String.Char */
.codehilite .sd { color: #d14 } /* Literal.String.Doc */
.codehilite .s2 { color: #d14 } /* Literal.String.Double */
.codehilite .se { color: #d14 } /* Literal.String.Escape */
.codehilite .sh { color: #d14 } /* Literal.String.Heredoc */
.codehilite .si { color: #d14 } /* Literal.String.Interpol */
.codehilite .sx { color: #d14 } /* Literal.String.Other */
.codehilite .sr { color: #009926 } /* Literal.String.Regex */
.codehilite .s1 { color: #d14 } /* Literal.String.Single */
.codehilite .ss { color: #990073 } /* Literal.String.Symbol */
.codehilite .bp { color: #999999 } /* Name.Builtin.Pseudo */
.codehilite .vc { color: #008080 } /* Name.Variable.Class */
.codehilite .vg { color: #008080 } /* Name.Variable.Global */
.codehilite .vi { color: #008080 } /* Name.Variable.Instance */
.codehilite .il { color: #009999 } /* Literal.Number.Integer.Long */
.type-csharp .codehilite .k { color: #0000FF }
.type-csharp .codehilite .kt { color: #0000FF }
.type-csharp .codehilite .nf { color: #000000; font-weight: normal }
.type-csharp .codehilite .nc { color: #2B91AF }
.type-csharp .codehilite .nn { color: #000000 }
.type-csharp .codehilite .s { color: #A31515 }
.type-csharp .codehilite .sc { color: #A31515 }

View file

@ -1,522 +0,0 @@
/* http://meyerweb.com/eric/tools/css/reset/
v2.0 | 20110126
License: none (public domain)
*/
html, body, div, span, applet, object, iframe,
h1, h2, h3, h4, h5, h6, p, blockquote, pre,
a, abbr, acronym, address, big, cite, code,
del, dfn, em, img, ins, kbd, q, s, samp,
small, strike, strong, sub, sup, tt, var,
b, u, i, center,
dl, dt, dd, ol, ul, li,
fieldset, form, label, legend,
table, caption, tbody, tfoot, thead, tr, th, td,
article, aside, canvas, details, embed,
figure, figcaption, footer, header, hgroup,
menu, nav, output, ruby, section, summary,
time, mark, audio, video {
margin: 0;
padding: 0;
border: 0;
font-size: 100%;
font: inherit;
vertical-align: baseline;
}
/* HTML5 display-role reset for older browsers */
article, aside, details, figcaption, figure,
footer, header, hgroup, menu, nav, section {
display: block;
}
body {
line-height: 1;
}
ol, ul {
list-style: none;
}
blockquote, q {
quotes: none;
}
blockquote:before, blockquote:after,
q:before, q:after {
content: '';
content: none;
}
table {
border-collapse: collapse;
border-spacing: 0;
}
/* LAYOUT STYLES */
body {
font-size: 15px;
line-height: 1.5;
background: #fafafa url(../images/body-bg.jpg) 0 0 repeat;
font-family: 'Helvetica Neue', Helvetica, Arial, serif;
font-weight: 400;
color: #666;
}
header h1 a {
color: #fff;
}
header h1 a:hover {
color: #eee;
}
p a, td a, li a {
color: #2879d0;
}
p a:hover, td a:hover, li a:hover {
color: #2268b2;
}
header {
padding-top: 40px;
padding-bottom: 40px;
font-family: 'Architects Daughter', 'Helvetica Neue', Helvetica, Arial, serif;
background: #2e7bcf url(../images/header-bg.jpg) 0 0 repeat-x;
border-bottom: solid 1px #275da1;
}
header h1 {
letter-spacing: -1px;
font-size: 72px;
color: #fff;
line-height: 1;
margin-bottom: 0.2em;
width: 540px;
}
header h2 {
font-size: 26px;
color: #9ddcff;
font-weight: normal;
line-height: 1.3;
width: 540px;
letter-spacing: 0;
}
.inner {
position: relative;
width: 940px;
margin: 0 auto;
}
#content-wrapper {
border-top: solid 1px #fff;
padding-top: 30px;
}
#main-content {
width: 690px;
float: left;
}
#main-content img {
max-width: 100%;
}
#main-content strong {
font-weight: bold;
}
aside#sidebar {
width: 200px;
padding-left: 20px;
min-height: 504px;
float: right;
background: transparent url(../images/sidebar-bg.jpg) 0 0 no-repeat;
font-size: 12px;
line-height: 1.3;
}
aside#sidebar p.repo-owner,
aside#sidebar p.repo-owner a {
font-weight: bold;
}
aside#sidebar h2 {
font-family: 'Architects Daughter', 'Helvetica Neue', Helvetica, Arial, serif;
font-size: 22px;
font-weight: bold;
margin-bottom: 8px;
color: #474747;
}
#downloads {
margin-bottom: 40px;
}
a.button {
width: 134px;
height: 58px;
line-height: 1.2;
font-size: 23px;
color: #fff;
padding-left: 68px;
padding-top: 23px;
font-family: 'Architects Daughter', 'Helvetica Neue', Helvetica, Arial, serif;
}
a.button small {
display: block;
font-size: 11px;
}
a.button span {
display: block;
font-size: 18px;
}
header a.button {
position: absolute;
right: 0;
top: 0;
background: transparent url(../images/github-button.png) 0 0 no-repeat;
}
aside a.button {
width: 138px;
padding-left: 64px;
display: block;
background: transparent url(../images/download-button.png) 0 0 no-repeat;
margin-bottom: 20px;
font-size: 21px;
color: #fff;
}
code, pre {
font-family: Monaco, "Bitstream Vera Sans Mono", "Lucida Console", Terminal, monospace;
color: #222;
margin-bottom: 30px;
font-size: 13px;
}
code {
background-color: #f2f8fc;
border: solid 1px #dbe7f3;
padding: 0 3px;
}
pre {
padding: 20px;
background: #fff;
text-shadow: none;
overflow: auto;
border: solid 1px #f2f2f2;
}
pre code {
color: #2879d0;
background-color: #fff;
border: none;
padding: 0;
}
ul, ol, dl {
margin-bottom: 20px;
}
/* COMMON STYLES */
hr {
height: 1px;
line-height: 1px;
margin-top: 1em;
padding-bottom: 1em;
border: none;
background: transparent url('../images/hr.png') 0 0 no-repeat;
}
table {
width: 100%;
border: 1px solid #ebebeb;
}
th {
font-weight: 500;
}
td {
border: 1px solid #ebebeb;
text-align: center;
font-weight: 300;
}
/* GENERAL ELEMENT TYPE STYLES */
#main-content h1 {
font-family: 'Architects Daughter', 'Helvetica Neue', Helvetica, Arial, serif;
font-size: 2.8em;
letter-spacing: -1px;
color: #474747;
}
#main-content h1:before {
content: "/";
color: #9ddcff;
padding-right: 0.3em;
margin-left: -0.9em;
}
#main-content h2 {
font-family: 'Architects Daughter', 'Helvetica Neue', Helvetica, Arial, serif;
font-size: 22px;
font-weight: bold;
margin-bottom: 8px;
color: #474747;
}
#main-content h2:before {
content: "//";
color: #9ddcff;
padding-right: 0.3em;
margin-left: -1.5em;
}
#main-content h3 {
font-family: 'Architects Daughter', 'Helvetica Neue', Helvetica, Arial, serif;
font-size: 18px;
font-weight: bold;
margin-top: 24px;
margin-bottom: 8px;
color: #474747;
}
#main-content h3:before {
content: "///";
color: #9ddcff;
padding-right: 0.3em;
margin-left: -2em;
}
#main-content h4 {
font-family: 'Architects Daughter', 'Helvetica Neue', Helvetica, Arial, serif;
font-size: 15px;
font-weight: bold;
color: #474747;
}
h4:before {
content: "////";
color: #9ddcff;
padding-right: 0.3em;
margin-left: -2.8em;
}
#main-content h5 {
font-family: 'Architects Daughter', 'Helvetica Neue', Helvetica, Arial, serif;
font-size: 14px;
color: #474747;
}
h5:before {
content: "/////";
color: #9ddcff;
padding-right: 0.3em;
margin-left: -3.2em;
}
#main-content h6 {
font-family: 'Architects Daughter', 'Helvetica Neue', Helvetica, Arial, serif;
font-size: .8em;
color: #474747;
}
h6:before {
content: "//////";
color: #9ddcff;
padding-right: 0.3em;
margin-left: -3.7em;
}
p {
margin-bottom: 20px;
}
a {
text-decoration: none;
}
p a {
font-weight: 400;
}
blockquote {
font-size: 1.6em;
border-left: 10px solid #e9e9e9;
margin-bottom: 20px;
padding: 0 0 0 30px;
}
ul li {
list-style: disc inside;
padding-left: 20px;
}
ol li {
list-style: decimal inside;
padding-left: 3px;
}
dl dd {
font-style: italic;
font-weight: 100;
}
.failed {
color: #990000;
}
.ok {
color: #669900;
}
footer {
background: transparent url('../images/hr.png') 0 0 no-repeat;
margin-top: 40px;
padding-top: 20px;
padding-bottom: 30px;
font-size: 13px;
color: #aaa;
}
footer a {
color: #666;
}
footer a:hover {
color: #444;
}
/* MISC */
.clearfix:after {
clear: both;
content: '.';
display: block;
visibility: hidden;
height: 0;
}
.clearfix {display: inline-block;}
* html .clearfix {height: 1%;}
.clearfix {display: block;}
.comicinfo {
margin: 45px;
width: 480px;
border-collapse: collapse;
}
.comicinfo th {
font-size: 14px;
font-weight: bold;
padding: 10px 8px;
text-align: left;
border-bottom: 1px solid #ccc;
}
.comicinfo td {
padding: 6px 8px;
text-align: left;
border-bottom: 1px solid #ccc;
}
/* #Media Queries
================================================== */
/* Smaller than standard 960 (devices and browsers) */
@media only screen and (max-width: 959px) {}
/* Tablet Portrait size to standard 960 (devices and browsers) */
@media only screen and (min-width: 768px) and (max-width: 959px) {
.inner {
width: 740px;
}
header h1, header h2 {
width: 340px;
}
header h1 {
font-size: 60px;
}
header h2 {
font-size: 30px;
}
#main-content {
width: 490px;
}
#main-content h1:before,
#main-content h2:before,
#main-content h3:before,
#main-content h4:before,
#main-content h5:before,
#main-content h6:before {
content: none;
padding-right: 0;
margin-left: 0;
}
}
/* All Mobile Sizes (devices and browser) */
@media only screen and (max-width: 767px) {
.inner {
width: 93%;
}
header {
padding: 20px 0;
}
header .inner {
position: relative;
}
header h1, header h2 {
width: 100%;
}
header h1 {
font-size: 48px;
}
header h2 {
font-size: 24px;
}
header a.button {
background-image: none;
width: auto;
height: auto;
display: inline-block;
margin-top: 15px;
padding: 5px 10px;
position: relative;
text-align: center;
font-size: 13px;
line-height: 1;
background-color: #9ddcff;
color: #2879d0;
-moz-border-radius: 5px;
-webkit-border-radius: 5px;
border-radius: 5px;
}
header a.button small {
font-size: 13px;
display: inline;
}
#main-content,
aside#sidebar {
float: none;
width: 100% ! important;
}
aside#sidebar {
background-image: none;
margin-top: 20px;
border-top: solid 1px #ddd;
padding: 20px 0;
min-height: 0;
}
aside#sidebar a.button {
display: none;
}
#main-content h1:before,
#main-content h2:before,
#main-content h3:before,
#main-content h4:before,
#main-content h5:before,
#main-content h6:before {
content: none;
padding-right: 0;
margin-left: 0;
}
}
/* Mobile Landscape Size to Tablet Portrait (devices and browsers) */
@media only screen and (min-width: 480px) and (max-width: 767px) {}
/* Mobile Portrait Size to Mobile Landscape Size (devices and browsers) */
@media only screen and (max-width: 479px) {}

Binary file not shown.

Before

Width:  |  Height:  |  Size: 1.3 KiB



@ -1,526 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<style>
table.head, table.foot { width: 100%; }
td.head-rtitle, td.foot-os { text-align: right; }
td.head-vol { text-align: center; }
div.Pp { margin: 1ex 0ex; }
div.Nd, div.Bf, div.Op { display: inline; }
span.Pa, span.Ad { font-style: italic; }
span.Ms { font-weight: bold; }
dl.Bl-diag > dt { font-weight: bold; }
code.Nm, code.Fl, code.Cm, code.Ic, code.In, code.Fd, code.Fn,
code.Cd { font-weight: bold; font-family: inherit; }
</style>
<title>LINKCHECKER(1)</title>
</head>
<body>
<table class="head">
<tr>
<td class="head-ltitle">LINKCHECKER(1)</td>
<td class="head-vol">LinkChecker User Manual</td>
<td class="head-rtitle">LINKCHECKER(1)</td>
</tr>
</table>
<div class="manual-text">
<section class="Sh">
<h1 class="Sh" id="NAME"><a class="permalink" href="#NAME">NAME</a></h1>
linkchecker - command line client to check HTML documents and websites for
broken links
</section>
<section class="Sh">
<h1 class="Sh" id="SYNOPSIS"><a class="permalink" href="#SYNOPSIS">SYNOPSIS</a></h1>
<b>linkchecker</b> [<i>options</i>] [<i>file-or-url</i>]...
</section>
<section class="Sh">
<h1 class="Sh" id="DESCRIPTION"><a class="permalink" href="#DESCRIPTION">DESCRIPTION</a></h1>
<dl class="Bl-tag">
<dt>LinkChecker features</dt>
<dd></dd>
</dl>
<ul class="Bl-bullet">
<li>recursive and multithreaded checking,</li>
<li>output in colored or normal text, HTML, SQL, CSV, XML or a sitemap graph
in different formats,</li>
<li>support for HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local
file links,</li>
<li>restriction of link checking with URL filters,</li>
<li>proxy support,</li>
<li>username/password authorization for HTTP, FTP and Telnet,</li>
<li>support for robots.txt exclusion protocol,</li>
<li>support for Cookies,</li>
<li>support for HTML5,</li>
<li>HTML and CSS syntax check,</li>
<li>Antivirus check,</li>
<li>a command line and web interface</li>
</ul>
</section>
<section class="Sh">
<h1 class="Sh" id="EXAMPLES"><a class="permalink" href="#EXAMPLES">EXAMPLES</a></h1>
<dl class="Bl-tag">
<dt>The most common use checks the given domain recursively:</dt>
<dd><b>linkchecker http://www.example.com/</b>
<br/>
Beware that this checks the whole site which can have thousands of URLs. Use
the <b>-r</b> option to restrict the recursion depth.</dd>
<dt>Don't check URLs that contain <b>/secret</b> in their name. All other links
are checked as usual:</dt>
<dd><b>linkchecker --ignore-url=/secret mysite.example.com</b></dd>
<dt>Checking a local HTML file on Unix:</dt>
<dd><b>linkchecker ../bla.html</b></dd>
<dt>Checking a local HTML file on Windows:</dt>
<dd><b>linkchecker c:\temp\test.html</b></dd>
<dt>You can skip the <b>http://</b> url part if the domain starts with
<b>www.</b>:</dt>
<dd><b>linkchecker www.example.com</b></dd>
<dt>You can skip the <b>ftp://</b> url part if the domain starts with
<b>ftp.</b>:</dt>
<dd><b>linkchecker -r0 ftp.example.com</b></dd>
<dt>Generate a sitemap graph and convert it with the graphviz dot
utility:</dt>
<dd><b>linkchecker -odot -v www.example.com | dot -Tps &gt;
sitemap.ps</b></dd>
</dl>
</section>
<section class="Sh">
<h1 class="Sh" id="OPTIONS"><a class="permalink" href="#OPTIONS">OPTIONS</a></h1>
<section class="Ss">
<h2 class="Ss" id="General_options"><a class="permalink" href="#General_options">General
options</a></h2>
<dl class="Bl-tag">
<dt><b>-f</b><i>FILENAME</i>, <b>--config=</b><i>FILENAME</i></dt>
<dd>Use <i>FILENAME</i> as configuration file. As default LinkChecker uses
<b>~/.linkchecker/linkcheckerrc</b>.</dd>
<dt><b>-h</b>, <b>--help</b></dt>
<dd>Help me! Print usage information for this program.</dd>
<dt><b>--stdin</b></dt>
<dd>Read list of white-space separated URLs to check from stdin.</dd>
<dt><b>-t</b><i>NUMBER</i>, <b>--threads=</b><i>NUMBER</i></dt>
<dd>Generate no more than the given number of threads. Default number of
threads is 10. To disable threading specify a non-positive number.</dd>
<dt><b>-V</b>, <b>--version</b></dt>
<dd>Print version and exit.</dd>
<dt><b>--list-plugins</b></dt>
<dd>Print available check plugins and exit.</dd>
</dl>
</section>
<section class="Ss">
<h2 class="Ss" id="Output_options"><a class="permalink" href="#Output_options">Output
options</a></h2>
<dl class="Bl-tag">
<dt><b>-D</b><i>STRING</i>, <b>--debug=</b><i>STRING</i></dt>
<dd>Print debugging output for the given logger. Available loggers are
<b>cmdline</b>, <b>checking</b>, <b>cache</b>, <b>dns</b>, <b>plugin</b>
and <b>all</b>. Specifying <b>all</b> is an alias for specifying all
available loggers. The option can be given multiple times to debug with
more than one logger. For accurate results, threading will be disabled
during debug runs.</dd>
<dt><b>-F</b><i>TYPE</i>[<b>/</b><i>ENCODING</i>][<b>/</b><i>FILENAME</i>],
<b>--file-output=</b><i>TYPE</i>[<b>/</b><i>ENCODING</i>][<b>/</b><i>FILENAME</i>]</dt>
<dd>Output to a file <b>linkchecker-out.</b><i>TYPE</i>,
<b>$HOME/.linkchecker/blacklist</b> for <b>blacklist</b> output, or
<i>FILENAME</i> if specified. The <i>ENCODING</i> specifies the output
encoding, the default is that of your locale. Valid encodings are listed
at
<a class="Lk" href="https://docs.python.org/library/codecs.html#standard-encodings">https://docs.python.org/library/codecs.html#standard-encodings</a>.
<br/>
The <i>FILENAME</i> and <i>ENCODING</i> parts of the <b>none</b> output type
are ignored; otherwise, if the file already exists, it will be overwritten.
You can specify this option more than once. Valid file output types are
<b>text</b>, <b>html</b>, <b>sql</b>, <b>csv</b>, <b>gml</b>, <b>dot</b>,
<b>xml</b>, <b>sitemap</b>, <b>none</b> or <b>blacklist</b>. Default is no
file output. The various output types are documented below. Note that you
can suppress all console output with the option <b>-o none</b>.</dd>
<dt><b>--no-status</b></dt>
<dd>Do not print check status messages.</dd>
<dt><b>--no-warnings</b></dt>
<dd>Don't log warnings. Default is to log warnings.</dd>
<dt><b>-o</b><i>TYPE</i>[<b>/</b><i>ENCODING</i>],
<b>--output=</b><i>TYPE</i>[<b>/</b><i>ENCODING</i>]</dt>
<dd>Specify output type as <b>text</b>, <b>html</b>, <b>sql</b>, <b>csv</b>,
<b>gml</b>, <b>dot</b>, <b>xml</b>, <b>sitemap</b>, <b>none</b> or
<b>blacklist</b>. Default type is <b>text</b>. The various output types
are documented below.
<br/>
The <i>ENCODING</i> specifies the output encoding, the default is that of
your locale. Valid encodings are listed at
<a class="Lk" href="https://docs.python.org/library/codecs.html#standard-encodings">https://docs.python.org/library/codecs.html#standard-encodings</a>.</dd>
<dt><b>-q</b>, <b>--quiet</b></dt>
<dd>Quiet operation, an alias for <b>-o none</b>. This is only useful with
<b>-F</b>.</dd>
<dt><b>-v</b>, <b>--verbose</b></dt>
<dd>Log all checked URLs. Default is to log only errors and warnings.</dd>
<dt><b>-W</b><i>REGEX</i>, <b>--warning-regex=</b><i>REGEX</i><b></b></dt>
<dd>Define a regular expression which prints a warning if it matches any
content of the checked link. This applies only to valid pages, so we can
get their content.
<br/>
Use this to check for pages that contain some form of error, for example
&quot;This page has moved&quot; or &quot;Oracle Application error&quot;.
<br/>
Note that multiple values can be combined in the regular expression, for
example &quot;(This page has moved|Oracle Application error)&quot;.
<br/>
See section <b>REGULAR EXPRESSIONS</b> for more info.</dd>
</dl>
</section>
<section class="Ss">
<h2 class="Ss" id="Checking_options"><a class="permalink" href="#Checking_options">Checking
options</a></h2>
<dl class="Bl-tag">
<dt><b>--cookiefile=</b><i>FILENAME</i></dt>
<dd>Read a file with initial cookie data. The cookie data format is explained
below.</dd>
<dt><b>--check-extern</b></dt>
<dd>Check external URLs.</dd>
<dt><b>--ignore-url=</b><i>REGEX</i></dt>
<dd>URLs matching the given regular expression will be ignored and not
checked.
<br/>
This option can be given multiple times.
<br/>
See section <b>REGULAR EXPRESSIONS</b> for more info.</dd>
<dt><b>-N</b><i>STRING</i>, <b>--nntp-server=</b><i>STRING</i></dt>
<dd>Specify an NNTP server for <b>news:</b> links. Default is the environment
variable <b>NNTP_SERVER</b>. If no host is given, only the syntax of the
link is checked.</dd>
<dt><b>--no-follow-url=</b><i>REGEX</i></dt>
<dd>Check but do not recurse into URLs matching the given regular expression.
<br/>
This option can be given multiple times.
<br/>
See section <b>REGULAR EXPRESSIONS</b> for more info.</dd>
<dt><b>-p</b>, <b>--password</b></dt>
<dd>Read a password from console and use it for HTTP and FTP authorization.
For FTP the default password is <b>anonymous@</b>. For HTTP there is no
default password. See also <b>-u</b>.</dd>
<dt><b>-r</b><i>NUMBER</i>, <b>--recursion-level=</b><i>NUMBER</i></dt>
<dd>Check recursively all links up to given depth. A negative depth will
enable infinite recursion. Default depth is infinite.</dd>
<dt><b>--timeout=</b><i>NUMBER</i></dt>
<dd>Set the timeout for connection attempts in seconds. The default timeout is
60 seconds.</dd>
<dt><b>-u</b><i>STRING</i>, <b>--user=</b><i>STRING</i></dt>
<dd>Try the given username for HTTP and FTP authorization. For FTP the default
username is <b>anonymous</b>. For HTTP there is no default username. See
also <b>-p</b>.</dd>
<dt><b>--user-agent=</b><i>STRING</i></dt>
<dd>Specify the User-Agent string to send to the HTTP server, for example
&quot;Mozilla/4.0&quot;. The default is &quot;LinkChecker/X.Y&quot; where
X.Y is the current version of LinkChecker.
<p class="Pp"></p>
</dd>
</dl>
</section>
</section>
<section class="Sh">
<h1 class="Sh" id="CONFIGURATION_FILES"><a class="permalink" href="#CONFIGURATION_FILES">CONFIGURATION
FILES</a></h1>
Configuration files can specify all options above. They can also specify some
options that cannot be set on the command line. See <a href="../man5/linkcheckerrc.5.html" class="Xr">linkcheckerrc(5)</a>
for more info.
<p class="Pp"></p>
</section>
<section class="Sh">
<h1 class="Sh" id="OUTPUT_TYPES"><a class="permalink" href="#OUTPUT_TYPES">OUTPUT
TYPES</a></h1>
Note that by default only errors and warnings are logged. You should use the
<b>--verbose</b> option to get the complete URL list, especially when
outputting a sitemap graph format.
<p class="Pp"></p>
<dl class="Bl-tag">
<dt><b>text</b></dt>
<dd>Standard text logger, logging URLs in keyword: argument fashion.</dd>
<dt><b>html</b></dt>
<dd>Log URLs in keyword: argument fashion, formatted as HTML. Additionally has
links to the referenced pages. Invalid URLs have HTML and CSS syntax check
links appended.</dd>
<dt><b>csv</b></dt>
<dd>Log check result in CSV format with one URL per line.</dd>
<dt><b>gml</b></dt>
<dd>Log parent-child relations between linked URLs as a GML sitemap
graph.</dd>
<dt><b>dot</b></dt>
<dd>Log parent-child relations between linked URLs as a DOT sitemap
graph.</dd>
<dt><b>gxml</b></dt>
<dd>Log check result as a GraphXML sitemap graph.</dd>
<dt><b>xml</b></dt>
<dd>Log check result as machine-readable XML.</dd>
<dt><b>sitemap</b></dt>
<dd>Log check result as an XML sitemap whose protocol is documented at
<a class="Lk" href="https://www.sitemaps.org/protocol.html">https://www.sitemaps.org/protocol.html</a>.</dd>
<dt><b>sql</b></dt>
<dd>Log check result as SQL script with INSERT commands. An example script to
create the initial SQL table is included as create.sql.</dd>
<dt><b>blacklist</b></dt>
<dd>Suitable for cron jobs. Logs the check result into a file
<b>~/.linkchecker/blacklist</b> which only contains entries with invalid
URLs and the number of times they have failed.</dd>
<dt><b>none</b></dt>
<dd>Logs nothing. Suitable for debugging or checking the exit code.</dd>
</dl>
</section>
<section class="Sh">
<h1 class="Sh" id="REGULAR_EXPRESSIONS"><a class="permalink" href="#REGULAR_EXPRESSIONS">REGULAR
EXPRESSIONS</a></h1>
LinkChecker accepts Python regular expressions. See
<a class="Lk" href="https://docs.python.org/howto/regex.html">https://docs.python.org/howto/regex.html</a>
for an introduction.
<p class="Pp">An addition is that a leading exclamation mark negates the regular
expression.</p>
</section>
<section class="Sh">
<h1 class="Sh" id="COOKIE_FILES"><a class="permalink" href="#COOKIE_FILES">COOKIE
FILES</a></h1>
A cookie file contains standard HTTP header (RFC 2616) data with the following
possible names:
<dl class="Bl-tag">
<dt><b>Host</b> (required)</dt>
<dd>Sets the domain the cookies are valid for.</dd>
<dt><b>Path</b> (optional)</dt>
<dd>Gives the path the cookies are valid for; default path is <b>/</b>.</dd>
<dt><b>Set-cookie</b> (required)</dt>
<dd>Set cookie name/value. Can be given more than once.</dd>
</dl>
<p class="Pp">Multiple entries are separated by a blank line. The example below
will send two cookies to all URLs starting with
<b>http://example.com/hello/</b> and one to all URLs starting with
<b>https://example.org/</b>:</p>
<pre>
Host: example.com
Path: /hello
Set-cookie: ID=&quot;smee&quot;
Set-cookie: spam=&quot;egg&quot;
</pre>
<pre>
Host: example.org
Set-cookie: baggage=&quot;elitist&quot;; comment=&quot;hologram&quot;
</pre>
</section>
<section class="Sh">
<h1 class="Sh" id="PROXY_SUPPORT"><a class="permalink" href="#PROXY_SUPPORT">PROXY
SUPPORT</a></h1>
To use a proxy on Unix or Windows set the $http_proxy, $https_proxy or
$ftp_proxy environment variables to the proxy URL. The URL should be of the
form
<b>http://</b>[<i>user</i><b>:</b><i>pass</i><b>@</b>]<i>host</i>[<b>:</b><i>port</i>].
LinkChecker also detects manual proxy settings of Internet Explorer under
Windows systems, and GNOME or KDE on Linux systems. On a Mac use the Internet
Config to select a proxy.
<p class="Pp">You can also set a comma-separated domain list in the $no_proxy
environment variables to ignore any proxy settings for these domains.</p>
<dl class="Bl-tag">
<dt>Setting an HTTP proxy on Unix for example looks like this:</dt>
<dd><b>export http_proxy=&quot;http://proxy.example.com:8080&quot;</b></dd>
<dt>Proxy authentication is also supported:</dt>
<dd><b>export
http_proxy=&quot;http://user1:mypass@proxy.example.org:8081&quot;</b></dd>
<dt>Setting a proxy on the Windows command prompt:</dt>
<dd><b>set http_proxy=http://proxy.example.com:8080</b></dd>
</dl>
</section>
<section class="Sh">
<h1 class="Sh" id="PERFORMED_CHECKS"><a class="permalink" href="#PERFORMED_CHECKS">PERFORMED
CHECKS</a></h1>
All URLs have to pass a preliminary syntax test. Minor quoting mistakes will
issue a warning, all other invalid syntax issues are errors. After the syntax
check passes, the URL is queued for connection checking. All connection check
types are described below.
<dl class="Bl-tag">
<dt>HTTP links (<b>http:</b>, <b>https:</b>)</dt>
<dd>After connecting to the given HTTP server the given path or query is
requested. All redirections are followed, and if user/password is given it
will be used as authorization when necessary. All final HTTP status codes
other than 2xx are errors.</dd>
</dl>
<dl class="Bl-tag">
<dt></dt>
<dd>HTML page contents are checked for recursion.</dd>
</dl>
<dl class="Bl-tag">
<dt>Local files (<b>file:</b>)</dt>
<dd>A regular, readable file that can be opened is valid. A readable directory
is also valid. All other files, for example device files, unreadable or
non-existing files are errors.</dd>
</dl>
<dl class="Bl-tag">
<dt></dt>
<dd>HTML or other parseable file contents are checked for recursion.</dd>
</dl>
<dl class="Bl-tag">
<dt>Mail links (<b>mailto:</b>)</dt>
<dd>A mailto: link eventually resolves to a list of email addresses. If one
address fails, the whole list will fail. For each mail address we check
the following things:
<br/>
1) Check the address syntax, both of the parts before and after the @ sign.
<br/>
2) Look up the MX DNS records. If no MX record is found, print an error.
<br/>
3) Check if one of the mail hosts accepts an SMTP connection. Hosts with
higher priority are checked first. If no host accepts SMTP, print a warning.
<br/>
4) Try to verify the address with the VRFY command. If we get an answer,
print the verified address as info.
<p class="Pp"></p>
</dd>
<dt>FTP links (<b>ftp:</b>)</dt>
<dd>For FTP links we do:
<br/>
1) connect to the specified host
<br/>
2) try to login with the given user and password. The default user is
<b>anonymous</b>, the default password is <b>anonymous@</b>.
<br/>
3) try to change to the given directory
<br/>
4) list the file with the NLST command
<p class="Pp"></p>
</dd>
<dt>Telnet links (<b>telnet:</b>)</dt>
<dd>We try to connect and if user/password are given, login to the given
telnet server.
<p class="Pp"></p>
</dd>
<dt>NNTP links (<b>news:</b>, <b>snews:</b>, <b>nntp:</b>)</dt>
<dd>We try to connect to the given NNTP server. If a news group or article is
specified, try to request it from the server.
<p class="Pp"></p>
</dd>
<dt>Unsupported links (<b>javascript:</b>, etc.)</dt>
<dd>An unsupported link will only print a warning. No further checking will be
made.</dd>
</dl>
<dl class="Bl-tag">
<dt></dt>
<dd>The complete list of recognized, but unsupported links can be found in the
<a class="Lk" href="https://github.com/linkchecker/linkchecker/blob/master/linkcheck/checker/unknownurl.py">linkcheck/checker/unknownurl.py</a>
source file. The most prominent of them should be JavaScript links.</dd>
</dl>
</section>
<section class="Sh">
<h1 class="Sh" id="PLUGINS"><a class="permalink" href="#PLUGINS">PLUGINS</a></h1>
There are two plugin types: connection and content plugins. Connection plugins
are run after a successful connection to the URL host. Content plugins are run
if the URL type has content (mailto: URLs have no content for example) and if
the check is not forbidden (i.e. by HTTP robots.txt).
<p class="Pp">See <b>linkchecker --list-plugins</b> for a list of plugins and
their documentation. All plugins are enabled via the <a href="../man5/linkcheckerrc.5.html" class="Xr">linkcheckerrc(5)</a>
configuration file.</p>
<p class="Pp"></p>
</section>
<section class="Sh">
<h1 class="Sh" id="RECURSION"><a class="permalink" href="#RECURSION">RECURSION</a></h1>
Before descending recursively into a URL, it has to fulfill several conditions.
They are checked in this order:
<p class="Pp">1. A URL must be valid.</p>
<p class="Pp">2. A URL must be parseable. This currently includes HTML files,
Opera bookmarks files, and directories. If a file type cannot
be determined (for example it does not have a common HTML file
extension, and the content does not look like HTML), it is assumed
to be non-parseable.</p>
<p class="Pp">3. The URL content must be retrievable. This is usually the case
except for example mailto: or unknown URL types.</p>
<p class="Pp">4. The maximum recursion level must not be exceeded. It is
configured
with the <b>--recursion-level</b> option and is unlimited per default.</p>
<p class="Pp">5. It must not match the ignored URL list. This is controlled with
the <b>--ignore-url</b> option.</p>
<p class="Pp">6. The Robots Exclusion Protocol must allow links in the URL to be
followed recursively. This is checked by searching for a
&quot;nofollow&quot; directive in the HTML header data.</p>
<p class="Pp">Note that the directory recursion reads all files in that
directory, not just a subset like <b>index.htm*</b>.</p>
<p class="Pp"></p>
</section>
<section class="Sh">
<h1 class="Sh" id="NOTES"><a class="permalink" href="#NOTES">NOTES</a></h1>
URLs on the commandline starting with <b>ftp.</b> are treated like
<b>ftp://ftp.</b>, URLs starting with <b>www.</b> are treated like
<b>http://www.</b>. You can also give local files as arguments.
<p class="Pp">If you have your system configured to automatically establish a
connection to the internet (e.g. with diald), it will connect when checking
links not pointing to your local host. Use the <b>--ignore-url</b> option to
prevent this.</p>
<p class="Pp">Javascript links are not supported.</p>
<p class="Pp">If your platform does not support threading, LinkChecker disables
it automatically.</p>
<p class="Pp">You can supply multiple user/password pairs in a configuration
file.</p>
<p class="Pp">When checking <b>news:</b> links the given NNTP host doesn't need
to be the same as the host of the user browsing your pages.</p>
</section>
<section class="Sh">
<h1 class="Sh" id="ENVIRONMENT"><a class="permalink" href="#ENVIRONMENT">ENVIRONMENT</a></h1>
<b>NNTP_SERVER</b> - specifies default NNTP server
<br/>
<b>http_proxy</b> - specifies default HTTP proxy server
<br/>
<b>ftp_proxy</b> - specifies default FTP proxy server
<br/>
<b>no_proxy</b> - comma-separated list of domains to not contact over a proxy
server
<br/>
<b>LC_MESSAGES</b>, <b>LANG</b>, <b>LANGUAGE</b> - specify output language
</section>
<section class="Sh">
<h1 class="Sh" id="RETURN_VALUE"><a class="permalink" href="#RETURN_VALUE">RETURN
VALUE</a></h1>
The return value is 2 when
<dl class="Bl-tag">
<dt>&#x2022;</dt>
<dd>a program error occurred.</dd>
</dl>
<p class="Pp">The return value is 1 when</p>
<ul class="Bl-bullet">
<li>invalid links were found or</li>
<li>link warnings were found and warnings are enabled</li>
</ul>
<p class="Pp">Else the return value is zero.</p>
</section>
<section class="Sh">
<h1 class="Sh" id="LIMITATIONS"><a class="permalink" href="#LIMITATIONS">LIMITATIONS</a></h1>
LinkChecker consumes memory for each queued URL to check. With thousands of
queued URLs the amount of consumed memory can become quite large. This might
slow down the program or even the whole system.
</section>
<section class="Sh">
<h1 class="Sh" id="FILES"><a class="permalink" href="#FILES">FILES</a></h1>
<b>~/.linkchecker/linkcheckerrc</b> - default configuration file
<br/>
<b>~/.linkchecker/blacklist</b> - default blacklist logger output filename
<br/>
<b>linkchecker-out.</b><i>TYPE</i> - default logger file output name
<br/>
<a class="Lk" href="https://docs.python.org/library/codecs.html#standard-encodings">https://docs.python.org/library/codecs.html#standard-encodings</a>
- valid output encodings
<br/>
<a class="Lk" href="https://docs.python.org/howto/regex.html">https://docs.python.org/howto/regex.html</a>
- regular expression documentation
<p class="Pp"></p>
</section>
<section class="Sh">
<h1 class="Sh" id="SEE_ALSO"><a class="permalink" href="#SEE_ALSO">SEE
ALSO</a></h1>
<a href="../man5/linkcheckerrc.5.html" class="Xr">linkcheckerrc(5)</a>
</section>
<section class="Sh">
<h1 class="Sh" id="AUTHOR"><a class="permalink" href="#AUTHOR">AUTHOR</a></h1>
Bastian Kleineidam &lt;bastian.kleineidam@web.de&gt;
</section>
<section class="Sh">
<h1 class="Sh" id="COPYRIGHT"><a class="permalink" href="#COPYRIGHT">COPYRIGHT</a></h1>
Copyright &#x00A9; 2000-2014 Bastian Kleineidam
</section>
</div>
<table class="foot">
<tr>
<td class="foot-date">2020-06-05</td>
<td class="foot-os">LinkChecker</td>
</tr>
</table>
</body>
</html>


@ -1,655 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<style>
table.head, table.foot { width: 100%; }
td.head-rtitle, td.foot-os { text-align: right; }
td.head-vol { text-align: center; }
div.Pp { margin: 1ex 0ex; }
div.Nd, div.Bf, div.Op { display: inline; }
span.Pa, span.Ad { font-style: italic; }
span.Ms { font-weight: bold; }
dl.Bl-diag > dt { font-weight: bold; }
code.Nm, code.Fl, code.Cm, code.Ic, code.In, code.Fd, code.Fn,
code.Cd { font-weight: bold; font-family: inherit; }
</style>
<title>LINKCHECKERRC(5)</title>
</head>
<body>
<table class="head">
<tr>
<td class="head-ltitle">LINKCHECKERRC(5)</td>
<td class="head-vol">LinkChecker User Manual</td>
<td class="head-rtitle">LINKCHECKERRC(5)</td>
</tr>
</table>
<div class="manual-text">
<section class="Sh">
<h1 class="Sh" id="NAME"><a class="permalink" href="#NAME">NAME</a></h1>
linkcheckerrc - configuration file for LinkChecker
</section>
<section class="Sh">
<h1 class="Sh" id="DESCRIPTION"><a class="permalink" href="#DESCRIPTION">DESCRIPTION</a></h1>
<b>linkcheckerrc</b> is the configuration file for LinkChecker. The file is
written in an INI-style format.
<br/>
The default file location is <b>~/.linkchecker/linkcheckerrc</b> on Unix,
<b>%HOMEPATH%\.linkchecker\linkcheckerrc</b> on Windows systems.
</section>
<section class="Sh">
<h1 class="Sh" id="SETTINGS"><a class="permalink" href="#SETTINGS">SETTINGS</a></h1>
<section class="Ss">
<h2 class="Ss" id="_fB_checking__fP"><a class="permalink" href="#_fB_checking__fP"><b>[checking]</b></a></h2>
<dl class="Bl-tag">
<dt><b>cookiefile=</b><i>filename</i></dt>
<dd>Read a file with initial cookie data. The cookie data format is explained
in <a href="../man1/linkchecker.1.html" class="Xr">linkchecker(1)</a>.
<br/>
Command line option: <b>--cookiefile</b></dd>
<dt><b>localwebroot=</b><i>STRING</i></dt>
<dd>When checking absolute URLs inside local files, the given root directory
is used as base URL.
<br/>
Note that the given directory must be in URL syntax, i.e. it must use a
slash to join directories instead of a backslash, and it must end with a
slash.
<br/>
Command line option: none</dd>
<dt><b>nntpserver=</b><i>STRING</i></dt>
<dd>Specify an NNTP server for <b>news:</b> links. Default is the environment
variable <b>NNTP_SERVER</b>. If no host is given, only the syntax of the
link is checked.
<br/>
Command line option: <b>--nntp-server</b></dd>
<dt><b>recursionlevel=</b><i>NUMBER</i></dt>
<dd>Check recursively all links up to given depth. A negative depth will
enable infinite recursion. Default depth is infinite.
<br/>
Command line option: <b>--recursion-level</b></dd>
<dt><b>threads=</b><i>NUMBER</i></dt>
<dd>Generate no more than the given number of threads. Default number of
threads is 10. To disable threading specify a non-positive number.
<br/>
Command line option: <b>--threads</b></dd>
<dt><b>timeout=</b><i>NUMBER</i></dt>
<dd>Set the timeout for connection attempts in seconds. The default timeout is
60 seconds.
<br/>
Command line option: <b>--timeout</b></dd>
<dt><b>aborttimeout=</b><i>NUMBER</i></dt>
<dd>Time to wait for checks to finish after the user aborts the first time
(with Ctrl-C or the abort button). The default abort timeout is 300
seconds.
<br/>
Command line option: <b>--timeout</b></dd>
<dt><b>useragent=</b><i>STRING</i></dt>
<dd>Specify the User-Agent string to send to the HTTP server, for example
&quot;Mozilla/4.0&quot;. The default is &quot;LinkChecker/X.Y&quot; where
X.Y is the current version of LinkChecker.
<br/>
Command line option: <b>--user-agent</b></dd>
<dt><b>sslverify=</b>[<b>0</b>|<b>1</b>|<i>filename</i>]</dt>
<dd>If set to zero disables SSL certificate checking. If set to one (the
default) enables SSL certificate checking with the provided CA certificate
file. If a filename is specified, it will be used as the certificate file.
<br/>
Command line option: none</dd>
<dt><b>maxrunseconds=</b><i>NUMBER</i></dt>
<dd>Stop checking new URLs after the given number of seconds. Same as if the
user stops (by hitting Ctrl-C) after the given number of seconds.
<br/>
The default is not to stop until all URLs are checked.
<br/>
Command line option: none</dd>
<dt><b>maxnumurls=</b><i>NUMBER</i></dt>
<dd>Maximum number of URLs to check. New URLs will not be queued after the
given number of URLs is checked.
<br/>
The default is to queue and check all URLs.
<br/>
Command line option: none</dd>
<dt><b>maxrequestspersecond=</b><i>NUMBER</i></dt>
<dd>Limit the maximum number of requests per second to one host.</dd>
<dt><b>allowedschemes=</b><i>NAME</i>[<b>,</b><i>NAME</i>...]</dt>
<dd>Allowed URL schemes as comma-separated list.</dd>
</dl>
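<p class="Pp">A minimal <b>[checking]</b> section combining some of the options
above might look as follows (the values are illustrative, not
recommendations):</p>
<pre>
[checking]
threads=5
timeout=30
recursionlevel=2
maxrequestspersecond=4
</pre>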
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_filtering__fP"><a class="permalink" href="#_fB_filtering__fP"><b>[filtering]</b></a></h2>
<dl class="Bl-tag">
<dt><b>ignore=</b><i>REGEX</i> (MULTILINE)</dt>
<dd>URLs matching the given regular expressions are ignored; only their syntax is checked.
<br/>
Command line option: <b>--ignore-url</b></dd>
<dt><b>ignorewarnings=</b><i>NAME</i>[<b>,</b><i>NAME</i>...]</dt>
<dd>Ignore the comma-separated list of warnings. See <b>WARNINGS</b> for the
list of supported warnings.
<br/>
Command line option: none</dd>
<dt><b>internlinks=</b><i>REGEX</i></dt>
<dd>Regular expression to add more URLs recognized as internal links. Default
is that URLs given on the command line are internal.
<br/>
Command line option: none</dd>
<dt><b>nofollow=</b><i>REGEX</i> (MULTILINE)</dt>
<dd>Check but do not recurse into URLs matching the given regular expressions.
<br/>
Command line option: <b>--no-follow-url</b></dd>
<dt><b>checkextern=</b>[<b>0</b>|<b>1</b>]</dt>
<dd>Check external links. Default is to check internal links only.
<br/>
Command line option: <b>--checkextern</b></dd>
</dl>
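<p class="Pp">For example, a hypothetical <b>[filtering]</b> section that
ignores mailto: and javascript: links (note the indented MULTILINE values) and
enables checking of external links:</p>
<pre>
[filtering]
ignore=
  ^mailto:
  ^javascript:
checkextern=1
</pre>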
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_authentication__fP"><a class="permalink" href="#_fB_authentication__fP"><b>[authentication]</b></a></h2>
<dl class="Bl-tag">
<dt><b>entry=</b><i>REGEX</i> <i>USER</i> [<i>PASS</i>] (MULTILINE)</dt>
<dd>Provide individual username/password pairs for different links. In addition
to a single login page specified with <b>loginurl</b>, multiple FTP, HTTP
(Basic Authentication) and telnet links are supported. Entries are a
triple (URL regex, username, password) or a tuple (URL regex, username),
where the entries are separated by whitespace.
<br/>
The password is optional and if missing it has to be entered at the
commandline.
<br/>
If the regular expression matches the checked URL, the given
username/password pair is used for authentication. The command line
options <b>-u</b> and <b>-p</b> match every link and therefore override
the entries given here. The first match wins.
<br/>
Command line option: <b>-u</b>, <b>-p</b></dd>
<dt><b>loginurl=</b><i>URL</i></dt>
<dd>The URL of a login page to be visited before link checking. The page is
expected to contain an HTML form to collect credentials and submit them to
the address in its action attribute using an HTTP POST request. The name
attributes of the input elements of the form and the values to be
submitted need to be available (see <b>entry</b> for an explanation of
username and password values).</dd>
<dt><b>loginuserfield=</b><i>STRING</i></dt>
<dd>The name attribute of the username input element. Default:
<b>login</b>.</dd>
<dt><b>loginpasswordfield=</b><i>STRING</i></dt>
<dd>The name attribute of the password input element. Default:
<b>password</b>.</dd>
<dt><b>loginextrafields=</b><i>NAME</i><b>:</b><i>VALUE</i> (MULTILINE)</dt>
<dd>Optionally the name attributes of any additional input elements and the
values to populate them with. Note that these are submitted without
checking whether matching input elements exist in the HTML form.</dd>
</dl>
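<p class="Pp">A hypothetical <b>[authentication]</b> section; the host names
and credentials below are placeholders. The second <b>entry</b> omits the
password, so it would have to be entered at the command line:</p>
<pre>
[authentication]
entry=
  ^https://www\.example\.com/ admin secret
  ^ftp://ftp\.example\.com/ anonymous
loginurl=https://www.example.com/login
loginuserfield=login
loginpasswordfield=password
</pre>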
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_output__fP"><a class="permalink" href="#_fB_output__fP"><b>[output]</b></a></h2>
<dl class="Bl-tag">
<dt><b>debug=</b><i>STRING</i>[<b>,</b><i>STRING</i>...]</dt>
<dd>Print debugging output for the given modules. Available debug modules are
<b>cmdline</b>, <b>checking</b>, <b>cache</b>, <b>dns</b>, <b>thread</b>,
<b>plugins</b> and <b>all</b>. Specifying <b>all</b> is an alias for
specifying all available loggers.
<br/>
Command line option: <b>--debug</b></dd>
<dt><b>fileoutput=</b><i>TYPE</i>[<b>,</b><i>TYPE</i>...]</dt>
<dd>Output to files <b>linkchecker-out.</b><i>TYPE</i>, or
<b>$HOME/.linkchecker/blacklist</b> for <b>blacklist</b> output.
<br/>
Valid file output types are <b>text</b>, <b>html</b>, <b>sql</b>,
<b>csv</b>, <b>gml</b>, <b>dot</b>, <b>xml</b>, <b>none</b> or
<b>blacklist</b>. Default is no file output. The various output types are
documented below. Note that you can suppress all console output with
<b>output=none</b>.
<br/>
Command line option: <b>--file-output</b></dd>
<dt><b>log=</b><i>TYPE</i>[<b>/</b><i>ENCODING</i>]</dt>
<dd>Specify output type as <b>text</b>, <b>html</b>, <b>sql</b>, <b>csv</b>,
<b>gml</b>, <b>dot</b>, <b>xml</b>, <b>none</b> or <b>blacklist</b>.
Default type is <b>text</b>. The various output types are documented
below.
<br/>
The <i>ENCODING</i> specifies the output encoding, the default is that of
your locale. Valid encodings are listed at
<a class="Lk" href="https://docs.python.org/library/codecs.html#standard-encodings">https://docs.python.org/library/codecs.html#standard-encodings</a>.
<br/>
Command line option: <b>--output</b></dd>
<dt><b>quiet=</b>[<b>0</b>|<b>1</b>]</dt>
<dd>If set, operate quietly. This is an alias for <b>log=none</b> and is only
useful together with <b>fileoutput</b>.
<br/>
Command line option: <b>--verbose</b></dd>
<dt><b>status=</b>[<b>0</b>|<b>1</b>]</dt>
<dd>Control printing check status messages. Default is 1.
<br/>
Command line option: <b>--no-status</b></dd>
<dt><b>verbose=</b>[<b>0</b>|<b>1</b>]</dt>
<dd>If set log all checked URLs once. Default is to log only errors and
warnings.
<br/>
Command line option: <b>--verbose</b></dd>
<dt><b>warnings=</b>[<b>0</b>|<b>1</b>]</dt>
<dd>If set log warnings. Default is to log warnings.
<br/>
Command line option: <b>--no-warnings</b></dd>
</dl>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_text__fP"><a class="permalink" href="#_fB_text__fP"><b>[text]</b></a></h2>
<dl class="Bl-tag">
<dt><b>filename=</b><i>STRING</i></dt>
<dd>Specify output filename for text logging. Default filename is
<b>linkchecker-out.txt</b>.
<br/>
Command line option: <b>--file-output=</b></dd>
<dt><b>parts=</b><i>STRING</i></dt>
<dd>Comma-separated list of parts that have to be logged. See <b>LOGGER
PARTS</b> below.
<br/>
Command line option: none</dd>
<dt><b>encoding=</b><i>STRING</i></dt>
<dd>Valid encodings are listed in
<a class="Lk" href="https://docs.python.org/library/codecs.html#standard-encodings">https://docs.python.org/library/codecs.html#standard-encodings</a>.
<br/>
Default encoding is <b>iso-8859-15</b>.</dd>
<dt><i>color*</i></dt>
<dd>Color settings for the various log parts, syntax is <i>color</i> or
<i>type</i><b>;</b><i>color</i>. The <i>type</i> can be <b>bold</b>,
<b>light</b>, <b>blink</b>, <b>invert</b>. The <i>color</i> can be
<b>default</b>, <b>black</b>, <b>red</b>, <b>green</b>, <b>yellow</b>,
<b>blue</b>, <b>purple</b>, <b>cyan</b>, <b>white</b>, <b>Black</b>,
<b>Red</b>, <b>Green</b>, <b>Yellow</b>, <b>Blue</b>, <b>Purple</b>,
<b>Cyan</b> or <b>White</b>.
<br/>
Command line option: none</dd>
<dt><b>colorparent=</b><i>STRING</i></dt>
<dd>Set parent color. Default is <b>white</b>.</dd>
<dt><b>colorurl=</b><i>STRING</i></dt>
<dd>Set URL color. Default is <b>default</b>.</dd>
<dt><b>colorname=</b><i>STRING</i></dt>
<dd>Set name color. Default is <b>default</b>.</dd>
<dt><b>colorreal=</b><i>STRING</i></dt>
<dd>Set real URL color. Default is <b>cyan</b>.</dd>
<dt><b>colorbase=</b><i>STRING</i></dt>
<dd>Set base URL color. Default is <b>purple</b>.</dd>
<dt><b>colorvalid=</b><i>STRING</i></dt>
<dd>Set valid color. Default is <b>bold;green</b>.</dd>
<dt><b>colorinvalid=</b><i>STRING</i></dt>
<dd>Set invalid color. Default is <b>bold;red</b>.</dd>
<dt><b>colorinfo=</b><i>STRING</i></dt>
<dd>Set info color. Default is <b>default</b>.</dd>
<dt><b>colorwarning=</b><i>STRING</i></dt>
<dd>Set warning color. Default is <b>bold;yellow</b>.</dd>
<dt><b>colordltime=</b><i>STRING</i></dt>
<dd>Set download time color. Default is <b>default</b>.</dd>
<dt><b>colorreset=</b><i>STRING</i></dt>
<dd>Set reset color. Default is <b>default</b>.</dd>
</dl>
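<p class="Pp">For example, a hypothetical <b>[text]</b> section using the color
syntax described above:</p>
<pre>
[text]
filename=linkchecker-out.txt
parts=url,result,warning
colorvalid=bold;green
colorinvalid=blink;red
</pre>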
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_gml__fP"><a class="permalink" href="#_fB_gml__fP"><b>[gml]</b></a></h2>
<dl class="Bl-tag">
<dt><b>filename=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>parts=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>encoding=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
</dl>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_dot__fP"><a class="permalink" href="#_fB_dot__fP"><b>[dot]</b></a></h2>
<dl class="Bl-tag">
<dt><b>filename=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>parts=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>encoding=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
</dl>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_csv__fP"><a class="permalink" href="#_fB_csv__fP"><b>[csv]</b></a></h2>
<dl class="Bl-tag">
<dt><b>filename=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>parts=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>encoding=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>separator=</b><i>CHAR</i></dt>
<dd>Set CSV separator. Default is a comma (<b>,</b>).</dd>
<dt><b>quotechar=</b><i>CHAR</i></dt>
<dd>Set CSV quote character. Default is a double quote (<b>&quot;</b>).</dd>
</dl>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_sql__fP"><a class="permalink" href="#_fB_sql__fP"><b>[sql]</b></a></h2>
<dl class="Bl-tag">
<dt><b>filename=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>parts=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>encoding=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>dbname=</b><i>STRING</i></dt>
<dd>Set database name to store into. Default is <b>linksdb</b>.</dd>
<dt><b>separator=</b><i>CHAR</i></dt>
<dd>Set SQL command separator character. Default is a semicolon
(<b>;</b>).</dd>
</dl>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_html__fP"><a class="permalink" href="#_fB_html__fP"><b>[html]</b></a></h2>
<dl class="Bl-tag">
<dt><b>filename=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>parts=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>encoding=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>colorbackground=</b><i>COLOR</i></dt>
<dd>Set HTML background color. Default is <b>#fff7e5</b>.</dd>
<dt><b>colorurl=</b></dt>
<dd>Set HTML URL color. Default is <b>#dcd5cf</b>.</dd>
<dt><b>colorborder=</b></dt>
<dd>Set HTML border color. Default is <b>#000000</b>.</dd>
<dt><b>colorlink=</b></dt>
<dd>Set HTML link color. Default is <b>#191c83</b>.</dd>
<dt><b>colorwarning=</b></dt>
<dd>Set HTML warning color. Default is <b>#e0954e</b>.</dd>
<dt><b>colorerror=</b></dt>
<dd>Set HTML error color. Default is <b>#db4930</b>.</dd>
<dt><b>colorok=</b></dt>
<dd>Set HTML valid color. Default is <b>#3ba557</b>.</dd>
</dl>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_blacklist__fP"><a class="permalink" href="#_fB_blacklist__fP"><b>[blacklist]</b></a></h2>
<dl class="Bl-tag">
<dt><b>filename=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>encoding=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
</dl>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_xml__fP"><a class="permalink" href="#_fB_xml__fP"><b>[xml]</b></a></h2>
<dl class="Bl-tag">
<dt><b>filename=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>parts=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>encoding=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
</dl>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_gxml__fP"><a class="permalink" href="#_fB_gxml__fP"><b>[gxml]</b></a></h2>
<dl class="Bl-tag">
<dt><b>filename=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>parts=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>encoding=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
</dl>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_sitemap__fP"><a class="permalink" href="#_fB_sitemap__fP"><b>[sitemap]</b></a></h2>
<dl class="Bl-tag">
<dt><b>filename=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>parts=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>encoding=</b><i>STRING</i></dt>
<dd>See [text] section above.</dd>
<dt><b>priority=</b><i>FLOAT</i></dt>
<dd>A number between 0.0 and 1.0 determining the priority. The default
priority for the first URL is 1.0, for all child URLs 0.5.</dd>
<dt><b>frequency=</b>[<b>always</b>|<b>hourly</b>|<b>daily</b>|<b>weekly</b>|<b>monthly</b>|<b>yearly</b>|<b>never</b>]</dt>
<dd>How frequently pages are changing.</dd>
</dl>
</section>
</section>
<section class="Sh">
<h1 class="Sh" id="LOGGER_PARTS"><a class="permalink" href="#LOGGER_PARTS">LOGGER
PARTS</a></h1>
<table class="tbl">
<tr>
<td><b>all</b></td>
<td>(for all parts)</td>
</tr>
<tr>
<td><b>id</b></td>
<td>(a unique ID for each logentry)</td>
</tr>
<tr>
<td><b>realurl</b></td>
<td>(the full url link)</td>
</tr>
<tr>
<td><b>result</b></td>
<td>(valid or invalid, with messages)</td>
</tr>
<tr>
<td><b>extern</b></td>
<td>(1 or 0, reported only by some logger types)</td>
</tr>
<tr>
<td><b>base</b></td>
<td>(base href=...)</td>
</tr>
<tr>
<td><b>name</b></td>
<td>(&lt;a href=...&gt;name&lt;/a&gt; and &lt;img
alt=&quot;name&quot;&gt;)</td>
</tr>
<tr>
<td><b>parenturl</b></td>
<td>(if any)</td>
</tr>
<tr>
<td><b>info</b></td>
<td>(some additional info, e.g. FTP welcome messages)</td>
</tr>
<tr>
<td><b>warning</b></td>
<td>(warnings)</td>
</tr>
<tr>
<td><b>dltime</b></td>
<td>(download time)</td>
</tr>
<tr>
<td><b>checktime</b></td>
<td>(check time)</td>
</tr>
<tr>
<td><b>url</b></td>
<td>(the original url name, can be relative)</td>
</tr>
<tr>
<td><b>intro</b></td>
<td>(the blurb at the beginning, &quot;starting at ...&quot;)</td>
</tr>
<tr>
<td><b>outro</b></td>
<td>(the blurb at the end, &quot;found x errors ...&quot;)</td>
</tr>
</table>
</section>
<section class="Sh">
<h1 class="Sh" id="MULTILINE"><a class="permalink" href="#MULTILINE">MULTILINE</a></h1>
Some option values can span multiple lines. Each line has to be indented for
that to work. Lines starting with a hash (<b>#</b>) will be ignored, though
they must still be indented.
<pre>
ignore=
lconline
bookmark
# a comment
^mailto:
</pre>
</section>
<section class="Sh">
<h1 class="Sh" id="EXAMPLE"><a class="permalink" href="#EXAMPLE">EXAMPLE</a></h1>
<pre>
[output]
log=html
</pre>
<pre>
[checking]
threads=5
</pre>
<pre>
[filtering]
ignorewarnings=http-moved-permanent
</pre>
</section>
<section class="Sh">
<h1 class="Sh" id="PLUGINS"><a class="permalink" href="#PLUGINS">PLUGINS</a></h1>
All plugins have a separate section. If the section appears in the configuration
file the plugin is enabled. Some plugins read extra options in their section.
<p class="Pp"></p>
<section class="Ss">
<h2 class="Ss" id="_fB_AnchorCheck__fP"><a class="permalink" href="#_fB_AnchorCheck__fP"><b>[AnchorCheck]</b></a></h2>
Checks validity of HTML anchors.
<p class="Pp"></p>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_LocationInfo__fP"><a class="permalink" href="#_fB_LocationInfo__fP"><b>[LocationInfo]</b></a></h2>
Adds the country and, if possible, the city name of the URL host as info. Needs GeoIP
or pygeoip and a local country or city lookup DB installed.
<p class="Pp"></p>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_RegexCheck__fP"><a class="permalink" href="#_fB_RegexCheck__fP"><b>[RegexCheck]</b></a></h2>
Define a regular expression which prints a warning if it matches any content of
the checked link. This applies only to valid pages, whose content can be
retrieved.
<dl class="Bl-tag">
<dt><b>warningregex=</b><i>REGEX</i></dt>
<dd>Use this to check for pages that contain some form of error message, for
example &quot;This page has moved&quot; or &quot;Oracle Application
error&quot;. <i>REGEX</i> should be unquoted.
<p class="Pp">Note that multiple values can be combined in the regular
expression, for example &quot;(This page has moved|Oracle Application
error)&quot;.</p>
<p class="Pp"></p>
</dd>
</dl>
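<p class="Pp">For example, enabling the plugin with the combined expression
from the note above:</p>
<pre>
[RegexCheck]
warningregex=(This page has moved|Oracle Application error)
</pre>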
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_SslCertificateCheck__fP"><a class="permalink" href="#_fB_SslCertificateCheck__fP"><b>[SslCertificateCheck]</b></a></h2>
Check SSL certificate expiration date. Only internal https: links will be
checked. A domain will only be checked once to avoid duplicate warnings.
<dl class="Bl-tag">
<dt><b>sslcertwarndays=</b><i>NUMBER</i></dt>
<dd>Configures the expiration warning time in days.
<p class="Pp"></p>
</dd>
</dl>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_HtmlSyntaxCheck__fP"><a class="permalink" href="#_fB_HtmlSyntaxCheck__fP"><b>[HtmlSyntaxCheck]</b></a></h2>
Check the syntax of HTML pages with the online W3C HTML validator. See
<a class="Lk" href="https://validator.w3.org/docs/api.html">https://validator.w3.org/docs/api.html</a>.
<p class="Pp"></p>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_HttpHeaderInfo__fP"><a class="permalink" href="#_fB_HttpHeaderInfo__fP"><b>[HttpHeaderInfo]</b></a></h2>
Print HTTP headers in URL info.
<dl class="Bl-tag">
<dt><b>prefixes=</b><i>prefix1</i>[,<i>prefix2</i>]...</dt>
<dd>Comma-separated list of header prefixes, for example <b>X-</b> to display
all HTTP headers that start with &quot;X-&quot;.
<p class="Pp"></p>
</dd>
</dl>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_CssSyntaxCheck__fP"><a class="permalink" href="#_fB_CssSyntaxCheck__fP"><b>[CssSyntaxCheck]</b></a></h2>
Check the syntax of CSS stylesheets with the online W3C CSS validator. See
<a class="Lk" href="https://jigsaw.w3.org/css-validator/manual.html#expert">https://jigsaw.w3.org/css-validator/manual.html#expert</a>.
<p class="Pp"></p>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_VirusCheck__fP"><a class="permalink" href="#_fB_VirusCheck__fP"><b>[VirusCheck]</b></a></h2>
Checks the page content for virus infections with clamav. A local clamav daemon
must be installed.
<dl class="Bl-tag">
<dt><b>clamavconf=</b><i>filename</i></dt>
<dd>Filename of <b>clamd.conf</b> config file.</dd>
</dl>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_PdfParser__fP"><a class="permalink" href="#_fB_PdfParser__fP"><b>[PdfParser]</b></a></h2>
Parse PDF files for URLs to check. Needs the <b>pdfminer</b> Python package
installed.
<p class="Pp"></p>
</section>
<section class="Ss">
<h2 class="Ss" id="_fB_WordParser__fP"><a class="permalink" href="#_fB_WordParser__fP"><b>[WordParser]</b></a></h2>
Parse Word files for URLs to check. Needs the <b>pywin32</b> Python extension
installed.
<p class="Pp"></p>
</section>
</section>
<section class="Sh">
<h1 class="Sh" id="WARNINGS"><a class="permalink" href="#WARNINGS">WARNINGS</a></h1>
The following warnings are recognized in the 'ignorewarnings' config file entry:
<br/>
<dl class="Bl-tag">
<dt><b>file-missing-slash</b></dt>
<dd>The file: URL is missing a trailing slash.</dd>
<dt><b>file-system-path</b></dt>
<dd>The file: path is not the same as the system specific path.</dd>
<dt><b>ftp-missing-slash</b></dt>
<dd>The ftp: URL is missing a trailing slash.</dd>
<dt><b>http-cookie-store-error</b></dt>
<dd>An error occurred while storing a cookie.</dd>
<dt><b>http-empty-content</b></dt>
<dd>The URL had no content.</dd>
<dt><b>mail-no-mx-host</b></dt>
<dd>The mail MX host could not be found.</dd>
<dt><b>nntp-no-newsgroup</b></dt>
<dd>The NNTP newsgroup could not be found.</dd>
<dt><b>nntp-no-server</b></dt>
<dd>No NNTP server was found.</dd>
<dt><b>url-content-size-zero</b></dt>
<dd>The URL content size is zero.</dd>
<dt><b>url-content-too-large</b></dt>
<dd>The URL content size is too large.</dd>
<dt><b>url-effective-url</b></dt>
<dd>The effective URL is different from the original.</dd>
<dt><b>url-error-getting-content</b></dt>
<dd>Could not get the content of the URL.</dd>
<dt><b>url-obfuscated-ip</b></dt>
<dd>The IP is obfuscated.</dd>
<dt><b>url-whitespace</b></dt>
<dd>The URL contains leading or trailing whitespace.
<p class="Pp"></p>
</dd>
</dl>
</section>
<section class="Sh">
<h1 class="Sh" id="SEE_ALSO"><a class="permalink" href="#SEE_ALSO">SEE
ALSO</a></h1>
<a href="../man1/linkchecker.1.html" class="Xr">linkchecker(1)</a>
</section>
<section class="Sh">
<h1 class="Sh" id="AUTHOR"><a class="permalink" href="#AUTHOR">AUTHOR</a></h1>
Bastian Kleineidam &lt;bastian.kleineidam@web.de&gt;
</section>
<section class="Sh">
<h1 class="Sh" id="COPYRIGHT"><a class="permalink" href="#COPYRIGHT">COPYRIGHT</a></h1>
Copyright &#x00A9; 2000-2014 Bastian Kleineidam
</section>
</div>
<table class="foot">
<tr>
<td class="foot-date">2020-06-05</td>
<td class="foot-os">LinkChecker</td>
</tr>
</table>
</body>
</html>


@ -1,72 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="{{ page.encoding }}">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="description" content="{{ site.app.description }}">
<meta name="keywords" content="link,URL,validation,checking,crawling">
<meta name="author" content="{{ site.app.maintainer }}">
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">
<title>{{ site.app.name }} - {{ page.title }}</title>
<link href='https://fonts.googleapis.com/css?family=Architects+Daughter' rel='stylesheet' type='text/css'>
<link rel="shortcut icon" href="{{ page.rooturl }}/favicon.ico" type="image/x-icon" />
<link rel="stylesheet" type="text/css" href="{{ page.rooturl }}/css/stylesheet.css" media="screen" />
<link rel="stylesheet" type="text/css" href="{{ page.rooturl }}/css/pygment_trac.css" media="screen" />
<link rel="stylesheet" type="text/css" href="{{ page.rooturl }}/css/print.css" media="print" />
<!--[if lt IE 9]>
<script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
{% block head %}{% endblock %}
</head>
<body>
<header>
<div class="inner">
<h1><a href="{{ page.rooturl }}/index.html">{{ site.app.name }}</a></h1>
<h2>Check websites for broken links</h2>
<a href="https://github.com/linkchecker/linkchecker"
class="button"><small>View project on</small> GitHub</a>
</div>
</header>
<div id="content-wrapper">
<div class="inner clearfix">
<section id="main-content">
{{ page.content }}
</section>
<aside id="sidebar">
<h2>Downloads</h2>
<p>
<a href="https://linkchecker.github.io/{{site.app.lname}}/dist/{{site.app.name}}-{{site.app.version}}.exe"
title="Download Windows .exe installer"
class="button"><span>{{site.app.name}}-{{site.app.version}}.exe</span></a>
<a href="http://ftp.debian.org/debian/pool/main/l/{{site.app.lname}}/"
title="Download Debian installer"
class="button"><span>{{site.app.name}}-{{site.app.version}}.deb</span></a>
<a href="https://pypi.python.org/packages/source/L/{{site.app.name}}/{{site.app.name}}-{{site.app.version}}.tar.gz"
title="Download Source .tar.gz package"
class="button"><span>{{site.app.name}}-{{site.app.version}}.tar.gz</span></a>
</p>
<p><a href="https://github.com/linkchecker/{{site.app.lname}}/blob/master/doc/changelog.txt">Changelog</a></p>
<p class="repo-owner"><a href="https://github.com/linkchecker/{{site.app.lname}}">{{site.app.name}}</a> is
maintained
by <a href="https://github.com/linkchecker">{{site.app.maintainer}}</a>.</p>
<h2>Support</h2>
<p>
<a href="https://github.com/linkchecker/linkchecker/issues?state=open">Issue&nbsp;tracker</a><br/>
</p>
<h2>Documentation</h2>
<p>
<a href="faq.html">FAQ</a><br/>
<a href="man1/linkchecker.1.html">Manual page</a>
</p>
</aside>
</div>
</div>
</body>
</html>


@ -1,5 +0,0 @@
app: !include app.yaml
output_dir: ~/public_html/linkchecker-webpage.git
output_exclude: ["todo", "dist"]
keywords: "link,URL,validation,crawler"
extra_plugins_markdown: ["tables"]


@ -38,6 +38,7 @@ per-file-ignores =
# F401: module imported but unused
linkchecker: E402
setup.py: E402
doc/src/conf.py: E402,F821
linkcheck/__init__.py: E402,F401
linkcheck/checker/httpurl.py: E402
linkcheck/htmlutil/htmlsoup.py: E402


@ -326,10 +326,10 @@ for (src, dst) in list_message_files(AppName):
data_files.append((dst, [src]))
if os.name == "posix":
data_files.append(("share/man/man1", ["doc/en/linkchecker.1"]))
data_files.append(("share/man/man5", ["doc/en/linkcheckerrc.5"]))
data_files.append(("share/man/de/man1", ["doc/de/linkchecker.1"]))
data_files.append(("share/man/de/man5", ["doc/de/linkcheckerrc.5"]))
data_files.append(("share/man/man1", ["doc/man/en/linkchecker.1"]))
data_files.append(("share/man/man5", ["doc/man/en/linkcheckerrc.5"]))
data_files.append(("share/man/de/man1", ["doc/man/de/linkchecker.1"]))
data_files.append(("share/man/de/man5", ["doc/man/de/linkcheckerrc.5"]))
data_files.append(
(
"share/linkchecker/examples",