moved to doc/en

git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@2057 e7d03fd6-7b0d-0410-9947-9c21f3af8025
calvin 2004-11-25 13:10:20 +00:00
parent 6087fb9127
commit ccfcf6b71f


@ -1,290 +0,0 @@
.TH LINKCHECKER 1 "10 March 2001"
.SH NAME
linkchecker \- check your HTML documents for broken links
.SH SYNOPSIS
.B linkchecker
[
.I options
]
[
.I file-or-url
]
.SH DESCRIPTION
.LP
LinkChecker features
recursive checking,
multithreading,
output in colored or normal text, HTML, SQL, CSV or a sitemap
graph in GML or XML,
support for HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:,
Gopher, Telnet and local file links,
restriction of link checking with regular expression filters for URLs,
proxy support,
username/password authorization for HTTP and FTP,
robots.txt exclusion protocol support,
i18n support,
a command line interface and
a (Fast)CGI web interface (requires an HTTP server).
.SH EXAMPLES
The most common use checks the given domain recursively, plus any
single URL pointing outside of the domain:
.br
\fBlinkchecker http://treasure.calvinsplayground.de/\fP
.br
Beware that this checks the whole site, which can have several hundred
thousand URLs. Use the \fB-r\fP option to restrict the recursion depth.
.LP
Don't connect to mailto: hosts, only check their URL syntax. All other
links are checked as usual:
.br
\fBlinkchecker --intern='!^mailto:' --extern-strict-all www.mysite.org\fP
.LP
Checking a local HTML file on Unix:
.br
\fBlinkchecker ../bla.html\fP
.LP
Checking a local HTML file on Windows:
.br
\fBlinkchecker c:\\temp\\test.html\fP
.LP
You can skip the \fBhttp://\fP URL part if the domain starts with \fBwww.\fP:
.br
\fBlinkchecker www.myhomepage.de\fP
.LP
You can skip the \fBftp://\fP URL part if the domain starts with \fBftp.\fP:
.br
\fBlinkchecker -r0 ftp.linux.org\fP
.SH OPTIONS
.SS General options
.TP
\fB-h\fP, \fB--help\fP
Help me! Print usage information for this program.
.TP
\fB-f\fP\fIconfigfile\fP, \fB--config=\fP\fIconfigfile\fP
Use \fIconfigfile\fP as configuration file. By default LinkChecker first
searches /etc/linkchecker/linkcheckerrc and then ~/.linkcheckerrc.
.TP
\fB-I\fP, \fB--interactive\fP
Ask for a URL if none is given on the command line.
.TP
\fB-V\fP, \fB--version\fP
Print version and exit.
.TP
\fB-t\fP\fInum\fP, \fB--threads=\fP\fInum\fP
Generate no more than \fInum\fP threads. The default number of threads
is 10. To disable threading, specify a non-positive number.
.SS Output options
.TP
\fB-v\fP, \fB--verbose\fP
Log all checked URLs (implies \fB-w\fP). Default is to log only invalid
URLs.
.TP
\fB-w\fP, \fB--warnings\fP
Log warnings.
.TP
\fB-W\fP\fIregex\fP, \fB--warning-regex=\fP\fIregex\fP
Define a regular expression which prints a warning if it matches any
content of the checked link.
This only applies to valid pages, since only their content can be
retrieved.
Use this to check for pages that contain some form of error, for
example 'This page has moved' or 'Oracle Application Server error'.
This option implies \fB-w\fP.
.TP
\fB--warning-size-bytes=\fP\fIbytes\fP
Print a warning if content size is available and exceeds the given
number of \fIbytes\fP.
This option implies \fB-w\fP.
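.IP
The two content checks above (\fB-W\fP and \fB--warning-size-bytes\fP)
can be pictured with the following minimal Python sketch. It is not
LinkChecker's own code: the function content_warnings, its parameters and
the warning texts are hypothetical, and the page content is assumed to be
already downloaded as a string.
.nf
```python
import re


def content_warnings(content, warning_regex=None, max_bytes=None, size=None):
    """Hypothetical helper: return warning strings for one valid page.

    warning_regex mirrors -W, max_bytes mirrors --warning-size-bytes, and
    size is the content size if it is available (None otherwise).
    """
    warnings = []
    # -W: warn if the regular expression matches anywhere in the content.
    if warning_regex is not None and re.search(warning_regex, content):
        warnings.append("found %r in link contents" % warning_regex)
    # --warning-size-bytes: only warn when the size is actually available.
    if max_bytes is not None and size is not None and size > max_bytes:
        warnings.append("content size %d exceeds %d bytes" % (size, max_bytes))
    return warnings
```
.fi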
.TP
\fB-q\fP, \fB--quiet\fP
Quiet operation, an alias for \fB-o none\fP.
This is only useful with \fB-F\fP.
.TP
\fB-o\fP\fItype\fP, \fB--output=\fP\fItype\fP[\fB/\fP\fIencoding\fP]
Specify output type as \fBtext\fP, \fBhtml\fP, \fBsql\fP,
\fBcsv\fP, \fBgml\fP, \fBxml\fP, \fBnone\fP or \fBblacklist\fP.
Default type is \fBtext\fP. The various output types are documented
below.
\fIencoding\fP specifies the output encoding; the default is
\fBiso-8859-15\fP.
Valid encodings are listed at
\fBhttp://docs.python.org/lib/node127.html\fP.
.TP
\fB-F\fP\fItype\fP[\fB/\fP\fIencoding\fP][\fB/\fP\fIfilename\fP], \fB--file-output=\fP\fItype\fP[\fB/\fP\fIencoding\fP][\fB/\fP\fIfilename\fP]
Output to a file \fBlinkchecker-out.\fP\fItype\fP,
\fB$HOME/.linkchecker_blacklist\fP for
\fBblacklist\fP output, or \fIfilename\fP if specified.
\fIencoding\fP specifies the output encoding; the default is
\fBiso-8859-15\fP.
Valid encodings are listed at
\fBhttp://docs.python.org/lib/node127.html\fP.
The \fIfilename\fP part is ignored for the \fBnone\fP output type;
for all other types, an existing file of that name is overwritten.
You can specify this option more than once. Valid file output types
are \fBtext\fP, \fBhtml\fP, \fBsql\fP,
\fBcsv\fP, \fBgml\fP, \fBxml\fP, \fBnone\fP or \fBblacklist\fP.
Default is no file output. The various output types are documented
below. Note that you can suppress all console output
with the option \fB-o none\fP.
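.IP
How a \fItype\fP[\fB/\fP\fIencoding\fP][\fB/\fP\fIfilename\fP] argument
breaks down can be sketched in Python as below. The defaults are taken
from this option's description; the function parse_file_output is
hypothetical, and the sketch assumes the second field is always the
encoding, so a filename can only follow an (optionally empty) encoding.
.nf
```python
import os

# Valid file output types and default encoding, as listed for -F.
OUTPUT_TYPES = {"text", "html", "sql", "csv", "gml", "xml", "none", "blacklist"}
DEFAULT_ENCODING = "iso-8859-15"


def parse_file_output(spec):
    """Hypothetical parser returning (type, encoding, filename) for -F.

    Assumption: the second field is always the encoding.
    """
    # maxsplit=2 keeps any slashes inside the filename intact.
    parts = spec.split("/", 2)
    ftype = parts[0]
    if ftype not in OUTPUT_TYPES:
        raise ValueError("invalid output type: %r" % ftype)
    encoding = parts[1] if len(parts) > 1 and parts[1] else DEFAULT_ENCODING
    filename = parts[2] if len(parts) > 2 and parts[2] else None
    if ftype == "none":
        filename = None  # the filename part is ignored for "none"
    elif filename is None:
        if ftype == "blacklist":
            home = os.path.expanduser("~")
            filename = os.path.join(home, ".linkchecker_blacklist")
        else:
            filename = "linkchecker-out." + ftype
    return ftype, encoding, filename
```
.fi
For example, \fB-Ftext/utf-8//tmp/out.txt\fP would yield the type
\fBtext\fP, the encoding \fButf-8\fP and the file \fB/tmp/out.txt\fP
under this sketch's rules.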
.TP
\fB--no-status\fP
Do not print check status every 5 seconds to stderr. Does not work with the
\fB--debug\fP option.
.TP
\fB-D\fP, \fB--debug\fP
Print debugging information. Provide this option multiple times
for even more debugging information. Enabling debug will also
disable threading.
.TP
\fB--profile\fP
Write profiling data into a file named \fBlinkchecker.prof\fP
in the current working directory. See also \fB--viewprof\fP.
.TP
\fB--viewprof\fP
Print out previously generated profiling data. See also
\fB--profile\fP.
.SS Checking options
.TP
\fB-r\fP\fIdepth\fP, \fB--recursion-level=\fP\fIdepth\fP
Recursively check all links up to the given \fIdepth\fP.
A negative depth enables infinite recursion.
The default depth is infinite.
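.IP
A depth limit like this is commonly implemented as a breadth-first walk
that records each URL's distance from the start URL. The following Python
sketch is illustrative only: the function urls_to_check and its in-memory
link map are hypothetical, and it assumes that depth 0 checks only the
start URL itself.
.nf
```python
from collections import deque


def urls_to_check(start, links, depth):
    """Return URLs reached from start, descending at most depth levels.

    links is a hypothetical in-memory map from a URL to the URLs it
    links to; a negative depth means unlimited recursion.
    """
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        url, level = queue.popleft()
        order.append(url)
        if 0 <= depth <= level:
            continue  # depth limit reached: check this URL, do not descend
        for child in links.get(url, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, level + 1))
    return order
```
.fi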
.TP
\fB-i\fP\fIregex\fP, \fB--intern=\fP\fIregex\fP
Treat URLs that match the given regular expression as internal.
LinkChecker descends recursively only into internal URLs, not into
external ones.
.TP
\fB-e\fP\fIregex\fP, \fB--extern=\fP\fIregex\fP
Treat URLs that match the given regular expression as external.
Only internal HTML links are checked recursively.
.TP
\fB--extern-strict=\fP\fIregex\fP
Treat URLs that match the given regular expression as strict external,
so that only their syntax is checked (compare \fB-s\fP).
Only internal HTML links are checked recursively.
.TP
\fB-s\fP, \fB--extern-strict-all\fP
Check only the syntax of external links; do not try to connect to them.
For local file URLs, only local files are internal. For
HTTP and FTP URLs, all URLs with the same domain name are internal.
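.IP
The internal/external rule described for \fB-s\fP can be sketched in
Python with the standard urllib.parse module. This is an illustration,
not LinkChecker's implementation: the function is_internal is
hypothetical, https is assumed to behave like http, and "same domain
name" is assumed to mean an exact host name match.
.nf
```python
from urllib.parse import urlsplit

# Assumption: https is treated like http for this rule.
WEB_SCHEMES = ("http", "https", "ftp")


def is_internal(url, start_url):
    """Hypothetical check: is url internal relative to start_url?"""
    target, start = urlsplit(url), urlsplit(start_url)
    if start.scheme == "file":
        # Checking a local file: only other local files are internal.
        return target.scheme == "file"
    if start.scheme in WEB_SCHEMES:
        # Checking a site: the same (lower-cased) host name means internal.
        return target.scheme in WEB_SCHEMES and target.hostname == start.hostname
    return False
```
.fi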
.TP
\fB-d\fP, \fB--denyallow\fP
Swap checking order to external/internal. Default checking order is
internal/external.
.TP
\fB-C\fP, \fB--cookies\fP
Accept and send HTTP cookies according to RFC 2109. Only cookies
which are sent back to the originating server are accepted.
Sent and accepted cookies are provided as additional logging
information.
.TP
\fB-a\fP, \fB--anchors\fP
Check HTTP anchor references. This option applies to both internal
and external URLs. By default, anchors are not checked.
This option implies \fB-w\fP, because anchor errors are always
reported as warnings.
.TP
\fB--no-anchor-caching\fP
Treat url#anchora and url#anchorb as equal when caching. This
is the default browser behaviour, but it is not specified in
the URI specification. Use with care.
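.IP
The effect of this option on caching can be pictured as a choice of cache
key. In the following Python sketch, the function cache_key is
hypothetical; it uses the standard urllib.parse.urldefrag function to drop
the fragment, so that url#anchora and url#anchorb share one cache entry
only when anchor caching is switched off.
.nf
```python
from urllib.parse import urldefrag


def cache_key(url, anchor_caching=True):
    """Hypothetical cache key for a checked URL.

    With anchor caching (the default), url#a and url#b are cached
    separately; without it, the fragment is dropped, as browsers do.
    """
    if anchor_caching:
        return url
    return urldefrag(url).url
```
.fi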
.TP
\fB-u\fP\fIname\fP, \fB--user=\fP\fIname\fP
Try username \fIname\fP for HTTP and FTP authorization.
For FTP the default username is \fBanonymous\fP. See also \fB-p\fP.
.TP
\fB-p\fP\fIpwd\fP, \fB--password=\fP\fIpwd\fP
Try the password \fIpwd\fP for HTTP and FTP authorization.
For FTP the default password is \fBanonymous@\fP. See also \fB-u\fP.
.TP
\fB--timeout=\fP\fIsecs\fP
Set the timeout for connection attempts in seconds. The default timeout
is 30 seconds.
.TP
\fB-P\fP\fIsecs\fP, \fB--pause=\fP\fIsecs\fP
Pause \fIsecs\fP seconds between each URL check. This option
implies \fB-t0\fP.
The default is no pause between requests.
.TP
\fB-N\fP\fIserver\fP, \fB--nntp-server=\fP\fIserver\fP
Specify an NNTP server for 'news:...' links. The default is the value of
the environment variable \fBNNTP_SERVER\fP. If no host is given,
only the syntax of the link is checked.
.SS Deprecated options
.TP
\fB--status\fP
Print check status every 5 seconds to stderr. This is the default now.
.SH OUTPUT TYPES
Note that by default only errors are logged.
.TP
\fBtext\fP
Standard text logger, logging URLs in keyword: argument fashion.
.TP
\fBhtml\fP
Log URLs in keyword: argument fashion, formatted as HTML.
Additionally has links to the referenced pages. Invalid URLs have
HTML and CSS syntax check links appended.
.TP
\fBcsv\fP
Log check result in CSV format with one URL per line.
.TP
\fBgml\fP
Log parent-child relations between linked URLs as a GML graph.
You should use the \fB--verbose\fP option to get a complete graph.
.TP
\fBxml\fP
Log check result as a machine-readable XML file.
.TP
\fBsql\fP
Log check result as SQL script with INSERT commands. An example
script to create the initial SQL table is included as create.sql.
.TP
\fBblacklist\fP
Suitable for cron jobs. Logs the check result into a file
\fB~/.blacklist\fP which only contains entries with invalid URLs and
the number of times they have failed.
.TP
\fBnone\fP
Logs nothing. Suitable for scripts.
.SH NOTES
A \fB!\fP before any regex negates it. So \fB'!^mailto:'\fP matches
everything but a mailto link.
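.LP
The negation rule can be sketched in a few lines of Python. The function
matches is hypothetical, and the use of re.search (match anywhere in the
URL) is an assumption of this sketch rather than a statement about
LinkChecker's internals.
.nf
```python
import re


def matches(pattern, url):
    """Match url against a filter regex; a leading ! negates it."""
    if pattern.startswith("!"):
        # Negated filter: true when the rest of the pattern does NOT match.
        return re.search(pattern[1:], url) is None
    return re.search(pattern, url) is not None
```
.fi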
.LP
LinkChecker's command line parser treats \fBftp.\fP links like
\fBftp://ftp.\fP and \fBwww.\fP links like \fBhttp://www.\fP.
You can also give local files as arguments.
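.LP
The argument handling just described can be sketched in Python as
follows. The function normalize_arg is hypothetical, and turning an
existing local path into an absolute file: URL with pathlib is an
assumption of this sketch, not a description of LinkChecker's parser.
.nf
```python
import os
from pathlib import Path


def normalize_arg(arg):
    """Hypothetical sketch: turn a command line argument into a URL."""
    if "://" in arg:
        return arg  # already a full URL
    if arg.startswith("www."):
        return "http://" + arg
    if arg.startswith("ftp."):
        return "ftp://" + arg
    if os.path.exists(arg):
        # Local files become absolute file: URLs.
        return Path(arg).resolve().as_uri()
    return arg
```
.fi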
.LP
If your system is configured to automatically establish a
connection to the internet (e.g. with diald), LinkChecker will connect
when checking links that do not point to your local host.
Use the \fB-s\fP and \fB-i\fP options to prevent this.
JavaScript links are currently ignored.
If your platform does not support threading, LinkChecker uses
\fB-t0\fP.
You can supply multiple user/password pairs in a configuration file.
To use proxies, set $http_proxy and $https_proxy on Unix or Windows.
On a Mac, use the Internet Config.
When checking 'news:' links the given NNTP host doesn't need to be the
same as the host of the user browsing your pages!
.SH FILES
\fB/etc/linkchecker/linkcheckerrc\fP, \fB~/.linkcheckerrc\fP - default
configuration files
\fB~/.blacklist\fP - default blacklist logger output filename
\fBlinkchecker-out.\fP\fItype\fP - default logger file output name
\fBhttp://docs.python.org/lib/node127.html\fP - valid output encodings
.SH AUTHOR
Bastian Kleineidam <calvin@users.sourceforge.net>