mirror of
https://github.com/Hopiu/linkchecker.git
synced 2026-03-24 09:50:23 +00:00
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3887 e7d03fd6-7b0d-0410-9947-9c21f3af8025
.TH LINKCHECKER 1 2001-03-10 "LinkChecker" "LinkChecker commandline usage"
.SH NAME
linkchecker - check HTML documents and websites for broken links
.
.SH SYNOPSIS
\fBlinkchecker\fP [\fIoptions\fP] [\fIfile-or-url\fP]...
.
.SH DESCRIPTION
.LP
LinkChecker features
recursive checking,
multithreading,
output in colored or normal text, HTML, SQL, CSV or a sitemap
graph in GML or XML,
support for HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet
and local file links,
restriction of link checking with regular expression filters for URLs,
proxy support,
username/password authorization for HTTP and FTP,
robots.txt exclusion protocol support,
i18n support,
a command line interface and
a (Fast)CGI web interface (requires an HTTP server).
.
.SH EXAMPLES
The most common use checks the given domain recursively, plus any
URL pointing outside of the domain:
\fBlinkchecker http://treasure.calvinsplayground.de/\fP
.br
Beware that this checks the whole site, which can have thousands of URLs.
Use the \fB\-r\fP option to restrict the recursion depth.
.br
Don't connect to \fBmailto:\fP hosts, only check their URL syntax. All other
links are checked as usual:
\fBlinkchecker \-\-ignore\-url=^mailto: www.mysite.org\fP
.br
Checking a local HTML file on Unix:
\fBlinkchecker ../bla.html\fP
.br
Checking from stdin:
\fBecho "bla.html" | linkchecker \-\-stdin\fP
.br
Checking a local HTML file on Windows:
\fBlinkchecker c:\\temp\\test.html\fP
.br
You can skip the \fBhttp://\fP URL part if the domain starts with \fBwww.\fP:
\fBlinkchecker www.myhomepage.de\fP
.br
You can skip the \fBftp://\fP URL part if the domain starts with \fBftp.\fP:
\fBlinkchecker \-r0 ftp.linux.org\fP
.br
Generate a sitemap graph and convert it with the Graphviz dot utility:
\fBlinkchecker \-odot \-v www.myhomepage.de | dot \-Tps > sitemap.ps\fP
.
.SH OPTIONS
.SS General options
.TP
\fB\-h\fP, \fB\-\-help\fP
Help me! Print usage information for this program.
.TP
\fB\-f\fP\fIFILENAME\fP, \fB\-\-config=\fP\fIFILENAME\fP
Use \fIFILENAME\fP as configuration file. By default, LinkChecker first
searches \fB/etc/linkchecker/linkcheckerrc\fP and then
\fB~/.linkchecker/linkcheckerrc\fP.
.TP
\fB\-I\fP, \fB\-\-interactive\fP
Ask for a URL if none is given on the command line.
.TP
\fB\-t\fP\fINUMBER\fP, \fB\-\-threads=\fP\fINUMBER\fP
Generate no more than the given number of threads. Default number
of threads is 10. To disable threading specify a non-positive number.
.TP
\fB\-\-priority\fP
Run with normal thread scheduling priority. By default, LinkChecker runs
with low thread priority to be suitable as a background job.
.TP
\fB\-V\fP, \fB\-\-version\fP
Print version and exit.
.TP
\fB\-\-allow\-root\fP
Do not drop privileges when running as root user on Unix systems.
.TP
\fB\-\-stdin\fP
Read a list of white-space separated URLs to check from stdin.
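.PP
The input format accepted by \fB\-\-stdin\fP can be illustrated with a minimal
Python sketch. The function name \fBread_urls\fP and the sample input are
hypothetical and not part of LinkChecker; the sketch only shows that any run of
spaces, tabs or newlines separates two URLs.
.nf
```python
import io


def read_urls(stream):
    """Return the white-space separated URLs found in a text stream."""
    # str.split() without arguments splits on any run of spaces, tabs or newlines
    return stream.read().split()


# hypothetical sample input, standing in for what would arrive on stdin
sample = """bla.html  www.example.org
ftp.example.org
"""

urls = read_urls(io.StringIO(sample))
print(urls)
```
.fi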
.
.SS Output options
.TP
\fB\-v\fP, \fB\-\-verbose\fP
Log all checked URLs once. Default is to log only errors and warnings.
.TP
\fB\-\-complete\fP
Log all URLs, including duplicates. Default is to log duplicate URLs only once.
.TP
\fB\-\-no\-warnings\fP
Don't log warnings. Default is to log warnings.
.TP
\fB\-W\fP\fIREGEX\fP, \fB\-\-warning\-regex=\fP\fIREGEX\fP
Define a regular expression which prints a warning if it matches any
content of the checked link.
This applies only to valid pages, since their content must be retrievable.
.br
Use this to check for pages that contain some form of error, for example
"This page has moved" or "Oracle Application Server error".
.TP
\fB\-\-warning\-size\-bytes=\fP\fINUMBER\fP
Print a warning if content size info is available and exceeds the given
number of \fIbytes\fP.
.TP
\fB\-\-check\-html\fP
Check syntax of HTML URLs with local library (HTML tidy).
.TP
\fB\-\-check\-html\-w3\fP
Check syntax of HTML URLs with W3C online validator.
.TP
\fB\-\-check\-css\fP
Check syntax of CSS URLs with local library (cssutils).
.TP
\fB\-\-check\-css\-w3\fP
Check syntax of CSS URLs with W3C online validator.
.TP
\fB\-\-scan\-virus\fP
Scan content of URLs for viruses with ClamAV.
.TP
\fB\-q\fP, \fB\-\-quiet\fP
Quiet operation, an alias for \fB\-o none\fP.
This is only useful with \fB\-F\fP.
.TP
\fB\-o\fP\fITYPE\fP[\fB/\fP\fIENCODING\fP], \fB\-\-output=\fP\fITYPE\fP[\fB/\fP\fIENCODING\fP]
Specify output type as \fBtext\fP, \fBhtml\fP, \fBsql\fP,
\fBcsv\fP, \fBgml\fP, \fBdot\fP, \fBxml\fP, \fBnone\fP or \fBblacklist\fP.
Default type is \fBtext\fP. The various output types are documented
below.
.br
The \fIENCODING\fP specifies the output encoding, the default is
that of your locale. Valid encodings are listed at
\fBhttp://docs.python.org/lib/standard\-encodings.html\fP.
.TP
\fB\-F\fP\fITYPE\fP[\fB/\fP\fIENCODING\fP][\fB/\fP\fIFILENAME\fP], \fB\-\-file\-output=\fP\fITYPE\fP[\fB/\fP\fIENCODING\fP][\fB/\fP\fIFILENAME\fP]
Output to a file \fBlinkchecker\-out.\fP\fITYPE\fP,
\fB$HOME/.linkchecker/blacklist\fP for
\fBblacklist\fP output, or \fIFILENAME\fP if specified.
The \fIENCODING\fP specifies the output encoding, the default is
that of your locale.
Valid encodings are listed at
\fBhttp://docs.python.org/lib/standard\-encodings.html\fP.
The \fIFILENAME\fP and \fIENCODING\fP parts are ignored for the \fBnone\fP
output type. For all other types, an existing file is overwritten.
You can specify this option more than once. Valid file output types
are \fBtext\fP, \fBhtml\fP, \fBsql\fP,
\fBcsv\fP, \fBgml\fP, \fBdot\fP, \fBxml\fP, \fBnone\fP or \fBblacklist\fP.
Default is no file output. The various output types are documented
below. Note that you can suppress all console output
with the option \fB\-o none\fP.
.TP
\fB\-\-no\-status\fP
Do not print check status messages.
.TP
\fB\-D\fP\fISTRING\fP, \fB\-\-debug=\fP\fISTRING\fP
Print debugging output for the given logger.
Available loggers are \fBcmdline\fP, \fBchecking\fP,
\fBcache\fP, \fBgui\fP, \fBdns\fP and \fBall\fP.
Specifying \fBall\fP is an alias for specifying all available loggers.
The option can be given multiple times to debug with more
than one logger.
.br
For accurate results, threading will be disabled during debug runs.
.TP
\fB\-\-trace\fP
Print tracing information.
.TP
\fB\-\-profile\fP
Write profiling data into a file named \fBlinkchecker.prof\fP
in the current working directory. See also \fB\-\-viewprof\fP.
.TP
\fB\-\-viewprof\fP
Print out previously generated profiling data. See also
\fB\-\-profile\fP.
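.PP
The \fITYPE\fP[\fB/\fP\fIENCODING\fP][\fB/\fP\fIFILENAME\fP] argument of
\fB\-F\fP can be read as up to three slash-separated parts. The following
Python sketch shows one way to split such a value. It is an illustration only,
not LinkChecker's own parser: the function name \fBsplit_file_output\fP is
hypothetical, and the sketch assumes that an encoding is always given whenever
a filename is given.
.nf
```python
def split_file_output(spec):
    """Split TYPE[/ENCODING][/FILENAME] into a (type, encoding, filename) tuple."""
    # maxsplit=2 keeps any further slashes inside the filename part
    parts = spec.split("/", 2)
    # pad with None so that missing optional parts are explicit
    parts += [None] * (3 - len(parts))
    output_type, encoding, filename = parts
    return output_type, encoding or None, filename or None


print(split_file_output("html"))
print(split_file_output("csv/utf-8"))
print(split_file_output("text/utf-8/logs/out.txt"))
```
.fi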
.
.SS Checking options
.TP
\fB\-r\fP\fINUMBER\fP, \fB\-\-recursion\-level=\fP\fINUMBER\fP
Recursively check all links up to the given depth.
A negative depth will enable infinite recursion.
Default depth is infinite.
.TP
\fB\-\-no\-follow\-url=\fP\fIREGEX\fP
Check but do not recurse into URLs matching the given regular
expression.
.br
This option can be given multiple times.
.TP
\fB\-\-ignore\-url=\fP\fIREGEX\fP
Only check syntax of URLs matching the given regular expression.
.br
This option can be given multiple times.
.TP
\fB\-C\fP, \fB\-\-cookies\fP
Accept and send HTTP cookies according to RFC 2109. Only cookies
which are sent back to the originating server are accepted.
Sent and accepted cookies are provided as additional logging
information.
.TP
\fB\-\-cookiefile=\fP\fIFILENAME\fP
Read a file with initial cookie data. The cookie data
format is explained below.
.TP
\fB\-a\fP, \fB\-\-anchors\fP
Check HTTP anchor references. Default is not to check anchors.
This option enables logging of the warning \fBurl\-anchor\-not\-found\fP.
.TP
\fB\-\-no\-anchor\-caching\fP
Treat url#anchora and url#anchorb as equal on caching. This
is the default browser behaviour, but it's not specified in
the URI specification. Use with care since broken anchors are not
guaranteed to be detected in this mode.
.TP
\fB\-u\fP\fISTRING\fP, \fB\-\-user=\fP\fISTRING\fP
Try the given username for HTTP and FTP authorization.
For FTP the default username is \fBanonymous\fP. For HTTP there is
no default username. See also \fB\-p\fP.
.TP
\fB\-p\fP\fISTRING\fP, \fB\-\-password=\fP\fISTRING\fP
Try the given password for HTTP and FTP authorization.
For FTP the default password is \fBanonymous@\fP. For HTTP there is
no default password. See also \fB\-u\fP.
.TP
\fB\-\-timeout=\fP\fINUMBER\fP
Set the timeout for connection attempts in seconds. The default timeout
is 60 seconds.
.TP
\fB\-P\fP\fINUMBER\fP, \fB\-\-pause=\fP\fINUMBER\fP
Pause the given number of seconds between two subsequent connection
requests to the same host. Default is no pause between requests.
.TP
\fB\-N\fP\fISTRING\fP, \fB\-\-nntp\-server=\fP\fISTRING\fP
Specify an NNTP server for \fBnews:\fP links. Default is the
environment variable \fBNNTP_SERVER\fP. If no host is given,
only the syntax of the link is checked.
.TP
\fB\-\-no\-proxy\-for=\fP\fIREGEX\fP
Contact hosts that match the given regular expression directly instead of
going through a proxy.
.br
This option can be given multiple times.
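.PP
The effect of the recursion depth given with \fB\-r\fP can be pictured as a
depth-limited breadth-first traversal. The Python sketch below works on an
in-memory link graph instead of real pages. The function name \fBcrawl\fP, the
\fBlinks\fP mapping and the reading that depth 0 covers only the given URL are
assumptions made for illustration, not LinkChecker's implementation.
.nf
```python
from collections import deque


def crawl(start, links, max_depth=-1):
    """Return URLs reachable from start, following links up to max_depth levels."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        # a negative max_depth means unlimited recursion
        if 0 <= max_depth <= depth:
            continue
        for child in links.get(url, ()):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append((child, depth + 1))
    return order


# hypothetical link graph: page -> pages it links to
graph = {"a": ["b", "c"], "b": ["d"], "d": ["e"]}
print(crawl("a", graph, 1))
```
.fi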
.
.SH "CONFIGURATION FILES"
Configuration files can specify all options above. They can also
specify some options that cannot be set on the command line.
See \fBlinkcheckerrc\fP(5) for more info.

.SH OUTPUT TYPES
Note that by default only errors and warnings are logged.
You should use the \fB\-\-verbose\fP option to get the complete URL list,
especially when outputting a sitemap graph format.

.TP
\fBtext\fP
Standard text logger, logging URLs in keyword: argument fashion.
.TP
\fBhtml\fP
Log URLs in keyword: argument fashion, formatted as HTML.
Additionally has links to the referenced pages. Invalid URLs have
HTML and CSS syntax check links appended.
.TP
\fBcsv\fP
Log check result in CSV format with one URL per line.
.TP
\fBgml\fP
Log parent-child relations between linked URLs as a GML sitemap graph.
.TP
\fBdot\fP
Log parent-child relations between linked URLs as a DOT sitemap graph.
.TP
\fBgxml\fP
Log check result as a GraphXML sitemap graph.
.TP
\fBxml\fP
Log check result as machine-readable XML.
.TP
\fBsql\fP
Log check result as SQL script with INSERT commands. An example
script to create the initial SQL table is included as create.sql.
.TP
\fBblacklist\fP
Suitable for cron jobs. Logs the check result into a file
\fB~/.linkchecker/blacklist\fP which only contains entries with invalid
URLs and the number of times they have failed.
.TP
\fBnone\fP
Logs nothing. Suitable for debugging or checking the exit code.
.
.SH REGULAR EXPRESSIONS
Only Python regular expressions are accepted by LinkChecker.
See \fBhttp://www.amk.ca/python/howto/regex/\fP for an introduction to
regular expressions.

The only addition is that a leading exclamation mark negates the regular
expression.
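.PP
The negation rule can be sketched in Python as follows. The function name
\fBcompile_filter\fP is hypothetical, and the use of \fBre.search\fP (a match
anywhere in the URL) is an assumption; this page does not say whether
LinkChecker searches or anchors its patterns.
.nf
```python
import re


def compile_filter(pattern):
    """Return a predicate for pattern, honoring a leading exclamation mark."""
    negate = pattern.startswith("!")
    # strip the exclamation mark before handing the rest to the re module
    regex = re.compile(pattern[1:] if negate else pattern)

    def matches(url):
        found = regex.search(url) is not None
        # a negated pattern matches exactly when the regex does not
        return found != negate

    return matches


is_mail = compile_filter("^mailto:")
not_mail = compile_filter("!^mailto:")
print(is_mail("mailto:calvin@example.org"), not_mail("mailto:calvin@example.org"))
```
.fi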
.
.SH COOKIE FILES
A cookie file contains standard RFC 822 header data with the following
possible names:
.
.TP
\fBScheme\fP (optional)
Sets the scheme the cookies are valid for; default scheme is \fBhttp\fP.
.TP
\fBHost\fP (required)
Sets the domain the cookies are valid for.
.TP
\fBPath\fP (optional)
Gives the path the cookies are valid for; default path is \fB/\fP.
.TP
\fBSet-cookie\fP (optional)
Set cookie name/value. Can be given more than once.
.PP
Multiple entries are separated by a blank line.
.
The example below will send two cookies to all URLs starting with
\fBhttp://example.com/hello/\fP and one to all URLs starting
with \fBhttps://example.org/\fP:

.nf
Host: example.com
Path: /hello
Set-cookie: ID="smee"
Set-cookie: spam="egg"

Scheme: https
Host: example.org
Set-cookie: baggage="elitist"; comment="hologram"
.fi
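.PP
To make the file format concrete, the following Python sketch reads the example
above into one dictionary per entry and fills in the documented defaults. The
function name \fBparse_cookie_file\fP and the dictionary layout are
hypothetical, and the sketch ignores header continuation lines; it is not the
parser LinkChecker uses.
.nf
```python
COOKIE_FILE = """Host: example.com
Path: /hello
Set-cookie: ID="smee"
Set-cookie: spam="egg"

Scheme: https
Host: example.org
Set-cookie: baggage="elitist"; comment="hologram"
"""


def parse_cookie_file(text):
    """Return one dict per entry; entries are separated by blank lines."""
    entries = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current entry
            continue
        if current is None:
            # documented defaults: scheme http, path /
            current = {"scheme": "http", "host": None, "path": "/", "cookies": []}
            entries.append(current)
        name, _, value = line.partition(":")
        name, value = name.strip().lower(), value.strip()
        if name == "set-cookie":
            current["cookies"].append(value)  # may occur more than once
        elif name in ("scheme", "host", "path"):
            current[name] = value
    return entries


entries = parse_cookie_file(COOKIE_FILE)
for entry in entries:
    print(entry)
```
.fi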
.
.SH PROXY SUPPORT
To use a proxy on Unix or Windows, set the $http_proxy, $https_proxy or
$ftp_proxy environment variable to the proxy URL. The URL should be of the form
\fBhttp://\fP[\fIuser\fP\fB:\fP\fIpass\fP\fB@\fP]\fIhost\fP[\fB:\fP\fIport\fP].
LinkChecker also detects manual proxy settings of Internet Explorer under
Windows systems. On a Mac use the Internet Config to select a proxy.
.
For example, setting an HTTP proxy on Unix looks like this:

export http_proxy="http://proxy.example.com:8080"

Proxy authentication is also supported:

export http_proxy="http://user1:mypass@proxy.example.org:8081"

Setting a proxy on the Windows command prompt:

set http_proxy=http://proxy.example.com:8080
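.PP
The parts of such a proxy URL can be inspected with the Python standard
library, which is a quick way to verify a value before exporting it. The
function name \fBparse_proxy\fP is hypothetical; \fBurlsplit\fP is the standard
\fBurllib.parse\fP function.
.nf
```python
from urllib.parse import urlsplit


def parse_proxy(url):
    """Split a proxy URL of the form http://[user:pass@]host[:port]."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,      # None when no credentials are given
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,          # None when no port is given
    }


plain = parse_proxy("http://proxy.example.com:8080")
with_auth = parse_proxy("http://user1:mypass@proxy.example.org:8081")
print(plain)
print(with_auth)
```
.fi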
.
.SH NOTES
URLs on the command line starting with \fBftp.\fP are treated like
\fBftp://ftp.\fP, URLs starting with \fBwww.\fP are treated like
\fBhttp://www.\fP.
You can also give local files as arguments.

If you have your system configured to automatically establish a
connection to the internet (e.g. with diald), it will connect when
checking links not pointing to your local host.
Use the \fB\-\-ignore\-url\fP option to prevent this.

JavaScript links are currently ignored.

If your platform does not support threading, LinkChecker disables it
automatically.

You can supply multiple user/password pairs in a configuration file.

When checking \fBnews:\fP links the given NNTP host doesn't need to be the
same as the host of the user browsing your pages.
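.PP
The prefix rule for command line URLs described at the start of this section
amounts to the following Python sketch. The function name \fBadd_scheme\fP is
hypothetical, and arguments that match neither prefix are returned unchanged
here, which is a simplification.
.nf
```python
def add_scheme(arg):
    """Prepend the scheme implied by an ftp. or www. prefix."""
    if arg.startswith("ftp."):
        return "ftp://" + arg
    if arg.startswith("www."):
        return "http://" + arg
    # anything else (full URLs, local files) is left as given
    return arg


print(add_scheme("ftp.linux.org"))
print(add_scheme("www.myhomepage.de"))
print(add_scheme("../bla.html"))
```
.fi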
.
.SH ENVIRONMENT
\fBNNTP_SERVER\fP - specifies default NNTP server
.br
\fBhttp_proxy\fP - specifies default HTTP proxy server
.br
\fBftp_proxy\fP - specifies default FTP proxy server
.br
\fBLC_MESSAGES\fP, \fBLANG\fP, \fBLANGUAGE\fP - specify output language
.
.SH RETURN VALUE
The return value is non-zero when
.IP \(bu
invalid links were found or
.IP \(bu
link warnings were found and warnings are enabled or
.IP \(bu
a program error occurred.
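.PP
A script can act on this return value. The Python sketch below runs a command
and reports whether it exited with status zero. The function name
\fBcheck_passed\fP is hypothetical, and the commands used here are stand-ins
so that the sketch runs without LinkChecker installed.
.nf
```python
import subprocess
import sys


def check_passed(argv):
    """Run a command and report whether it exited with status zero."""
    result = subprocess.run(argv)
    return result.returncode == 0


# stand-in commands that exit with status 0 and 1 respectively
ok = check_passed([sys.executable, "-c", "raise SystemExit(0)"])
failed = check_passed([sys.executable, "-c", "raise SystemExit(1)"])
print(ok, failed)
```
.fi
With LinkChecker installed, the same helper could be called with an argument
list such as \fBlinkchecker \-r1 www.example.org\fP split into its words.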
.
.SH LIMITATIONS
LinkChecker consumes memory for each queued URL to check. With thousands
of queued URLs the amount of consumed memory can become quite large. This
might slow down the program or even the whole system.
.
.SH FILES
\fB/etc/linkchecker/linkcheckerrc\fP, \fB~/.linkchecker/linkcheckerrc\fP - default
configuration files
.br
\fB~/.linkchecker/blacklist\fP - default blacklist logger output filename
.br
\fBlinkchecker\-out.\fP\fITYPE\fP - default logger file output name
.br
\fBhttp://docs.python.org/lib/standard\-encodings.html\fP - valid output encodings
.br
\fBhttp://www.amk.ca/python/howto/regex/\fP - regular expression documentation

.SH "SEE ALSO"
\fBlinkcheckerrc\fP(5)
.
.SH AUTHOR
Bastian Kleineidam <calvin@users.sourceforge.net>
.
.SH COPYRIGHT
Copyright \(co 2000-2009 Bastian Kleineidam