To check a URL like http://www.myhomepage.org/, it is enough to execute linkchecker http://www.myhomepage.org/. This checks the complete domain www.myhomepage.org recursively. All links pointing outside the domain are also checked for validity.
For more options, read the man page linkchecker(1) or execute linkchecker -h.
All URLs have to pass a preliminary syntax test. Minor quoting mistakes issue a warning; all other syntax problems are errors. After the syntax check passes, the URL is queued for connection checking. All connection check types are described below.
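The two-stage flow (syntax test first, then queueing) can be sketched roughly as follows. This is only an illustration, not LinkChecker's actual implementation: the function names check_syntax and enqueue_if_valid are hypothetical, and the concrete rules used here (a missing scheme is an error, unquoted characters are a minor quoting mistake, a warning still counts as passing) are assumptions made for the example.

```python
from collections import deque
from urllib.parse import quote, urlsplit

# Characters left untouched when re-quoting (assumption: RFC 3986 reserved set plus "%").
SAFE_CHARS = ":/?#[]@!$&'()*+,;=%~"

def check_syntax(url):
    """Return "ok", "warning" or "error" for a URL string (hypothetical rules)."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return "error"  # e.g. a malformed bracketed host
    if not parts.scheme:
        return "error"  # assumption: a URL without a scheme is invalid
    if quote(url, safe=SAFE_CHARS) != url:
        return "warning"  # assumption: unquoted characters are a minor mistake
    return "ok"

def enqueue_if_valid(url, queue):
    """Queue the URL for connection checking unless the syntax test failed."""
    status = check_syntax(url)
    if status != "error":
        queue.append(url)
    return status

queue = deque()
enqueue_if_valid("http://www.myhomepage.org/", queue)
enqueue_if_valid("http://www.myhomepage.org/a b", queue)  # warning, still queued
enqueue_if_valid("www.myhomepage.org", queue)             # error, not queued
```

Whether a URL that only triggers a warning is still queued is a design choice in this sketch; the text above does not say either way.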
HTTP links (http:, https:)
After connecting to the given HTTP server the given path or query is requested. All redirections are followed, and if user/password is given it will be used as authorization when necessary. Permanently moved pages issue a warning. All final HTTP status codes other than 2xx are errors.
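As a rough illustration of these rules, the sketch below classifies the sequence of status codes seen while following redirects. The function name classify_http_chain and its return shape are hypothetical, and treating only status 301 as "permanently moved" is an assumption of this example.

```python
def classify_http_chain(status_codes):
    """Classify the HTTP status codes seen while following redirects.

    status_codes holds one integer per response; the last entry is the
    final response. Returns a (result, warnings) tuple.
    """
    if not status_codes:
        raise ValueError("at least one status code is required")
    warnings = []
    # A permanent redirect on the way only produces a warning.
    for code in status_codes[:-1]:
        if code == 301:  # assumption: 301 Moved Permanently
            warnings.append(f"permanently moved ({code})")
    final = status_codes[-1]
    # Any final status outside the 2xx range is an error.
    result = "valid" if 200 <= final <= 299 else "error"
    return result, warnings
```

For example, classify_http_chain([301, 200]) yields a valid result with one warning, while classify_http_chain([302, 404]) yields an error.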
Local files (file:)
A regular, readable file that can be opened is valid. A readable directory is also valid. All other files, for example device files and unreadable or non-existing files, are errors.
File contents are checked for recursion.
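The validity rules for local files can be approximated with the standard library as shown below. This is a minimal sketch under the assumption that "readable" means the current user has read permission; check_local_file is a hypothetical name, not a LinkChecker function.

```python
import os
import stat

def check_local_file(path):
    """Return "valid" or "error" for a local path, following the rules above."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return "error"  # non-existing or otherwise inaccessible path
    if stat.S_ISREG(mode) or stat.S_ISDIR(mode):
        # Regular files and directories are valid only when readable.
        return "valid" if os.access(path, os.R_OK) else "error"
    return "error"  # device files, sockets, FIFOs and similar
```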
Mail links (mailto:)
A mailto: link eventually resolves to a list of email addresses. If one address fails, the whole list will fail. For each mail address we check the following things:
FTP links (ftp:)
For FTP links we do:
Gopher links (gopher:)
We try to send the given selector (or query) to the gopher server.
Telnet links (telnet:)
We try to connect and, if user/password are given, log in to the given telnet server.
NNTP links (news:, snews:, nntp:)
We try to connect to the given NNTP server. If a news group or article is specified, we try to request it from the server.
Ignored links (javascript:, etc.)
An ignored link will only print a warning. No further checking will be made.
Here is a complete list of recognized, but ignored links. The most prominent of them should be JavaScript links.
Recursion occurs on HTML files, Opera bookmark files and directories. Note that the directory recursion reads all files in that directory, not just a subset like index.htm*.
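To make the difference concrete, the sketch below contrasts reading every entry of a directory with picking only index.htm* files. Both function names are hypothetical and the code only illustrates the distinction; it is not how LinkChecker itself is written.

```python
import glob
import os

def all_directory_entries(path):
    """Every entry in the directory, which is what directory recursion reads."""
    return sorted(os.path.join(path, name) for name in os.listdir(path))

def index_only(path):
    """Only index.htm* files, the narrower subset that is NOT what happens."""
    pattern = os.path.join(glob.escape(path), "index.htm*")
    return sorted(glob.glob(pattern))
```

For a directory holding index.html, about.html and notes.txt, the first function returns all three paths while the second returns only index.html.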