linkchecker/doc/documentation.txt
2004-08-27 20:46:29 +00:00


.. meta::
   :navigation.order: 2
   :navigation.name: Documentation

Documentation
=============

Basic usage
-----------
To check a URL like ``http://www.myhomepage.org/`` it is enough to
execute ``linkchecker http://www.myhomepage.org/``. This will check the
complete domain of www.myhomepage.org recursively. All links pointing
outside of the domain are also checked for validity.

For more options, read the man page ``linkchecker(1)`` or execute
``linkchecker -h``.

Performed checks
----------------

All URLs have to pass a preliminary syntax test. Minor quoting
mistakes produce a warning; all other syntax problems are errors.

After the syntax check passes, the URL is queued for connection
checking. The connection check for each link type is described below.

- HTTP links (``http:``, ``https:``)

- Local files (``file:``)

  A regular, readable file that can be opened is valid. A readable
  directory is also valid. Anything else, such as a device file or an
  unreadable or nonexistent file, is an error.

  File contents are checked for recursion.

- Mail links (``mailto:``)

  A mailto: link eventually resolves to a list of email addresses.
  If one address fails, the whole list fails.
  For each mail address the following checks are made:

  1) Look up the MX DNS records. If no MX record is found,
     print an error.
  2) Check whether one of the mail hosts accepts an SMTP connection.
     Hosts with higher priority are checked first.
     If no host accepts SMTP, print a warning.
  3) Try to verify the address with the VRFY command. If an answer
     is received, print the verified address as an info message.

- FTP links (``ftp:``)

  For FTP links the following steps are performed:

  1) Connect to the specified host.
  2) Try to log in with the given user and password. The default
     user is ``anonymous``, the default password is ``anonymous@``.
  3) Try to change to the given directory.
  4) List the file with the NLST command.

- Gopher links (``gopher:``)

  Try to send the given selector (or query) to the gopher server.

- Telnet links (``telnet:``)

  Try to connect and, if user and password are given, log in to the
  given telnet server.

- NNTP links (``news:``, ``snews:``, ``nntp:``)

- Ignored links (``javascript:``, etc.)

  An ignored link only produces a warning. No further checking
  is done.

  Here is the complete list of recognized but ignored link types. The
  most prominent of them are probably JavaScript links.

  - ``acap:`` (application configuration access protocol)
  - ``afs:`` (Andrew File System global file names)
  - ``chrome:`` (Mozilla specific)
  - ``cid:`` (content identifier)
  - ``clsid:`` (Microsoft specific)
  - ``data:`` (data)
  - ``dav:`` (dav)
  - ``fax:`` (fax)
  - ``find:`` (Mozilla specific)
  - ``imap:`` (internet message access protocol)
  - ``isbn:`` (International Standard Book Number)
  - ``javascript:`` (JavaScript)
  - ``ldap:`` (Lightweight Directory Access Protocol)
  - ``mailserver:`` (Access to data available from mail servers)
  - ``mid:`` (message identifier)
  - ``mms:`` (multimedia stream)
  - ``modem:`` (modem)
  - ``nfs:`` (network file system protocol)
  - ``opaquelocktoken:`` (opaquelocktoken)
  - ``pop:`` (Post Office Protocol v3)
  - ``prospero:`` (Prospero Directory Service)
  - ``rsync:`` (rsync protocol)
  - ``rtsp:`` (real time streaming protocol)
  - ``service:`` (service location)
  - ``shttp:`` (secure HTTP)
  - ``sip:`` (session initiation protocol)
  - ``tel:`` (telephone)
  - ``tip:`` (Transaction Internet Protocol)
  - ``tn3270:`` (Interactive 3270 emulation sessions)
  - ``vemmi:`` (versatile multimedia interface)
  - ``wais:`` (Wide Area Information Servers)
  - ``z39.50r:`` (Z39.50 Retrieval)
  - ``z39.50s:`` (Z39.50 Session)
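
As a rough illustration of how the link types listed above map to checks,
the following Python sketch classifies a URL by its scheme. The function
name ``classify`` and the category labels are assumptions made for this
example; they are not taken from the LinkChecker source.

```python
# Schemes that get a connection check, grouped by the check type
# described in this section. The labels on the right are illustrative.
CHECKED_SCHEMES = {
    "http": "http", "https": "http",
    "file": "file",
    "mailto": "mail",
    "ftp": "ftp",
    "gopher": "gopher",
    "telnet": "telnet",
    "news": "nntp", "snews": "nntp", "nntp": "nntp",
}

# Recognized but ignored schemes, copied from the list above.
IGNORED_SCHEMES = frozenset("""
    acap afs chrome cid clsid data dav fax find imap isbn javascript
    ldap mailserver mid mms modem nfs opaquelocktoken pop prospero
    rsync rtsp service shttp sip tel tip tn3270 vemmi wais
    z39.50r z39.50s
""".split())


def classify(url):
    """Return a check label, 'ignored', or 'unknown' for the given URL."""
    scheme, sep, _rest = url.partition(":")
    if not sep:
        # No colon at all, so there is no scheme to look at.
        return "unknown"
    scheme = scheme.lower()
    if scheme in IGNORED_SCHEMES:
        return "ignored"  # only a warning, no further checking
    return CHECKED_SCHEMES.get(scheme, "unknown")


if __name__ == "__main__":
    for url in ("http://www.myhomepage.org/", "javascript:void(0)"):
        print(url, "->", classify(url))
```

A URL classified as ``ignored`` here corresponds to the case where only a
warning is printed and no connection is attempted.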

Recursion
---------

Recursion occurs on HTML files, Opera bookmark files and directories.
Note that the directory recursion reads all files in that
directory, not just a subset like ``index.htm*``.
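
The difference between reading all files and reading only an
``index.htm*`` subset can be shown with a small Python sketch. The helper
names ``all_entries`` and ``index_only`` are hypothetical and exist only
for this example; the sketch shows the idea, not how LinkChecker itself
is implemented.

```python
import fnmatch
import os
import tempfile


def all_entries(directory):
    """Every entry in the directory, as directory recursion would read it."""
    return sorted(os.listdir(directory))


def index_only(directory):
    """The narrower subset that is NOT used: only index.htm* files."""
    return sorted(fnmatch.filter(os.listdir(directory), "index.htm*"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create three empty files to stand in for a small web directory.
        for name in ("index.html", "about.html", "notes.txt"):
            open(os.path.join(tmp, name), "w").close()
        print(all_entries(tmp))  # ['about.html', 'index.html', 'notes.txt']
        print(index_only(tmp))   # ['index.html']
```

In this example a page such as ``about.html`` is picked up by directory
recursion even if no other page links to it.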