mirror of
https://github.com/Hopiu/linkchecker.git
synced 2026-03-20 07:50:24 +00:00
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@384 e7d03fd6-7b0d-0410-9947-9c21f3af8025
63 lines
2.8 KiB
Text
Q: LinkChecker produced an error, but my web page is ok with
   Netscape/IE/Opera/...
   Is this a bug in LinkChecker?
A: Please check your web pages first. Are they really ok? Use
   a syntax highlighting editor! Use HTML Tidy from www.w3c.org!
   Also check that your web server accepts HEAD requests, not just GET.

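One way to test the last point is to send a HEAD request yourself. The sketch below uses Python's standard http.client module. The helper names head_status and accepts_head are made up for this illustration and are not part of LinkChecker, and treating only 405 and 501 as a refusal is a simplifying assumption.

```python
import http.client


def head_status(host, path="/", port=80, timeout=10):
    """Send a HEAD request and return the HTTP status code of the reply."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def accepts_head(status):
    """Assumption: 405 (Method Not Allowed) and 501 (Not Implemented)
    mean the server refuses HEAD; any other status counts as accepted."""
    return status not in (405, 501)


# Hypothetical usage (needs network access, host name is a placeholder):
#   accepts_head(head_status("www.example.com", "/index.html"))
```

If the server answers GET with 200 but HEAD with 405 or 501, the page can look fine in a browser and still be reported as an error by a checker that relies on HEAD.
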
Q: The link "mailto:john@company.com?subject=Hello John" is reported
   as an error.
A: You have to percent-encode special characters (e.g. spaces) in the
   subject field.
   The correct link is "mailto:...?subject=Hello%20John".
   Unfortunately, browsers like IE and Netscape do not enforce this.

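The quoting can be done mechanically. As a minimal sketch, Python's urllib.parse.quote percent-encodes the subject; the helper name mailto_link is hypothetical and only serves this example.

```python
from urllib.parse import quote


def mailto_link(address, subject):
    """Build a mailto: link with a percent-encoded subject field."""
    # safe="" makes quote() encode every reserved character, including "/"
    return "mailto:%s?subject=%s" % (address, quote(subject, safe=""))


print(mailto_link("john@company.com", "Hello John"))
# prints: mailto:john@company.com?subject=Hello%20John
```
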
Q: Does LinkChecker support JavaScript?
A: No, and it never will. JavaScript sucks. If your page does not
   work without JS, then your web design is broken (or you
   cannot code ;). Learn PHP or Zope or ASP, and use JavaScript only
   as an add-on for your web pages.

Q: I have a pretty large site to check. How can I restrict link checking
   to my own pages?
A: Look at the options --intern, --extern, --strict, --denyallow and
   --recursion-level.

Q: I don't get this --extern/--intern stuff.
A: When it comes to checking, there are three types of URLs:
   1) strict URLs:
      we only check the syntax
   2) extern URLs:
      like 1), but we additionally check whether they are valid by
      connect()ing to them
   3) intern URLs:
      like 2), but we additionally check whether they are HTML pages; if so,
      we descend recursively into the link and check all the links in the
      HTML content.
      The --recursion-level option limits the depth of such recursive
      descents.

   LinkChecker provides four options that determine which of those three
   categories a URL falls into: --intern, --extern, --strict and
   --denyallow.
   By default all URLs are intern. With --extern you specify which URLs
   are extern. With --intern you specify which URLs are intern.
   Now imagine you have both --extern and --intern. What happens
   when a URL matches both patterns? Or when it matches neither? In this
   situation the --denyallow option specifies the order in which we match
   the URL. By default it is intern/extern; with --denyallow the order is
   extern/intern. Either way, the first match counts, and if none matches,
   the last checked category is the category for the URL.
   Finally, with --strict all extern URLs are strict.

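The matching rules above can be sketched in a few lines of Python. This is an illustration of the rules as this FAQ states them, not LinkChecker's actual code: the function name classify and its parameters are hypothetical, patterns are assumed to be applied with re.search, and strictness is modeled as a single global switch.

```python
import re


def classify(url, intern=(), extern=(), denyallow=False, strict=False):
    """Return 'intern', 'extern' or 'strict' for url (sketch of the FAQ rules)."""
    if not intern and not extern:
        return "intern"                  # default: every URL is intern
    order = [("intern", intern), ("extern", extern)]
    if denyallow:
        order.reverse()                  # --denyallow: extern first, then intern
    category = None
    for name, patterns in order:
        category = name                  # remember the last category checked
        if any(re.search(p, url) for p in patterns):
            break                        # the first match counts
    if category == "extern" and strict:
        return "strict"                  # --strict: extern URLs become strict
    return category


both = dict(intern=[r"^http://a\.example"], extern=[r"example"])
print(classify("http://a.example/", **both))                  # intern
print(classify("http://a.example/", denyallow=True, **both))  # extern
```

A URL that matches neither list ends up in the last category checked: extern in the default order, intern with denyallow set.
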
   Oh, and just to boggle your mind: you can have more than one extern
   regular expression in a config file, and for each of those expressions
   you can specify whether the matched extern URLs should be strict or not.

   An example: assume we want to check only URLs of our domains named
   'mydomain.com' and 'myotherdomain.com'. Then we specify
   -i'^http://my(other)?domain\.com' as the intern regular expression; all
   other URLs are treated as extern. Easy.
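
To see which URLs the intern expression from this example covers, it can be tried out with Python's re module. The URLs in the list are made-up test inputs, and using search() rather than match() is an assumption about how the pattern is applied (with the leading ^ both behave the same here).

```python
import re

# The intern pattern from the example above
INTERN = re.compile(r"^http://my(other)?domain\.com")

urls = [
    "http://mydomain.com/index.html",
    "http://myotherdomain.com/about.html",
    "http://www.mydomain.com/",   # "www." prefix: the pattern does not match
    "http://example.org/",
]
for url in urls:
    print(url, "->", "intern" if INTERN.search(url) else "extern")
```

Note that the pattern is anchored only at the start, so a host such as "www.mydomain.com" or an https:// URL would be treated as extern unless the expression is widened.
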