git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1420 e7d03fd6-7b0d-0410-9947-9c21f3af8025
This commit is contained in:
calvin 2004-08-16 19:17:36 +00:00
parent 288af2fb64
commit 9f7e3e67a9
39 changed files with 0 additions and 1558 deletions

FAQ (152 lines deleted)

@@ -1,152 +0,0 @@
Q1: LinkChecker reported an error, but my web page is ok with
Netscape/IE/Opera/...
Is this a bug in LinkChecker?
A1: Please check your web pages first. Are they really ok? Use
a syntax-highlighting editor, and run the pages through HTML Tidy
(http://tidy.sourceforge.net/)!
Also check whether a proxy you are using produces the error.
Q2: I still get an error, but the page is definitely ok.
A2: Some servers deny access to automated tools (also called robots)
like LinkChecker. This is not a bug in LinkChecker but rather a
policy of the webmaster running the website you are checking.
A website might even send robots different web pages than it sends
to normal browsers.
Q3: How can I tell LinkChecker which proxy to use?
A3: LinkChecker works transparently with proxies. On Unix or Windows,
set the http_proxy, https_proxy, ftp_proxy or gopher_proxy
environment variables to a URL that identifies the proxy server before
starting LinkChecker. For example:
$ http_proxy="http://www.someproxy.com:3128"
$ export http_proxy
In a Macintosh environment, LinkChecker will retrieve proxy information
from Internet Config.
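Python's standard library follows the same *_proxy environment convention, so you can verify which proxy settings a program started from your shell will see. A quick sketch (the proxy URL is a placeholder, not a real proxy):

```python
import os
import urllib.request

# simulate an exported http_proxy variable before program start
os.environ["http_proxy"] = "http://www.someproxy.com:3128"

# getproxies() reads the *_proxy environment variables -- the same
# convention the answer above describes
proxies = urllib.request.getproxies()
print(proxies["http"])
```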
Q4: The link "mailto:john@company.com?subject=Hello John" is reported
as an error.
A4: You have to quote special characters (e.g. spaces) in the subject field.
The correct link should be "mailto:...?subject=Hello%20John"
Unfortunately browsers like IE and Netscape do not enforce this.
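The percent-encoding described above is exactly what urllib.parse.quote does; a minimal sketch using the example address from the question:

```python
from urllib.parse import quote

subject = "Hello John"
# percent-encode the subject so the mailto link is valid:
# the space becomes %20
link = "mailto:john@company.com?subject=" + quote(subject)
print(link)  # mailto:john@company.com?subject=Hello%20John
```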
Q5: Does LinkChecker support JavaScript?
A5: No, and it never will. If your page does not work without JS,
then your web design is broken.
Use PHP or Zope or ASP for dynamic content, and use JavaScript only
as an add-on for your web pages.
Q6: I have a pretty large site to check. How can I restrict link checking
to check only my own pages?
A6: Look at the options --intern, --extern, --strict, --denyallow and
--recursion-level.
Q7: I don't get this --extern/--intern stuff.
A7: When it comes to checking there are three types of URLs:
1) strict external URLs:
We do only syntax checking. Internal URLs are never strict.
2) external URLs:
Like 1), but we additionally check if they are valid by connect()ing
to them
3) internal URLs:
Like 2), but we additionally check if they are HTML pages and if so,
we descend recursively into this link and check all the links in the
HTML content.
The --recursion-level option restricts the number of such recursive
descents.
LinkChecker provides four options that determine which of these
three categories a URL falls into: --intern, --extern, --strict and
--denyallow.
By default all URLs are internal. With --extern you specify what URLs
are external. With --intern you specify what URLs are internal.
Now imagine you give both --extern and --intern. What happens
when a URL matches both patterns? Or when it matches neither? In this
situation the --denyallow option specifies the order in which we match
the URL. By default the order is internal/external; with --denyallow it is
external/internal. Either way, the first match counts, and if nothing
matches, the last checked category is the category for the URL.
Finally, with --strict all external URLs are strict.
Oh, and just to boggle your mind: you can have more than one external
regular expression in a config file and for each of those expressions
you can specify if those matched external URLs should be strict or not.
An example. Assume we want to check only URLs of our domains
'mydomain.com' and 'myotherdomain.com'. Then we specify
-i'^http://my(other)?domain\.com' as the internal regular expression; all
other URLs are treated as external. Easy.
Another example. We don't want to check mailto URLs. Then it's
-i'!^mailto:'. The '!' negates an expression. With --strict, we don't
even connect to any mail hosts.
Yet another example. We check our site www.mycompany.com, don't recurse
into external links pointing outside our site, and want to ignore links to
hollowood.com and hullabulla.com completely.
This can only be done with a configuration entry like
[filtering]
extern1=hollowood.com 1
extern2=hullabulla.com 1
# the 1 means strict external, i.e. don't even connect
and the command
linkchecker --intern=www.mycompany.com www.mycompany.com
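The matching order described above can be sketched roughly as follows. This is a simplified model for illustration, not LinkChecker's actual implementation, and it ignores the '!' negation syntax:

```python
import re

def classify(url, intern_pats, extern_pats, denyallow=False):
    # default order is internal/external; --denyallow flips it
    if denyallow:
        order = [("extern", extern_pats), ("intern", intern_pats)]
    else:
        order = [("intern", intern_pats), ("extern", extern_pats)]
    category = None
    for name, patterns in order:
        category = name  # remember the last checked category
        for pat in patterns:
            if re.search(pat, url):
                return name  # the first match counts
    # nothing matched: the last checked category wins
    return category

intern = [r'^http://my(other)?domain\.com']
print(classify("http://mydomain.com/index.html", intern, []))  # intern
print(classify("http://elsewhere.org/", intern, []))           # extern
```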
Q8: Is LinkChecker's cookie feature insecure?
A8: Cookies cannot store more information than is in the HTTP request
itself, so you are not giving away any additional system information.
After storing, however, the cookies are sent back to the server on request.
Not to every server, but only to the one the cookie originated from!
This could be used to "track" subsequent requests to this server,
and this is what annoys some people (including me).
Cookies are only stored in memory. After LinkChecker finishes, they
are lost, so the tracking is restricted to the checking time.
The cookie feature is disabled by default.
Q9: I want to have my own logging class. How can I use it in LinkChecker?
A9: Currently, only a Python API lets you define new logging classes.
Define your own logging class as a subclass of StandardLogger or any other
logging class in the linkcheck.logger module.
Then call the addLogger function in Config.Configuration to register
your new logger.
After that, append a new logger instance to the file output:
import linkcheck, MyLogger
log_format = 'mylog'
log_args = {'fileoutput': log_format, 'filename': 'foo.txt'}
cfg = linkcheck.Config.Configuration()
cfg.addLogger(log_format, MyLogger.MyLogger)
cfg['fileoutput'].append(cfg.newLogger(log_format, log_args))
Q10.1: LinkChecker does not ignore anchor references on caching.
Q10.2: Some links with anchors are getting checked twice.
A10: This is not a bug.
It is commonly believed that if a URL ABC#anchor1 works, then
ABC#anchor2 works too. That is not specified anywhere, and I have seen
server-side scripts that fail on some anchors and not on others.
This is the reason for always checking URLs with different anchors.
If you really want to disable this, use --no-anchor-caching.
Q11: I see LinkChecker gets a "/robots.txt" file for every site it
checks. What is that about?
A11: LinkChecker follows the robots.txt exclusion standard. To avoid
misuse of LinkChecker, you cannot turn this feature off.
See http://www.robotstxt.org/wc/robots.html and
http://www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/Spidering.txt
for more info.
Q12: Ctrl-C does not stop LinkChecker immediately. Why is that so?
A12: The Python interpreter has to wait for all threads to finish, and
this means waiting for all open sockets to close. The default timeout
for sockets is 30 seconds, hence the delay.
You can change the default socket timeout with the --timeout option.
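The --timeout option maps onto Python's socket timeout; a sketch of the underlying mechanism (this is plain stdlib usage, not LinkChecker code):

```python
import socket

# a smaller default timeout makes hung connections give up sooner,
# so the wait after Ctrl-C is correspondingly shorter
socket.setdefaulttimeout(10)
print(socket.getdefaulttimeout())
```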

WONTDO (16 lines deleted)

@@ -1,16 +0,0 @@
This is a list of things LinkChecker will *not* do for you.
1) Support JavaScript
See the FAQ, question Q5.
2) Print unreachable/dead documents of your website.
This would require
- file system access to your web repository
- access to your web server configuration
You can instead store the linkchecker results in a database
and look for missing files.
3) HTML/XML syntax checking
Use the HTML tidy program from http://tidy.sourceforge.net/ .


@@ -1,77 +0,0 @@
# -*- coding: iso-8859-1 -*-
# Copyright (C) 2000-2004 Bastian Kleineidam
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
import sys
import os

import linkcheck.logger.Logger


class BlacklistLogger (linkcheck.logger.Logger.Logger):
    """Updates a blacklist of wrong links. If a link on the blacklist
       is working (again), it is removed from the list. So after n days
       we have only links on the list which failed for n days.
    """

    def __init__ (self, **args):
        super(BlacklistLogger, self).__init__(**args)
        self.errors = 0
        self.blacklist = {}
        if 'fileoutput' in args:
            self.fileoutput = True
            filename = args['filename']
            if os.path.exists(filename):
                self.readBlacklist(open(filename, "r"))
            self.fd = open(filename, "w")
        elif 'fd' in args:
            # note: fileoutput must be initialized in this branch too,
            # otherwise writeBlacklist() raises an AttributeError
            self.fileoutput = False
            self.fd = args['fd']
        else:
            self.fileoutput = False
            self.fd = sys.stdout

    def newUrl (self, urlData):
        if not urlData.cached:
            key = urlData.getCacheKey()
            if key in self.blacklist:
                if urlData.valid:
                    del self.blacklist[key]
                else:
                    self.blacklist[key] += 1
            else:
                if not urlData.valid:
                    self.blacklist[key] = 1

    def endOfOutput (self, linknumber=-1):
        self.writeBlacklist()

    def readBlacklist (self, fd):
        for line in fd:
            line = line.rstrip()
            if line.startswith('#') or not line:
                continue
            value, key = line.split(None, 1)
            self.blacklist[key] = int(value)
        fd.close()

    def writeBlacklist (self):
        """write the blacklist"""
        oldmask = os.umask(0077)
        for key, value in self.blacklist.items():
            self.fd.write("%d %s\n" % (value, key))
        if self.fileoutput:
            self.fd.close()
        # restore umask
        os.umask(oldmask)
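The blacklist file format read by readBlacklist() above is one "<count> <url>" pair per line, with comment and blank lines skipped. A standalone sketch of the same parsing (file contents made up, written in Python 3 for illustration):

```python
lines = """# comment lines and blank lines are skipped
3 http://example.com/dead-link
1 http://example.com/other
"""

blacklist = {}
for line in lines.splitlines():
    line = line.rstrip()
    if line.startswith('#') or not line:
        continue
    # the failure count comes first; the URL is the rest of the line
    value, key = line.split(None, 1)
    blacklist[key] = int(value)

print(blacklist["http://example.com/dead-link"])  # 3
```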


@@ -1,92 +0,0 @@
# -*- coding: iso-8859-1 -*-
# Copyright (C) 2000-2004 Bastian Kleineidam
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
import time
import csv

import bk.i18n
import bk.strtime
import linkcheck.Config
import linkcheck.logger.StandardLogger
import linkcheck.logger.Logger


class CSVLogger (linkcheck.logger.StandardLogger.StandardLogger):
    """CSV output, one line per entry. Fields are separated by the
       configured separator (a semicolon by default).
    """

    def __init__ (self, **args):
        super(CSVLogger, self).__init__(**args)
        self.separator = args['separator']
        self.lineterminator = "\n"

    def init (self):
        linkcheck.logger.Logger.Logger.init(self)
        if self.fd is None:
            return
        self.starttime = time.time()
        if self.has_field("intro"):
            self.fd.write("# "+(bk.i18n._("created by %s at %s%s") % \
                (linkcheck.Config.AppName,
                 bk.strtime.strtime(self.starttime), self.lineterminator)))
            self.fd.write("# "+(bk.i18n._("Get the newest version at %s%s") % \
                (linkcheck.Config.Url, self.lineterminator)))
            self.fd.write("# "+(bk.i18n._("Write comments and bugs to %s%s%s") % \
                (linkcheck.Config.Email, self.lineterminator,
                 self.lineterminator)))
            self.fd.write(
                bk.i18n._("# Format of the entries:")+self.lineterminator+
                "# urlname;"+self.lineterminator+
                "# recursionlevel;"+self.lineterminator+
                "# parentname;"+self.lineterminator+
                "# baseref;"+self.lineterminator+
                "# errorstring;"+self.lineterminator+
                "# validstring;"+self.lineterminator+
                "# warningstring;"+self.lineterminator+
                "# infostring;"+self.lineterminator+
                "# valid;"+self.lineterminator+
                "# url;"+self.lineterminator+
                "# line;"+self.lineterminator+
                "# column;"+self.lineterminator+
                "# name;"+self.lineterminator+
                "# dltime;"+self.lineterminator+
                "# dlsize;"+self.lineterminator+
                "# checktime;"+self.lineterminator+
                "# cached;"+self.lineterminator)
        self.flush()
        self.writer = csv.writer(self.fd, dialect='excel',
                                 delimiter=self.separator,
                                 lineterminator=self.lineterminator)

    def newUrl (self, urlData):
        if self.fd is None:
            return
        row = [urlData.urlName, urlData.recursionLevel,
               urlData.parentName or "", urlData.baseRef,
               urlData.errorString, urlData.validString,
               urlData.warningString, urlData.infoString,
               urlData.valid, urlData.url,
               urlData.line, urlData.column,
               urlData.name, urlData.dltime,
               urlData.dlsize, urlData.checktime,
               urlData.cached]
        self.writer.writerow(row)
        self.flush()

    def endOfOutput (self, linknumber=-1):
        if self.fd is None:
            return
        self.stoptime = time.time()
        if self.has_field("outro"):
            duration = self.stoptime - self.starttime
            self.fd.write("# "+bk.i18n._("Stopped checking at %s (%s)%s") % \
                (bk.strtime.strtime(self.stoptime),
                 bk.strtime.strduration(duration), self.lineterminator))
        self.flush()
        self.fd.close()
        self.fd = None
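Output in this semicolon-separated format can be read back with the csv module, skipping the "#" comment header. A sketch with two made-up, abbreviated rows (the real records have all 17 columns listed above; column 4 is errorstring):

```python
import csv
import io

# two fake records in the logger's semicolon-separated column order:
# urlname;recursionlevel;parentname;baseref;errorstring;validstring;...
data = io.StringIO(
    "# comment header written by the intro fields\n"
    "http://example.com/;0;;;;Valid;;;True\n"
    "http://example.com/missing;1;http://example.com/;;404 Not Found;Error;;;False\n"
)

rows = [row for row in csv.reader(data, delimiter=';')
        if row and not row[0].startswith('#')]
print(len(rows))     # 2
print(rows[1][4])    # 404 Not Found
```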


@@ -1,156 +0,0 @@
# -*- coding: iso-8859-1 -*-
# Copyright (C) 2000-2004 Bastian Kleineidam
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
import bk.i18n
import bk.ansicolor
import linkcheck.StringUtil
import linkcheck.logger.StandardLogger


class ColoredLogger (linkcheck.logger.StandardLogger.StandardLogger):
    """ANSI colorized output"""

    def __init__ (self, **args):
        super(ColoredLogger, self).__init__(**args)
        self.colorparent = bk.ansicolor.esc_ansicolor(args['colorparent'])
        self.colorurl = bk.ansicolor.esc_ansicolor(args['colorurl'])
        self.colorname = bk.ansicolor.esc_ansicolor(args['colorname'])
        self.colorreal = bk.ansicolor.esc_ansicolor(args['colorreal'])
        self.colorbase = bk.ansicolor.esc_ansicolor(args['colorbase'])
        self.colorvalid = bk.ansicolor.esc_ansicolor(args['colorvalid'])
        self.colorinvalid = bk.ansicolor.esc_ansicolor(args['colorinvalid'])
        self.colorinfo = bk.ansicolor.esc_ansicolor(args['colorinfo'])
        self.colorwarning = bk.ansicolor.esc_ansicolor(args['colorwarning'])
        self.colordltime = bk.ansicolor.esc_ansicolor(args['colordltime'])
        self.colordlsize = bk.ansicolor.esc_ansicolor(args['colordlsize'])
        self.colorreset = bk.ansicolor.esc_ansicolor(args['colorreset'])
        self.currentPage = None
        self.prefix = 0

    def newUrl (self, urlData):
        if self.fd is None:
            return
        if self.has_field("parenturl"):
            if urlData.parentName:
                if self.currentPage != urlData.parentName:
                    if self.prefix:
                        self.fd.write("o\n")
                    self.fd.write("\n"+self.field("parenturl")+
                                  self.spaces("parenturl")+
                                  self.colorparent+
                                  (urlData.parentName or "")+
                                  self.colorreset+"\n")
                    self.currentPage = urlData.parentName
                    self.prefix = 1
            else:
                if self.prefix:
                    self.fd.write("o\n")
                self.prefix = 0
                self.currentPage = None
        if self.has_field("url"):
            if self.prefix:
                self.fd.write("|\n+- ")
            else:
                self.fd.write("\n")
            self.fd.write(self.field("url")+self.spaces("url")+self.colorurl+
                          urlData.urlName+self.colorreset)
            if urlData.line:
                self.fd.write(bk.i18n._(", line %d") % urlData.line)
            if urlData.column:
                self.fd.write(bk.i18n._(", col %d") % urlData.column)
            if urlData.cached:
                self.fd.write(bk.i18n._(" (cached)\n"))
            else:
                self.fd.write("\n")
        if urlData.name and self.has_field("name"):
            if self.prefix:
                self.fd.write("| ")
            self.fd.write(self.field("name")+self.spaces("name")+
                          self.colorname+urlData.name+self.colorreset+"\n")
        if urlData.baseRef and self.has_field("base"):
            if self.prefix:
                self.fd.write("| ")
            self.fd.write(self.field("base")+self.spaces("base")+
                          self.colorbase+urlData.baseRef+self.colorreset+"\n")
        if urlData.url and self.has_field("realurl"):
            if self.prefix:
                self.fd.write("| ")
            self.fd.write(self.field("realurl")+self.spaces("realurl")+
                          self.colorreal+urlData.url+
                          self.colorreset+"\n")
        if urlData.dltime >= 0 and self.has_field("dltime"):
            if self.prefix:
                self.fd.write("| ")
            self.fd.write(self.field("dltime")+self.spaces("dltime")+
                          self.colordltime+
                          (bk.i18n._("%.3f seconds") % urlData.dltime)+
                          self.colorreset+"\n")
        if urlData.dlsize >= 0 and self.has_field("dlsize"):
            if self.prefix:
                self.fd.write("| ")
            self.fd.write(self.field("dlsize")+self.spaces("dlsize")+
                          self.colordlsize+
                          linkcheck.StringUtil.strsize(urlData.dlsize)+
                          self.colorreset+"\n")
        if urlData.checktime and self.has_field("checktime"):
            if self.prefix:
                self.fd.write("| ")
            self.fd.write(self.field("checktime")+self.spaces("checktime")+
                          self.colordltime+
                          (bk.i18n._("%.3f seconds") % urlData.checktime)+
                          self.colorreset+"\n")
        if urlData.infoString and self.has_field("info"):
            if self.prefix:
                self.fd.write("| "+self.field("info")+self.spaces("info")+
                    linkcheck.StringUtil.indentWith(
                        linkcheck.StringUtil.blocktext(
                            urlData.infoString, 65), "| "+self.spaces("info")))
            else:
                self.fd.write(self.field("info")+self.spaces("info")+
                    linkcheck.StringUtil.indentWith(
                        linkcheck.StringUtil.blocktext(
                            urlData.infoString, 65), " "+self.spaces("info")))
            self.fd.write(self.colorreset+"\n")
        if urlData.warningString:
            #self.warnings += 1
            if self.has_field("warning"):
                if self.prefix:
                    self.fd.write("| ")
                self.fd.write(self.field("warning")+self.spaces("warning")+
                              self.colorwarning+
                              urlData.warningString+self.colorreset+"\n")
        if self.has_field("result"):
            if self.prefix:
                self.fd.write("| ")
            self.fd.write(self.field("result")+self.spaces("result"))
            if urlData.valid:
                self.fd.write(self.colorvalid+urlData.validString+
                              self.colorreset+"\n")
            else:
                self.errors += 1
                self.fd.write(self.colorinvalid+urlData.errorString+
                              self.colorreset+"\n")
        self.flush()

    def endOfOutput (self, linknumber=-1):
        if self.fd is None:
            return
        if self.has_field("outro"):
            if self.prefix:
                self.fd.write("o\n")
        super(ColoredLogger, self).endOfOutput(linknumber=linknumber)


@@ -1,99 +0,0 @@
# -*- coding: iso-8859-1 -*-
# Copyright (C) 2000-2004 Bastian Kleineidam
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
import time

import bk.i18n
import bk.strtime
import linkcheck.Config
import linkcheck.logger.StandardLogger
import linkcheck.logger.Logger


class GMLLogger (linkcheck.logger.StandardLogger.StandardLogger):
    """GML means Graph Modeling Language. Use a GML tool to see
       your sitemap graph.
    """

    def __init__ (self, **args):
        super(GMLLogger, self).__init__(**args)
        self.nodes = {}
        self.nodeid = 0

    def init (self):
        linkcheck.logger.Logger.Logger.init(self)
        if self.fd is None:
            return
        self.starttime = time.time()
        if self.has_field("intro"):
            self.fd.write("# "+(bk.i18n._("created by %s at %s\n") % \
                (linkcheck.Config.AppName,
                 bk.strtime.strtime(self.starttime))))
            self.fd.write("# "+(bk.i18n._("Get the newest version at %s\n") % \
                linkcheck.Config.Url))
            self.fd.write("# "+(bk.i18n._("Write comments and bugs to %s\n\n") % \
                linkcheck.Config.Email))
        self.fd.write("graph [\n directed 1\n")
        self.flush()

    def newUrl (self, urlData):
        """write one node and all possible edges"""
        if self.fd is None:
            return
        node = urlData
        if node.url and node.url not in self.nodes:
            node.id = self.nodeid
            self.nodes[node.url] = node
            self.nodeid += 1
            self.fd.write(" node [\n")
            self.fd.write(" id %d\n" % node.id)
            if self.has_field("realurl"):
                self.fd.write(' label "%s"\n' % node.url)
            if node.dltime >= 0 and self.has_field("dltime"):
                self.fd.write(" dltime %d\n" % node.dltime)
            if node.dlsize >= 0 and self.has_field("dlsize"):
                self.fd.write(" dlsize %d\n" % node.dlsize)
            if node.checktime and self.has_field("checktime"):
                self.fd.write(" checktime %d\n" % node.checktime)
            if self.has_field("extern"):
                self.fd.write(" extern %d\n" % (node.extern and 1 or 0))
            self.fd.write(" ]\n")
        self.writeEdges()

    def writeEdges (self):
        """write all edges we can find in the graph in a brute-force
           manner. Better would be a mapping of parent urls.
        """
        for node in self.nodes.values():
            if node.parentName in self.nodes:
                self.fd.write(" edge [\n")
                self.fd.write(' label "%s"\n' % node.urlName)
                if self.has_field("parenturl"):
                    self.fd.write(" source %d\n" % \
                                  self.nodes[node.parentName].id)
                self.fd.write(" target %d\n" % node.id)
                if self.has_field("result"):
                    self.fd.write(" valid %d\n" % (node.valid and 1 or 0))
                self.fd.write(" ]\n")
        self.flush()

    def endOfOutput (self, linknumber=-1):
        if self.fd is None:
            return
        self.fd.write("]\n")
        if self.has_field("outro"):
            self.stoptime = time.time()
            duration = self.stoptime - self.starttime
            self.fd.write("# "+bk.i18n._("Stopped checking at %s (%s)\n") % \
                (bk.strtime.strtime(self.stoptime),
                 bk.strtime.strduration(duration)))
        self.flush()
        self.fd = None
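For reference, the node/edge structure this logger emits has the following shape. A hand-written sketch that assembles a minimal two-node graph the same way (example URLs, not actual logger output):

```python
# assemble a minimal directed GML graph: numbered nodes with URL
# labels, and edges referencing nodes by id
lines = ["graph [", " directed 1"]
urls = ["http://example.com/", "http://example.com/page.html"]
for nid, url in enumerate(urls):
    lines += [" node [", " id %d" % nid, ' label "%s"' % url, " ]"]
# one edge: the front page links to page.html
lines += [" edge [", ' label "page.html"', " source 0", " target 1", " ]"]
lines.append("]")
gml = "\n".join(lines)
print(gml)
```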


@@ -1,175 +0,0 @@
# -*- coding: iso-8859-1 -*-
# Copyright (C) 2000-2004 Bastian Kleineidam
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
import time

import bk.i18n
import bk.strtime
import linkcheck.logger.StandardLogger
import linkcheck.StringUtil
import linkcheck.Config

HTML_HEADER = """<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html><head><title>%s</title>
<style type="text/css">\n<!--
h2 { font-family: Verdana,sans-serif; font-size: 22pt;
font-style: bold; font-weight: bold }
body { font-family: Arial,sans-serif; font-size: 11pt }
td { font-family: Arial,sans-serif; font-size: 11pt }
code { font-family: Courier }
a:hover { color: #34a4ef }
//-->
</style></head>
<body bgcolor="%s" link="%s" vlink="%s" alink="%s">
"""


class HtmlLogger (linkcheck.logger.StandardLogger.StandardLogger):
    """Logger with HTML output"""

    def __init__ (self, **args):
        super(HtmlLogger, self).__init__(**args)
        self.colorbackground = args['colorbackground']
        self.colorurl = args['colorurl']
        self.colorborder = args['colorborder']
        self.colorlink = args['colorlink']
        self.tablewarning = args['tablewarning']
        self.tableerror = args['tableerror']
        self.tableok = args['tableok']

    def init (self):
        linkcheck.logger.Logger.Logger.init(self)
        if self.fd is None:
            return
        self.starttime = time.time()
        self.fd.write(HTML_HEADER % (linkcheck.Config.App,
                                     self.colorbackground,
                                     self.colorlink, self.colorlink,
                                     self.colorlink))
        if self.has_field('intro'):
            self.fd.write("<center><h2>"+linkcheck.Config.App+"</h2></center>"+
                          "<br><blockquote>"+linkcheck.Config.Freeware+"<br><br>"+
                          (bk.i18n._("Start checking at %s\n") % \
                           bk.strtime.strtime(self.starttime))+
                          "<br>")
        self.flush()

    def newUrl (self, urlData):
        if self.fd is None:
            return
        self.fd.write("<br clear=\"all\"><br>\n"+
            "<table align=\"left\" border=\"0\" cellspacing=\"0\" cellpadding=\"1\"\n"+
            " bgcolor=\""+self.colorborder+"\" summary=\"Border\">\n"+
            "<tr>\n"+
            "<td>\n"+
            "<table align=\"left\" border=\"0\" cellspacing=\"0\" cellpadding=\"3\"\n"+
            " summary=\"checked link\" bgcolor=\""+self.colorbackground+"\">\n")
        if self.has_field("url"):
            self.fd.write("<tr>\n"+
                "<td bgcolor=\""+self.colorurl+"\">"+self.field("url")+"</td>\n"+
                "<td bgcolor=\""+self.colorurl+"\">"+urlData.urlName)
            if urlData.cached:
                self.fd.write(bk.i18n._(" (cached)"))
            self.fd.write("</td>\n</tr>\n")
        if urlData.name and self.has_field("name"):
            self.fd.write("<tr>\n<td>"+self.field("name")+"</td>\n<td>"+
                          urlData.name+"</td>\n</tr>\n")
        if urlData.parentName and self.has_field("parenturl"):
            self.fd.write("<tr>\n<td>"+self.field("parenturl")+
                          '</td>\n<td><a target="top" href="'+
                          (urlData.parentName or "")+'">'+
                          (urlData.parentName or "")+"</a>")
            if urlData.line:
                self.fd.write(bk.i18n._(", line %d") % urlData.line)
            if urlData.column:
                self.fd.write(bk.i18n._(", col %d") % urlData.column)
            self.fd.write("</td>\n</tr>\n")
        if urlData.baseRef and self.has_field("base"):
            self.fd.write("<tr>\n<td>"+self.field("base")+"</td>\n<td>"+
                          urlData.baseRef+"</td>\n</tr>\n")
        if urlData.url and self.has_field("realurl"):
            self.fd.write("<tr>\n<td>"+self.field("realurl")+"</td>\n<td>"+
                          '<a target="top" href="'+urlData.url+
                          '">'+urlData.url+"</a></td>\n</tr>\n")
        if urlData.dltime >= 0 and self.has_field("dltime"):
            self.fd.write("<tr>\n<td>"+self.field("dltime")+"</td>\n<td>"+
                          (bk.i18n._("%.3f seconds") % urlData.dltime)+
                          "</td>\n</tr>\n")
        if urlData.dlsize >= 0 and self.has_field("dlsize"):
            self.fd.write("<tr>\n<td>"+self.field("dlsize")+"</td>\n<td>"+
                          linkcheck.StringUtil.strsize(urlData.dlsize)+
                          "</td>\n</tr>\n")
        if urlData.checktime and self.has_field("checktime"):
            self.fd.write("<tr>\n<td>"+self.field("checktime")+
                          "</td>\n<td>"+
                          (bk.i18n._("%.3f seconds") % urlData.checktime)+
                          "</td>\n</tr>\n")
        if urlData.infoString and self.has_field("info"):
            self.fd.write("<tr>\n<td>"+self.field("info")+"</td>\n<td>"+
                          linkcheck.StringUtil.htmlify(urlData.infoString)+
                          "</td>\n</tr>\n")
        if urlData.warningString:
            #self.warnings += 1
            if self.has_field("warning"):
                self.fd.write("<tr>\n"+
                              self.tablewarning+self.field("warning")+
                              "</td>\n"+self.tablewarning+
                              urlData.warningString.replace("\n", "<br>")+
                              "</td>\n</tr>\n")
        if self.has_field("result"):
            if urlData.valid:
                self.fd.write("<tr>\n"+self.tableok+
                              self.field("result")+"</td>\n"+
                              self.tableok+urlData.validString+"</td>\n</tr>\n")
            else:
                self.errors += 1
                self.fd.write("<tr>\n"+self.tableerror+self.field("result")+
                              "</td>\n"+self.tableerror+
                              urlData.errorString+"</td>\n</tr>\n")
        self.fd.write("</table></td></tr></table><br clear=\"all\">")
        self.flush()

    def endOfOutput (self, linknumber=-1):
        if self.fd is None:
            return
        if self.has_field("outro"):
            self.fd.write("\n"+bk.i18n._("That's it. "))
            #if self.warnings == 1:
            #    self.fd.write(bk.i18n._("1 warning, "))
            #else:
            #    self.fd.write(str(self.warnings)+bk.i18n._(" warnings, "))
            if self.errors == 1:
                self.fd.write(bk.i18n._("1 error"))
            else:
                self.fd.write(str(self.errors)+bk.i18n._(" errors"))
            if linknumber >= 0:
                if linknumber == 1:
                    self.fd.write(bk.i18n._(" in 1 link"))
                else:
                    self.fd.write(bk.i18n._(" in %d links") % linknumber)
            self.fd.write(bk.i18n._(" found")+"\n<br>")
            self.stoptime = time.time()
            duration = self.stoptime - self.starttime
            self.fd.write(bk.i18n._("Stopped checking at %s (%s)\n") % \
                (bk.strtime.strtime(self.stoptime),
                 bk.strtime.strduration(duration)))
        self.fd.write("</blockquote><br><hr noshade size=\"1\"><small>"+
                      linkcheck.Config.HtmlAppInfo+"<br>")
        self.fd.write(bk.i18n._("Get the newest version at %s\n") % \
            ('<a href="'+linkcheck.Config.Url+'" target="_top">'+
             linkcheck.Config.Url+"</a>.<br>"))
        self.fd.write(bk.i18n._("Write comments and bugs to %s\n\n") % \
            ('<a href="mailto:'+linkcheck.Config.Email+'">'+
             linkcheck.Config.Email+"</a>."))
        self.fd.write("</small></body></html>")
        self.flush()
        self.fd = None


@@ -1,81 +0,0 @@
# -*- coding: iso-8859-1 -*-
# Copyright (C) 2000-2004 Bastian Kleineidam
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
import bk.i18n


class Logger (object):
    Fields = {
        "realurl": bk.i18n._("Real URL"),
        "result": bk.i18n._("Result"),
        "base": bk.i18n._("Base"),
        "name": bk.i18n._("Name"),
        "parenturl": bk.i18n._("Parent URL"),
        "extern": bk.i18n._("Extern"),
        "info": bk.i18n._("Info"),
        "warning": bk.i18n._("Warning"),
        "dltime": bk.i18n._("D/L Time"),
        "dlsize": bk.i18n._("D/L Size"),
        "checktime": bk.i18n._("Check Time"),
        "url": bk.i18n._("URL"),
    }

    def __init__ (self, **args):
        self.logfields = None # log all fields
        if 'fields' in args:
            if "all" not in args['fields']:
                self.logfields = args['fields']

    def has_field (self, name):
        if self.logfields is None:
            # log all fields
            return True
        return name in self.logfields

    def field (self, name):
        """return translated field name"""
        # XXX i18nreal._(self.Fields[name])
        return self.Fields[name]

    def spaces (self, name):
        return self.logspaces[name]

    def init (self):
        # map with spaces between field name and value
        self.logspaces = {}
        if self.logfields is None:
            fields = self.Fields.keys()
        else:
            fields = self.logfields
        values = [self.field(x) for x in fields]
        # maximum indent for localized log field names
        self.max_indent = max([len(x) for x in values])+1
        for key in fields:
            self.logspaces[key] = " "*(self.max_indent - len(self.field(key)))

    def newUrl (self, urlData):
        raise NotImplementedError("abstract function")

    def endOfOutput (self, linknumber=-1):
        raise NotImplementedError("abstract function")

    def __str__ (self):
        return self.__class__.__name__

    def __repr__ (self):
        return repr(self.__class__.__name__)


@@ -1,28 +0,0 @@
# -*- coding: iso-8859-1 -*-
# Copyright (C) 2000-2004 Bastian Kleineidam
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
import linkcheck.logger.Logger


class NoneLogger (linkcheck.logger.Logger.Logger):
    """Dummy logger printing nothing."""

    def newUrl (self, urlData):
        pass

    def endOfOutput (self, linknumber=-1):
        pass


@@ -1,101 +0,0 @@
# -*- coding: iso-8859-1 -*-
# Copyright (C) 2000-2004 Bastian Kleineidam
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
import time
import linkcheck
import bk.i18n
import linkcheck.logger.StandardLogger
import linkcheck.logger.Logger
def applyTable (table, s):
"apply a table of replacement pairs to str"
for mapping in table:
s = s.replace(mapping[0], mapping[1])
return s
SQLTable = [
("'","''")
]
def sqlify (s):
"""Escape single quotes for SQL; empty values become NULL."""
if not s:
return "NULL"
return "'%s'"%applyTable(SQLTable, s)
class SQLLogger (linkcheck.logger.StandardLogger.StandardLogger):
""" SQL output for PostgreSQL, not tested"""
def __init__ (self, **args):
super(SQLLogger, self).__init__(**args)
self.dbname = args['dbname']
self.separator = args['separator']
def init (self):
linkcheck.logger.Logger.Logger.init(self)
if self.fd is None: return
self.starttime = time.time()
if self.has_field("intro"):
self.fd.write("-- "+(bk.i18n._("created by %s at %s\n") % (linkcheck.Config.AppName,
bk.strtime.strtime(self.starttime))))
self.fd.write("-- "+(bk.i18n._("Get the newest version at %s\n") % linkcheck.Config.Url))
self.fd.write("-- "+(bk.i18n._("Write comments and bugs to %s\n\n") % \
linkcheck.Config.Email))
self.flush()
def newUrl (self, urlData):
if self.fd is None: return
self.fd.write("insert into %s(urlname,recursionlevel,parentname,"
"baseref,errorstring,validstring,warningstring,infostring,"
"valid,url,line,col,name,checktime,dltime,dlsize,cached)"
" values "
"(%s,%d,%s,%s,%s,%s,%s,%s,%d,%s,%d,%d,%s,%d,%d,%d,%d)%s\n" % \
(self.dbname,
sqlify(urlData.urlName),
urlData.recursionLevel,
sqlify((urlData.parentName or "")),
sqlify(urlData.baseRef),
sqlify(urlData.errorString),
sqlify(urlData.validString),
sqlify(urlData.warningString),
sqlify(urlData.infoString),
urlData.valid,
sqlify(bk.url.url_quote(urlData.url)),
urlData.line,
urlData.column,
sqlify(urlData.name),
urlData.checktime,
urlData.dltime,
urlData.dlsize,
urlData.cached,
self.separator))
self.flush()
def endOfOutput (self, linknumber=-1):
if self.fd is None: return
if self.has_field("outro"):
self.stoptime = time.time()
duration = self.stoptime - self.starttime
self.fd.write("-- "+bk.i18n._("Stopped checking at %s (%s)\n")%\
(bk.strtime.strtime(self.stoptime),
bk.strtime.strduration(duration)))
self.flush()
self.fd = None
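The sqlify helper above escapes values by doubling single quotes and mapping empty values to NULL. A standalone sketch of that escaping scheme (names mirror the module, but nothing here imports linkcheck):

```python
# Quote-doubling escape table, as used by sqlify above.
SQL_TABLE = [("'", "''")]


def apply_table(table, s):
    """Apply a table of (old, new) replacement pairs to s."""
    for old, new in table:
        s = s.replace(old, new)
    return s


def sqlify(s):
    """Escape single quotes for SQL; empty values become NULL."""
    if not s:
        return "NULL"
    return "'%s'" % apply_table(SQL_TABLE, s)


print(sqlify("o'hara"))  # 'o''hara'
print(sqlify(""))        # NULL
```

Note this only handles quote doubling; a real client would use the database driver's parameter binding instead of string escaping.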


@@ -1,172 +0,0 @@
# -*- coding: iso-8859-1 -*-
# Copyright (C) 2000-2004 Bastian Kleineidam
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
import sys
import time
import bk.i18n
import linkcheck.logger.Logger
import linkcheck.StringUtil
import linkcheck.Config
class StandardLogger (linkcheck.logger.Logger.Logger):
"""Standard text logger.
Every Logger has to implement the following functions:
init(self)
Called once to initialize the Logger. Why not use __init__(self)?
Because __init__ is called when the logger object is created, not
when checking starts, so the start time must be set in init.
Also, several loggers might be created by default before switching
to the configured output, so nothing must be printed at __init__ time.
newUrl(self, urlData)
Called every time a URL has finished checking. All checked data is
in the UrlData object urlData.
endOfOutput(self)
Called at the end of checking to close filehandles and such.
Passing parameters to the constructor:
__init__(self, **args)
The args dictionary is filled in Config.py. There you can specify
default parameters. Adjust these parameters in the configuration
files in the appropriate logger section.
Informal text output format spec:
Output consists of a set of URL logs separated by one or more
blank lines.
A URL log consists of two or more lines. Each line consists of
keyword and data, separated by whitespace.
Unknown keywords will be ignored.
"""
def __init__ (self, **args):
super(StandardLogger, self).__init__(**args)
self.errors = 0
#self.warnings = 0
if args.has_key('fileoutput'):
self.fd = file(args['filename'], "w")
elif args.has_key('fd'):
self.fd = args['fd']
else:
self.fd = sys.stdout
def init (self):
super(StandardLogger, self).init()
if self.fd is None:
return
self.starttime = time.time()
if self.has_field('intro'):
self.fd.write("%s\n%s\n" % (linkcheck.Config.AppInfo, linkcheck.Config.Freeware))
self.fd.write(bk.i18n._("Get the newest version at %s\n") % linkcheck.Config.Url)
self.fd.write(bk.i18n._("Write comments and bugs to %s\n\n") % linkcheck.Config.Email)
self.fd.write(bk.i18n._("Start checking at %s\n") % bk.strtime.strtime(self.starttime))
self.flush()
def newUrl (self, urlData):
if self.fd is None:
return
if self.has_field('url'):
self.fd.write("\n"+self.field('url')+self.spaces('url')+
urlData.urlName)
if urlData.cached:
self.fd.write(bk.i18n._(" (cached)\n"))
else:
self.fd.write("\n")
if urlData.name and self.has_field('name'):
self.fd.write(self.field("name")+self.spaces("name")+
urlData.name+"\n")
if urlData.parentName and self.has_field('parenturl'):
self.fd.write(self.field('parenturl')+self.spaces("parenturl")+
(urlData.parentName or "")+
(bk.i18n._(", line %d")%urlData.line)+
(bk.i18n._(", col %d")%urlData.column)+"\n")
if urlData.baseRef and self.has_field('base'):
self.fd.write(self.field("base")+self.spaces("base")+
urlData.baseRef+"\n")
if urlData.url and self.has_field('realurl'):
self.fd.write(self.field("realurl")+self.spaces("realurl")+
urlData.url+"\n")
if urlData.dltime>=0 and self.has_field('dltime'):
self.fd.write(self.field("dltime")+self.spaces("dltime")+
bk.i18n._("%.3f seconds\n") % urlData.dltime)
if urlData.dlsize>=0 and self.has_field('dlsize'):
self.fd.write(self.field("dlsize")+self.spaces("dlsize")+
"%s\n"%linkcheck.StringUtil.strsize(urlData.dlsize))
if urlData.checktime and self.has_field('checktime'):
self.fd.write(self.field("checktime")+self.spaces("checktime")+
bk.i18n._("%.3f seconds\n") % urlData.checktime)
if urlData.infoString and self.has_field('info'):
self.fd.write(self.field("info")+self.spaces("info")+
linkcheck.StringUtil.indent(
linkcheck.StringUtil.blocktext(urlData.infoString, 65),
self.max_indent)+"\n")
if urlData.warningString:
#self.warnings += 1
if self.has_field('warning'):
self.fd.write(self.field("warning")+self.spaces("warning")+
linkcheck.StringUtil.indent(
linkcheck.StringUtil.blocktext(urlData.warningString, 65),
self.max_indent)+"\n")
if self.has_field('result'):
self.fd.write(self.field("result")+self.spaces("result"))
if urlData.valid:
self.fd.write(urlData.validString+"\n")
else:
self.errors += 1
self.fd.write(urlData.errorString+"\n")
self.flush()
def endOfOutput (self, linknumber=-1):
if self.fd is None:
return
if self.has_field('outro'):
self.fd.write(bk.i18n._("\nThat's it. "))
#if self.warnings==1:
# self.fd.write(bk.i18n._("1 warning, "))
#else:
# self.fd.write(str(self.warnings)+bk.i18n._(" warnings, "))
if self.errors==1:
self.fd.write(bk.i18n._("1 error"))
else:
self.fd.write(str(self.errors)+bk.i18n._(" errors"))
if linknumber >= 0:
if linknumber == 1:
self.fd.write(bk.i18n._(" in 1 link"))
else:
self.fd.write(bk.i18n._(" in %d links") % linknumber)
self.fd.write(bk.i18n._(" found\n"))
self.stoptime = time.time()
duration = self.stoptime - self.starttime
self.fd.write(bk.i18n._("Stopped checking at %s (%s)\n") % \
(bk.strtime.strtime(self.stoptime),
bk.strtime.strduration(duration)))
self.flush()
self.fd = None
def flush (self):
"""ignore flush errors since we are not responsible for proper
flushing of log output streams"""
if self.fd:
try:
self.fd.flush()
except IOError:
pass
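The docstring above specifies the informal text output format: URL logs separated by blank lines, each line a keyword and data separated by whitespace, unknown keywords ignored. A sketch of a hypothetical consumer of that format (parse_logs and the sample data are not part of linkcheck):

```python
def parse_logs(text):
    """Parse the informal text log format: URL logs separated by
    blank lines; each line is a keyword plus optional data separated
    by whitespace (keywords like "valid" or "cached" carry no data)."""
    logs = []
    for block in text.split("\n\n"):
        entry = {}
        for line in block.splitlines():
            if not line.strip():
                continue
            parts = line.split(None, 1)
            entry[parts[0]] = parts[1] if len(parts) > 1 else True
        if entry:
            logs.append(entry)
    return logs


sample = "url http://example.com/\nvalid\n\nurl misc.html\ncached\nvalid\n"
print(parse_logs(sample))
```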


@@ -1,139 +0,0 @@
# -*- coding: iso-8859-1 -*-
# Copyright (C) 2000-2004 Bastian Kleineidam
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
import time
import xml.sax.saxutils
import linkcheck.logger.StandardLogger
import bk.i18n
xmlattr_entities = {
"&": "&amp;",
"<": "&lt;",
">": "&gt;",
"\"": "&quot;",
}
def xmlquote (s):
"""quote characters for XML"""
return xml.sax.saxutils.escape(s)
def xmlquoteattr (s):
"""quote XML attribute, ready for inclusion with double quotes"""
return xml.sax.saxutils.escape(s, xmlattr_entities)
def xmlunquote (s):
"""unquote characters from XML"""
return xml.sax.saxutils.unescape(s)
def xmlunquoteattr (s):
"""unquote attributes from XML"""
return xml.sax.saxutils.unescape(s, xmlattr_entities)
class XMLLogger (linkcheck.logger.StandardLogger.StandardLogger):
"""XML output mirroring the GraphXML structure. Easy to parse with any XML
tool."""
def __init__ (self, **args):
super(XMLLogger, self).__init__(**args)
self.nodes = {}
self.nodeid = 0
def init (self):
linkcheck.logger.Logger.Logger.init(self)
if self.fd is None: return
self.starttime = time.time()
self.fd.write('<?xml version="1.0"?>\n')
if self.has_field("intro"):
self.fd.write("<!--\n")
self.fd.write(" "+bk.i18n._("created by %s at %s\n") % \
(linkcheck.Config.AppName, bk.strtime.strtime(self.starttime)))
self.fd.write(" "+bk.i18n._("Get the newest version at %s\n") % linkcheck.Config.Url)
self.fd.write(" "+bk.i18n._("Write comments and bugs to %s\n\n") % \
linkcheck.Config.Email)
self.fd.write("-->\n\n")
self.fd.write('<GraphXML>\n<graph isDirected="true">\n')
self.flush()
def newUrl (self, urlData):
"""write one node and all possible edges"""
if self.fd is None: return
node = urlData
if node.url and not self.nodes.has_key(node.url):
node.id = self.nodeid
self.nodes[node.url] = node
self.nodeid += 1
self.fd.write(' <node name="%d" ' % node.id)
self.fd.write(">\n")
if self.has_field("realurl"):
self.fd.write(" <label>%s</label>\n" %\
xmlquote(node.url))
self.fd.write(" <data>\n")
if node.dltime>=0 and self.has_field("dltime"):
self.fd.write(" <dltime>%f</dltime>\n" % node.dltime)
if node.dlsize>=0 and self.has_field("dlsize"):
self.fd.write(" <dlsize>%d</dlsize>\n" % node.dlsize)
if node.checktime and self.has_field("checktime"):
self.fd.write(" <checktime>%f</checktime>\n" \
% node.checktime)
if self.has_field("extern"):
self.fd.write(" <extern>%d</extern>\n" % \
(node.extern and 1 or 0))
self.fd.write(" </data>\n")
self.fd.write(" </node>\n")
self.writeEdges()
def writeEdges (self):
"""Write all edges found in the graph by brute force.
A mapping of parent URLs would be better.
"""
for node in self.nodes.values():
if self.nodes.has_key(node.parentName):
self.fd.write(" <edge")
self.fd.write(' source="%d"' % \
self.nodes[node.parentName].id)
self.fd.write(' target="%d"' % node.id)
self.fd.write(">\n")
if self.has_field("url"):
self.fd.write(" <label>%s</label>\n" % \
xmlquote(node.urlName))
self.fd.write(" <data>\n")
if self.has_field("result"):
self.fd.write(" <valid>%d</valid>\n" % \
(node.valid and 1 or 0))
self.fd.write(" </data>\n")
self.fd.write(" </edge>\n")
self.flush()
def endOfOutput (self, linknumber=-1):
if self.fd is None: return
self.fd.write("</graph>\n</GraphXML>\n")
if self.has_field("outro"):
self.stoptime = time.time()
duration = self.stoptime - self.starttime
self.fd.write("<!-- ")
self.fd.write(bk.i18n._("Stopped checking at %s (%s)\n")%\
(bk.strtime.strtime(self.stoptime),
bk.strtime.strduration(duration)))
self.fd.write("-->")
self.flush()
self.fd = None
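The xmlquoteattr helper above passes an entity map to xml.sax.saxutils.escape; escape always replaces the three standard characters (&amp;, &lt;, &gt;), so only the double-quote entry in the map adds anything. A standalone demonstration of that call:

```python
import xml.sax.saxutils

# Extra entities for attribute values; escape() handles &, < and >
# by itself, so only the double-quote mapping is strictly needed.
XMLATTR_ENTITIES = {"\"": "&quot;"}


def xmlquoteattr(s):
    """Quote an XML attribute value for inclusion in double quotes."""
    return xml.sax.saxutils.escape(s, XMLATTR_ENTITIES)


print(xmlquoteattr('say "hi" & <bye>'))
# say &quot;hi&quot; &amp; &lt;bye&gt;
```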


@@ -1,4 +0,0 @@
<a href="#myid">Bla</a>
<ul>
<li id="myid">
</ul>


@@ -1,8 +0,0 @@
<!-- base without href -->
<base target="_top">
<!-- meta url -->
<META HTTP-equiv="refresh" content="0; url=misc.html">
<!-- spaces between key and value -->
<a href
=
"misc.html">


@@ -1,3 +0,0 @@
<!-- base with href -->
<base href="base/">
<a href="test.txt">


@@ -1,2 +0,0 @@
<!-- codebase test -->
<applet codebase="base/" archive="test.txt">


@@ -1 +0,0 @@
file:///etc/group


@@ -1,4 +0,0 @@
@font-face {
src:url(misc.html)
}
background-image:url(news.html)


@@ -1,8 +0,0 @@
<a href="http.html">relative url</a>
<a href="http.html#isnix">bad anchor</a>
<a href="http.html#iswas">good anchor</a>
<a href="file:///etc/group">good file</a>
<a href="file://etc/group">bad file</a>
<a href="file:/etc/group">good file</a>
<a href="file:etc/group">bad file</a>
<a href="file:/etc/">good dir</a>


@@ -1 +0,0 @@
file:///etc/group


@@ -1,5 +0,0 @@
<!-- frame src urls -->
<frameset border="0" frameborder="0" framespacing="0">
<frame name="top" src="base1.html" frameborder="0">
<frame name="bottom" src="http.html" frameborder="0">
</frameset>


@@ -1,6 +0,0 @@
<a href="ftp:/ftp.debian.org/"> <!-- ftp one slash -->
<a href="ftp://ftp.debian.org/"> <!-- ftp two slashes -->
<a href="ftp://ftp.debian.org//debian/"> <!-- ftp two dir slashes -->
<a href="ftp://ftp.debian.org/debian"> <!-- missing trailing dir slash -->
<a href="ftp://ftp.debian.org////////debian/"> <!-- ftp many dir slashes -->
<a href="ftp:///ftp.debian.org/"> <!-- ftp three slashes -->


@@ -1,23 +0,0 @@
Just some HTTP links
<a b=c "boo" href="http://www.garantiertnixgutt.bla">bad url</a>
<a href="http://www.heise.de">ok</a>
<a href="http:/www.heise.de">one slash</a>
<a href="http:www.heise.de">no slash</a>
<a href="http://">no url</a>
<a href="http:/">no url, one slash</a>
<a href="http:">no url, no slash</a>
<a href="http://www.blubb.de/stalter&sohn">unquoted ampersand</a>
<a name="iswas">anchor for anchor.html</a>
<a href=http://slashdot.org/>unquoted</a>
<a href="http://www.heise.de/#isnix">invalid anchor</a>
<a href="HtTP://WWW.hEIsE.DE">should be cached</a>
<a href="HTTP://WWW.HEISE.DE">should be cached</a>
<!-- entities -->
<a href="http://www.heise.de/?quoted=&uuml;">html entities</a>
<a
href="&#109;a&#105;&#108;&#116;o&#58;&#112;o&#115;&#116;&#109;a&#115;&#116;&#101;&#114;@&#97;&#111;l&#46;&#100;&#101;">&#112;o&#115;&#116;&#109;a&#115;&#116;&#101;&#114;@&#97;&#111;l&#46;&#100;&#101;</a>
<!-- <a href=http://nocheckin> no check because of comment -->
<a href=illegalquote1">no beginning quote</a>
<a href="illegalquote2>no ending quote</a>
<!-- check the parser at end of file -->
<a href="g


@@ -1 +0,0 @@
<a href="https://sourceforge.net/">https</a>


@@ -1,21 +0,0 @@
<!-- extra mail checking -->
<html><head></head>
<body>
<!-- legal -->
<a href=mailto:calvin@LocalHost?subject=Hallo&to=michi>1</a>
<a href="mailto:Dude <calvin@studcs.uni-sb.de> , Killer <calvin@cs.uni-sb.de>?subject=bla">2</a>
<a href="mailto:Bastian Kleineidam <calvin@studcs.uni-sb.de>?bcc=jsmith%40wummel.company.com">3</a>
<a href="mailto:Bastian Kleineidam <calvin@studcs.uni-sb.de>">4</a>
<a href="mailto:">6</a>
<a href="mailto:o'hara@cs.uni-sb.de">5</a>
<a href="mailto:?to=calvin@studcs.uni-sb.de&subject=blubb&cc=calvin_cc@studcs.uni-sb.de&CC=calvin_CC@studcs.uni-sb.de">...</a>
<a href="mailto:news-admins@freshmeat.net?subject=Re:%20[fm%20#11093]%20(news-admins)%20Submission%20report%20-%20Pretty%20CoLoRs">...</a>
<a href="mailto:jan@jan-dittberner.de?subject=test">...</a>
<!-- illegal -->
<!-- contains non-quoted characters -->
<a href="mailto:a@d?subject=äöü">5</a>
<a href="mailto:calvin@cs.uni-sb.de?subject=Halli hallo">_</a>
<!-- ? extension forbidden in <> construct -->
<a href="mailto:Bastian Kleineidam <calvin@host1?foo=bar>">3</a>
</body>
</html>


@@ -1,9 +0,0 @@
<!-- meta url -->
<meta http-equiv="refresh" content="5; url=http://localhost/">
<a href="hutzli:nixgutt">bad scheme</a>
<a href="javascript:loadthis()">javascript url</a>
<!-- multiple links in one tag -->
<applet archive="misc.html" src="misc.html">
<!-- css urls -->
<img style="@font-face {src:url(misc.html)};background-image:url(news.html)"
title="CSS urls">


@@ -1,19 +0,0 @@
<!-- news testing -->
<a href="news:comp.os.linux.misc">
<!-- snews -->
<a href="snews:de.comp.os.unix.linux.misc">
<!-- no group -->
<a href="news:">
<!-- illegal syntax -->
<a href="news:§$%&/´`(§%">
<!-- nntp scheme with host -->
<a href="nntp://news.rz.uni-sb.de/comp.lang.python">
<!-- article span -->
<a href="nntp://news.rz.uni-sb.de/comp.lang.python/1-5">
<!-- article number -->
<a href="nntp://news.rz.uni-sb.de/EFGJG4.7A@deshaw.com">
<!-- host but no group -->
<a href="nntp://news.rz.uni-sb.de/">
<!-- article span -->
<a href="news:comp.lang.python/1-5">


@@ -1,5 +0,0 @@
<a href="telnet:localhost">
<a href="telnet:">
<a href="telnet://swindon.city.ac.uk">
<a href="telnet://user@swindon.city.ac.uk">
<a href="telnet://user:password@swindon.city.ac.uk">


@@ -1,19 +0,0 @@
test_base
url file:///home/calvin/projects/linkchecker/test/html/base1.html
valid
url file:///home/calvin/projects/linkchecker/test/html/base2.html
valid
url file:///home/calvin/projects/linkchecker/test/html/codebase.html
valid
url misc.html
valid
url misc.html
cached
valid
url test.txt
baseurl file:///home/calvin/projects/linkchecker/test/html/base/
valid
url test.txt
cached
baseurl file:///home/calvin/projects/linkchecker/test/html/base/
valid


@@ -1,13 +0,0 @@
# -*- coding: iso-8859-1 -*-
import os, linkcheck
config = linkcheck.Config.Configuration()
config.addLogger('test', linkcheck.test_support.TestLogger)
config['recursionlevel'] = 1
config['log'] = config.newLogger('test')
config["anchors"] = True
config["verbose"] = True
config.setThreads(0)
for filename in ('base1.html', 'base2.html', 'codebase.html'):
url = os.path.join("test", "html", filename)
config.appendUrl(linkcheck.UrlData.GetUrlDataFrom(url, 0, config))
linkcheck.checkUrls(config)


@@ -1,13 +0,0 @@
# -*- coding: iso-8859-1 -*-
import os, linkcheck
config = linkcheck.Config.Configuration()
config.addLogger('test', linkcheck.test_support.TestLogger)
config['recursionlevel'] = 1
config['log'] = config.newLogger('test')
config["anchors"] = True
config["verbose"] = True
config.setThreads(0)
for filename in ('file.html', "file.txt", "file.asc", "file.css"):
url = os.path.join("test", "html", filename)
config.appendUrl(linkcheck.UrlData.GetUrlDataFrom(url, 0, config))
linkcheck.checkUrls(config)


@@ -1,13 +0,0 @@
# -*- coding: iso-8859-1 -*-
import os, linkcheck
config = linkcheck.Config.Configuration()
config.addLogger('test', linkcheck.test_support.TestLogger)
config['recursionlevel'] = 1
config['log'] = config.newLogger('test')
config["anchors"] = True
config["verbose"] = True
config.setThreads(0)
for filename in ('frames.html',):
url = os.path.join("test", "html", filename)
config.appendUrl(linkcheck.UrlData.GetUrlDataFrom(url, 0, config))
linkcheck.checkUrls(config)


@@ -1,13 +0,0 @@
# -*- coding: iso-8859-1 -*-
import os, linkcheck
config = linkcheck.Config.Configuration()
config.addLogger('test', linkcheck.test_support.TestLogger)
config['recursionlevel'] = 1
config['log'] = config.newLogger('test')
config["anchors"] = True
config["verbose"] = True
config.setThreads(0)
for filename in ('ftp.html',):
url = os.path.join("test", "html", filename)
config.appendUrl(linkcheck.UrlData.GetUrlDataFrom(url, 0, config))
linkcheck.checkUrls(config)


@@ -1,14 +0,0 @@
# -*- coding: iso-8859-1 -*-
import os, linkcheck
config = linkcheck.Config.Configuration()
config.addLogger('test', linkcheck.test_support.TestLogger)
config['recursionlevel'] = 1
config['log'] = config.newLogger('test')
config["anchors"] = True
config["verbose"] = True
config.setThreads(0)
htmldir = "test/html"
for filename in ('http.html',):
url = os.path.join("test", "html", filename)
config.appendUrl(linkcheck.UrlData.GetUrlDataFrom(url, 0, config))
linkcheck.checkUrls(config)


@@ -1,13 +0,0 @@
# -*- coding: iso-8859-1 -*-
import os, linkcheck
config = linkcheck.Config.Configuration()
config.addLogger('test', linkcheck.test_support.TestLogger)
config['recursionlevel'] = 1
config['log'] = config.newLogger('test')
config["anchors"] = True
config["verbose"] = True
config.setThreads(0)
for filename in ('https.html',):
url = os.path.join("test", "html", filename)
config.appendUrl(linkcheck.UrlData.GetUrlDataFrom(url, 0, config))
linkcheck.checkUrls(config)


@@ -1,13 +0,0 @@
# -*- coding: iso-8859-1 -*-
import os, linkcheck
config = linkcheck.Config.Configuration()
config.addLogger('test', linkcheck.test_support.TestLogger)
config['recursionlevel'] = 1
config['log'] = config.newLogger('test')
config["anchors"] = True
config["verbose"] = True
config.setThreads(0)
for filename in ('mail.html',):
url = os.path.join("test", "html", filename)
config.appendUrl(linkcheck.UrlData.GetUrlDataFrom(url, 0, config))
linkcheck.checkUrls(config)


@@ -1,13 +0,0 @@
# -*- coding: iso-8859-1 -*-
import os, linkcheck
config = linkcheck.Config.Configuration()
config.addLogger('test', linkcheck.test_support.TestLogger)
config['recursionlevel'] = 1
config['log'] = config.newLogger('test')
config["anchors"] = True
config["verbose"] = True
config.setThreads(0)
for filename in ('misc.html','anchor.html', 'norobots.html'):
url = os.path.join("test", "html", filename)
config.appendUrl(linkcheck.UrlData.GetUrlDataFrom(url, 0, config))
linkcheck.checkUrls(config)


@@ -1,13 +0,0 @@
# -*- coding: iso-8859-1 -*-
import os, linkcheck
config = linkcheck.Config.Configuration()
config.addLogger('test', linkcheck.test_support.TestLogger)
config['recursionlevel'] = 1
config['log'] = config.newLogger('test')
config["anchors"] = True
config["verbose"] = True
config.setThreads(0)
for filename in ('news.html',):
url = os.path.join("test", "html", filename)
config.appendUrl(linkcheck.UrlData.GetUrlDataFrom(url, 0, config))
linkcheck.checkUrls(config)


@@ -1,13 +0,0 @@
# -*- coding: iso-8859-1 -*-
import os, linkcheck
config = linkcheck.Config.Configuration()
config.addLogger('test', linkcheck.test_support.TestLogger)
config['recursionlevel'] = 1
config['log'] = config.newLogger('test')
config["anchors"] = True
config["verbose"] = True
config.setThreads(0)
for filename in ('telnet.html',):
url = os.path.join("test", "html", filename)
config.appendUrl(linkcheck.UrlData.GetUrlDataFrom(url, 0, config))
linkcheck.checkUrls(config)