Commit graph

126 commits

Author SHA1 Message Date
Chris Mayo
ffa6ac457f Remove support for non-Tag elements from Parser
This change is made because the linkchecker handlers only process
Tags.

The test HtmlPrettyPrinter handler is updated to output element text
because its support for non-Tag elements has been removed. This results
in a number of the existing tests still passing.
2020-03-31 20:10:35 +01:00
Chris Mayo
1255119ca8 Move HtmlPrinter and HtmlPrettyPrinter into tests 2020-03-30 19:32:30 +01:00
Chris Mayo
5b66964afa Remove unused .charset from checker classes
Unused since:
4f8c2954 ("Don't set parser.encoding", 2019-10-05)
2020-03-30 19:32:30 +01:00
Chris Mayo
f743be57e8 Remove unused functions from linkcheck.HtmlParser
resolve_entities() unused since:
2c000683 ("Remove unused linkcheck.htmlutil.linkname module",
2020-03-30)

set_doctype(), set_encoding() unused since:
51a06d8a ("Remove home-cooked htmlparser and use BeautifulSoup",
2019-07-22)
2020-03-30 19:32:18 +01:00
Marius Gedminas
af0f50efa8 Restore support for older BeautifulSoup4 versions 2020-03-30 14:49:56 +03:00
Marius Gedminas
a311ebb97e Fix doctype tests
I don't think linkchecker actually cares about the document type, so I'm
not sure why we're even testing this...
2020-03-23 10:56:57 +02:00
Chris Mayo
153e53ba03 Reuse soup object used for detecting encoding in the HTML parser 2019-10-05 19:38:57 +01:00
Chris Mayo
978042a54e Hide Beautiful Soup soupsieve warning
Shown every time linkchecker is run:

/usr/lib/python3.7/site-packages/bs4/element.py:16: UserWarning: The
soupsieve package is not installed. CSS selectors cannot be used.
  'The soupsieve package is not installed. CSS selectors cannot be used.'
2019-10-05 19:38:57 +01:00
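One way such a warning can be silenced is with a message-specific filter from the standard warnings module. The sketch below is only an illustration and is not necessarily what this commit does: the helper names are hypothetical, and the message prefix is copied from the warning text quoted above. The quoted location (bs4/element.py:16) suggests the warning is raised when bs4 is imported, so a filter like this would presumably have to be installed before that import.

```python
import warnings

# Assumption: prefix copied from the warning quoted in the commit message.
# warnings matches this as a regular expression against the start of the text.
SOUPSIEVE_MESSAGE = "The soupsieve package is not installed"


def hide_soupsieve_warning():
    """Install a filter that ignores only the soupsieve UserWarning."""
    warnings.filterwarnings(
        "ignore",
        message=SOUPSIEVE_MESSAGE,
        category=UserWarning,
    )


def emitted_warnings(text):
    """Return the warnings that still get through after the filter is installed."""
    with warnings.catch_warnings(record=True) as caught:
        # Show everything by default, then put the ignore filter in front of it.
        warnings.simplefilter("always")
        hide_soupsieve_warning()
        warnings.warn(text, UserWarning)
    return [str(w.message) for w in caught]
```

Matching on the message keeps other UserWarnings visible, which a blanket "ignore" filter would not.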
Chris Mayo
30df69c158 Improve pretty printed comments 2019-10-05 19:38:57 +01:00
Chris Mayo
607328d5c5 Support Beautiful Soup line numbers 2019-10-05 19:38:57 +01:00
Petr Dlouhý
b5111453d8 change test_parse encoding to UTF-8 2019-07-22 19:59:37 +01:00
Petr Dlouhý
d6d48b4814 html parser: use name instead of peeking 2019-07-22 19:59:37 +01:00
Petr Dlouhý
51a06d8a1e Remove home-cooked htmlparser and use BeautifulSoup 2019-07-22 19:59:37 +01:00
Petr Dlouhý
2daf685633 Python3: fix few htmllib problems 2018-01-05 22:48:46 +01:00
Petr Dlouhý
8b9f29ae52 Python3: fix unichr() in htmlparser 2019-09-09 19:51:30 +01:00
Petr Dlouhý
bc99dc51de Python3: fix HtmlParser 2019-04-18 19:35:16 +01:00
Marius Gedminas
fb1debaa68 Fix incompatible pointer type warnings
The warnings looked like this:

    htmlparse.c: In function ‘yyparse’:
    htmlparse.c:1810:18: warning: passing argument 1 of ‘yyerror’ from incompatible pointer type [-Wincompatible-pointer-types]
    htmlparse.y:40:13: note: expected ‘PyObject ** {aka struct _object **}’ but argument is of type ‘PyObject * {aka struct _object *}’
    htmlparse.c:1927:12: warning: passing argument 1 of ‘yyerror’ from incompatible pointer type [-Wincompatible-pointer-types]
    htmlparse.y:40:13: note: expected ‘PyObject ** {aka struct _object **}’ but argument is of type ‘PyObject * {aka struct _object *}’

The argument is not used, so it doesn't really matter what pointer type
it is.
2017-02-24 15:04:09 +02:00
Marius Gedminas
03dfe3d3a1 Fix "operation on ... may be undefined" [-Wsequence-point] warnings
Fixes a bunch of warnings like

  htmlparse.y:509:25: warning: operation on ‘self->userData->buf’ may be undefined [-Wsequence-point]
  htmlparse.y:518:29: warning: operation on ‘self->userData->tmp_buf’ may be undefined [-Wsequence-point]

which were a result of (macro-expanded) code like this (simplified):

  if ((tmp = (tmp = PyMem_Realloc(...))) == NULL) return NULL;

The PyMem_Resize(p, ...) macro assigns the new value to p before
returning it, so there's no need to assign it again.

See http://bugs.python.org/issue1668036 for evidence (from 2007) that
this is indeed a documented side-effect of the macro API.
2017-02-13 15:20:33 +02:00
Bastian Kleineidam
3d711666e1 Fix parser for changes in bison 3.0.x 2015-11-26 12:33:44 +01:00
Bastian Kleineidam
029c20ed98 More python3 fixes 2014-09-12 21:59:07 +02:00
Bastian Kleineidam
35eb30432e Added some Python3 fixes. 2014-09-12 19:36:30 +02:00
Bastian Kleineidam
7b34be590b Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements. 2014-03-01 00:12:34 +01:00
Bastian Kleineidam
6d5e5f9efb Updated copyright. 2012-03-30 22:24:10 +02:00
Bastian Kleineidam
9ee9abcf0f Parse invalid comments <! bla > 2012-03-23 07:41:03 +01:00
Bastian Kleineidam
b9b8e3f5b2 Honor the charset encoding of the Content-Type HTTP header when parsing HTML.
2012-03-22 22:45:11 +01:00
Bastian Kleineidam
71f5ee42c8 Updated copyright. 2012-01-29 17:18:28 +01:00
Bastian Kleineidam
dff425710d More Freshmeat/Freecode replacements. 2011-12-25 09:06:18 +01:00
Bastian Kleineidam
5c496df9ed Regenerate HTML parser with new Bison 2.5 version. 2011-10-31 06:41:45 +01:00
Bastian Kleineidam
fb237041d1 Updated copyright 2011-10-20 08:14:16 +02:00
Bastian Kleineidam
d2ae6bf71c Properly detect HTML character encoding. 2011-08-14 12:49:31 +02:00
Bastian Kleineidam
689ab9f073 Add debugging for charset encoding parameter setting. 2011-08-14 12:45:08 +02:00
Bastian Kleineidam
c9707ee735 Handle stray < before end tags. 2011-05-28 13:39:04 +02:00
Bastian Kleineidam
7d04c3ee81 Handle stray < characters in HTML. 2011-05-20 06:50:08 +02:00
Bastian Kleineidam
74c132c90b Updated copyright. 2011-04-26 14:57:57 +02:00
Bastian Kleineidam
54a14d0f91 Use Python 2.7 for local build. 2011-04-22 08:39:45 +02:00
Bastian Kleineidam
c0957a20df Make strlen variables type size_t. 2011-04-19 16:07:10 +02:00
Bastian Kleineidam
4c98c463dc Correctly declare all variables at beginning of block. 2011-04-16 15:25:51 +02:00
Bastian Kleineidam
f4f921384e Updated copyright 2011-03-13 07:52:18 +01:00
Bastian Kleineidam
427b878834 Updated translation and copyright 2010-12-18 21:00:29 +01:00
Bastian Kleineidam
e48acc08af Remove old comments and set line and column number on flush. 2010-12-11 07:57:50 +01:00
Bastian Kleineidam
03034ddc1c Updated copyright 2010-11-21 11:25:07 +01:00
Bastian Kleineidam
6dcb0e10de Require Python 2.6 2010-11-21 10:42:44 +01:00
Bastian Kleineidam
5b5a62f6d5 Updated copyright 2010-03-10 00:05:05 +01:00
Bastian Kleineidam
57397e938b Improved linkname parsing by adding a new peek() HTML parser function. 2010-03-09 11:31:12 +01:00
Bastian Kleineidam
5e06b6b8d4 Updated FSF address in GPL blurb 2009-07-24 23:58:20 +02:00
Bastian Kleineidam
a0ba9a7446 Improved Python 2.6 compatibility in HTML parser 2009-02-28 13:47:25 +01:00
calvin
527b617f88 Regenerate with newer flex and bison versions.
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3949 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2009-02-01 11:21:13 +00:00
calvin
e9805dbd8a Updated copyright year to 2009
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3887 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2009-01-08 14:18:03 +00:00
calvin
2f25962789 Match newlines in catch-all rules
Avoid printing spurious newlines when HTML parsing. The "." does
not match newlines, correct that in the catch-all lexer rules.


git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3760 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-05-20 16:56:58 +00:00
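The "." behaviour described in this commit can be illustrated with an analogue in Python, whose re module also excludes newlines from "." by default. This is a sketch only: the actual rules live in a flex lexer, and the pattern and function names here are hypothetical. In flex, text that no rule matches is echoed to the output by the default rule, which is presumably where the spurious newlines came from.

```python
import re

# Python analogue only; the real catch-all rules are flex patterns.
DOT_ONLY = re.compile(r".")            # "." skips newlines by default
DOT_OR_NEWLINE = re.compile(r".|\n")   # catch-all that also accepts newlines


def unmatched(pattern, text):
    """Return the characters that the catch-all pattern fails to consume."""
    return [ch for ch in text if pattern.fullmatch(ch) is None]
```

With DOT_ONLY the newline in "a\nb" is left over, while DOT_OR_NEWLINE consumes every character.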
calvin
3eac1be9ab Require and use Python 2.5
Use Python 2.5 features and get rid of old compat code. Also some
code cleanups have been made.


git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@3737 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2008-04-27 11:39:21 +00:00