Chris Mayo
3771dd9136
Use parser.feed_soup() instead of parser.feed()
...
Markup is not being passed in pieces to the parser, so simplify the
interface and reduce the state further.
2020-04-08 20:03:35 +01:00
Chris Mayo
40f43ae41c
Create one function to make soup objects
2020-04-08 20:03:35 +01:00
Chris Mayo
9d8d251d06
Replace Parser lineno() and column() methods
...
Stop storing this data in Parser object state.
2020-04-08 20:03:35 +01:00
Chris Mayo
fe024fb0c8
Remove unused Parser.debug() method
2020-04-03 19:24:08 +01:00
Chris Mayo
0c5e3bb403
Remove old HtmlParser .gitignore
...
htmlparse.output was a product of the built-in parser.
2020-04-03 19:24:08 +01:00
Chris Mayo
3ff3d72492
Use BeautifulSoup element attrs directly
2020-04-03 19:24:08 +01:00
Chris Mayo
a7e1e20172
Remove last line and column from Parser
...
Only used for debug log message and not very useful.
2020-04-03 19:24:08 +01:00
Chris Mayo
ffa6ac457f
Remove support for non-Tag elements from Parser
...
This change is made because the linkchecker handlers only process
Tags.
The test HtmlPrettyPrinter handler is updated to output element text
because its support for non-Tag elements has been removed. This results
in a number of the existing tests still passing.
2020-03-31 20:10:35 +01:00
Chris Mayo
1255119ca8
Move HtmlPrinter and HtmlPrettyPrinter into tests
2020-03-30 19:32:30 +01:00
Chris Mayo
5b66964afa
Remove unused .charset from checker classes
...
Unused since:
4f8c2954 ("Don't set parser.encoding", 2019-10-05)
2020-03-30 19:32:30 +01:00
Chris Mayo
f743be57e8
Remove unused functions from linkcheck.HtmlParser
...
resolve_entities() unused since:
2c000683 ("Remove unused linkcheck.htmlutil.linkname module",
2020-03-30)
set_doctype(), set_encoding() unused since:
51a06d8a ("Remove home-cooked htmlparser and use BeautifulSoup",
2019-07-22)
2020-03-30 19:32:18 +01:00
Marius Gedminas
af0f50efa8
Restore support for older BeautifulSoup4 versions
2020-03-30 14:49:56 +03:00
Marius Gedminas
a311ebb97e
Fix doctype tests
...
I don't think linkchecker actually cares about the document type, so I'm
not sure why we're even testing this...
2020-03-23 10:56:57 +02:00
Chris Mayo
153e53ba03
Reuse soup object used for detecting encoding in the HTML parser
2019-10-05 19:38:57 +01:00
Chris Mayo
978042a54e
Hide Beautiful Soup soupsieve warning
...
Shown every time linkchecker is run:
/usr/lib/python3.7/site-packages/bs4/element.py:16: UserWarning: The
soupsieve package is not installed. CSS selectors cannot be used.
'The soupsieve package is not installed. CSS selectors cannot be used.'
2019-10-05 19:38:57 +01:00
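The commit above does not show how the warning is hidden. One common way to do it, sketched here as an assumption rather than as the change that was actually made, is a `warnings.filterwarnings()` rule that matches only the soupsieve message. The helper names `hide_soupsieve_warning` and `count_visible` are hypothetical and exist only for illustration.

```python
import warnings

# Message text copied from the warning quoted in the commit body.
SOUPSIEVE_MESSAGE = (
    "The soupsieve package is not installed. CSS selectors cannot be used."
)


def hide_soupsieve_warning():
    """Install a filter that drops only the soupsieve UserWarning.

    Hypothetical helper: the real change may have used a different filter.
    The message argument is a regular expression matched against the start
    of the warning text.
    """
    warnings.filterwarnings(
        "ignore",
        message="The soupsieve package is not installed",
        category=UserWarning,
    )


def count_visible(messages):
    """Emit each message as a UserWarning and count those not filtered out."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # start from "show everything"
        hide_soupsieve_warning()         # then suppress the one message
        for text in messages:
            warnings.warn(text, UserWarning)
    return len(caught)


if __name__ == "__main__":
    print(count_visible([SOUPSIEVE_MESSAGE, "some other warning"]))
```

Matching on the message text keeps other `UserWarning`s visible, which a blanket filter on the category would not.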
Chris Mayo
30df69c158
Improve pretty printed comments
2019-10-05 19:38:57 +01:00
Chris Mayo
607328d5c5
Support Beautiful Soup line numbers
2019-10-05 19:38:57 +01:00
Petr Dlouhý
b5111453d8
change test_parse encoding to UTF-8
2019-07-22 19:59:37 +01:00
Petr Dlouhý
d6d48b4814
html parser: use name instead of peeking
2019-07-22 19:59:37 +01:00
Petr Dlouhý
51a06d8a1e
Remove home-cooked htmlparser and use BeautifulSoup
2019-07-22 19:59:37 +01:00
Petr Dlouhý
2daf685633
Python3: fix few htmllib problems
2018-01-05 22:48:46 +01:00
Petr Dlouhý
8b9f29ae52
Python3: fix unichr() in htmlparser
2019-09-09 19:51:30 +01:00
Petr Dlouhý
bc99dc51de
Python3: fix HtmlParser
2019-04-18 19:35:16 +01:00
Marius Gedminas
fb1debaa68
Fix incompatible pointer type warnings
...
The warnings looked like this:
htmlparse.c: In function ‘yyparse’:
htmlparse.c:1810:18: warning: passing argument 1 of ‘yyerror’ from incompatible pointer type [-Wincompatible-pointer-types]
htmlparse.y:40:13: note: expected ‘PyObject ** {aka struct _object **}’ but argument is of type ‘PyObject * {aka struct _object *}’
htmlparse.c:1927:12: warning: passing argument 1 of ‘yyerror’ from incompatible pointer type [-Wincompatible-pointer-types]
htmlparse.y:40:13: note: expected ‘PyObject ** {aka struct _object **}’ but argument is of type ‘PyObject * {aka struct _object *}’
The argument is not used, so it doesn't really matter what pointer type
it is.
2017-02-24 15:04:09 +02:00
Marius Gedminas
03dfe3d3a1
Fix "operation on ... may be undefined" [-Wsequence-point] warnings
...
Fixes a bunch of warnings like
htmlparse.y:509:25: warning: operation on ‘self->userData->buf’ may be undefined [-Wsequence-point]
htmlparse.y:518:29: warning: operation on ‘self->userData->tmp_buf’ may be undefined [-Wsequence-point]
which were a result of (macro-expanded) code like this (simplified):
if ((tmp = (tmp = PyMem_Realloc(...))) == NULL) return NULL;
The PyMem_Resize(p, ...) macro assigns the new value to p before
returning it, so there's no need to assign it again.
See http://bugs.python.org/issue1668036 for evidence (from 2007) that
this is indeed a documented side-effect of the macro API.
2017-02-13 15:20:33 +02:00
Bastian Kleineidam
3d711666e1
Fix parser for changes in bison 3.0.x
2015-11-26 12:33:44 +01:00
Bastian Kleineidam
029c20ed98
More python3 fixes
2014-09-12 21:59:07 +02:00
Bastian Kleineidam
35eb30432e
Added some Python3 fixes.
2014-09-12 19:36:30 +02:00
Bastian Kleineidam
7b34be590b
Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.
2014-03-01 00:12:34 +01:00
Bastian Kleineidam
6d5e5f9efb
Updated copyright.
2012-03-30 22:24:10 +02:00
Bastian Kleineidam
9ee9abcf0f
Parse invalid comments <! bla >
2012-03-23 07:41:03 +01:00
Bastian Kleineidam
b9b8e3f5b2
Honor the charset encoding of the Content-Type HTTP header when parsing HTML.
2012-03-22 22:45:11 +01:00
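As a rough illustration of what honoring the header charset involves, the sketch below extracts the `charset` parameter from a Content-Type value using the standard library's `email.message.Message`. This is an assumption for illustration only: the commit does not show its implementation, and `charset_from_content_type` is a hypothetical helper name.

```python
from email.message import Message


def charset_from_content_type(value, default=None):
    """Return the lower-cased charset parameter of a Content-Type value.

    Hypothetical helper, not taken from the commit. Returns `default`
    when the header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset(default)


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=ISO-8859-1"))
    print(charset_from_content_type("text/html", default="utf-8"))
```

A caller could pass the returned name to the HTML parser as the document encoding, and fall back to detection from the markup when no charset is present.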
Bastian Kleineidam
71f5ee42c8
Updated copyright.
2012-01-29 17:18:28 +01:00
Bastian Kleineidam
dff425710d
More Freshmeat/Freecode replacements.
2011-12-25 09:06:18 +01:00
Bastian Kleineidam
5c496df9ed
Regenerate HTML parser with new Bison 2.5 version.
2011-10-31 06:41:45 +01:00
Bastian Kleineidam
fb237041d1
Updated copyright
2011-10-20 08:14:16 +02:00
Bastian Kleineidam
d2ae6bf71c
Properly detect HTML character encoding.
2011-08-14 12:49:31 +02:00
Bastian Kleineidam
689ab9f073
Add debugging for charset encoding parameter setting.
2011-08-14 12:45:08 +02:00
Bastian Kleineidam
c9707ee735
Handle stray < before end tags.
2011-05-28 13:39:04 +02:00
Bastian Kleineidam
7d04c3ee81
Handle stray < characters in HTML.
2011-05-20 06:50:08 +02:00
Bastian Kleineidam
74c132c90b
Updated copyright.
2011-04-26 14:57:57 +02:00
Bastian Kleineidam
54a14d0f91
Use Python 2.7 for local build.
2011-04-22 08:39:45 +02:00
Bastian Kleineidam
c0957a20df
Make strlen variables type size_t.
2011-04-19 16:07:10 +02:00
Bastian Kleineidam
4c98c463dc
Correctly declare all variables at beginning of block.
2011-04-16 15:25:51 +02:00
Bastian Kleineidam
f4f921384e
Updated copyright
2011-03-13 07:52:18 +01:00
Bastian Kleineidam
427b878834
Updated translation and copyright
2010-12-18 21:00:29 +01:00
Bastian Kleineidam
e48acc08af
Remove old comments and set line and column number on flush.
2010-12-11 07:57:50 +01:00
Bastian Kleineidam
03034ddc1c
Updated copyright
2010-11-21 11:25:07 +01:00
Bastian Kleineidam
6dcb0e10de
Require Python 2.6
2010-11-21 10:42:44 +01:00
Bastian Kleineidam
5b5a62f6d5
Updated copyright
2010-03-10 00:05:05 +01:00