Marius Gedminas
|
680783b1ff
|
SWF files are binary data
Should fix #372.
|
2020-04-27 11:25:37 +03:00 |
|
Chris Mayo
|
d189445a8e
|
LinkFinder does not raise StopParse
|
2020-04-18 20:30:46 +01:00 |
|
Chris Mayo
|
ee6628a831
|
Move HtmlParser/htmlsax.py to htmlutil/htmlsoup.py
Remove one subpackage and some import lines where htmlutil.linkparse is
also being used.
|
2020-04-18 20:30:45 +01:00 |
|
Chris Mayo
|
0795e3c1b4
|
Replace Parser class using BeautifulSoup.find_all()
|
2020-04-10 13:51:09 +01:00 |
|
Chris Mayo
|
02e1c389b2
|
Remove parser flush() and reset()
Remnants of the feed() interface.
|
2020-04-08 20:03:35 +01:00 |
|
Chris Mayo
|
9d8d251d06
|
Replace Parser lineno() and column() methods
Stop storing this data in Parser object state.
|
2020-04-08 20:03:35 +01:00 |
|
Chris Mayo
|
f5ae90e824
|
Parser threading lock no longer required with Beautiful Soup
|
2020-03-22 19:54:37 +00:00 |
|
Chris Mayo
|
646e138166
|
Pass encoding when unquoting
Otherwise, non-UTF-8 percent-encoded bytes are misinterpreted:
>>> from urllib import parse
>>> parse.unquote("%FF")
'�'
>>> parse.unquote("%FF", "latin1")
'ÿ'
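The following is a minimal sketch of the same idea as a helper. It assumes the caller already knows the charset of the page the link came from. The name `unquote_with_encoding` is hypothetical and is not LinkChecker's API. The helper relies only on `urllib.parse.unquote(string, encoding, errors)`:

```python
from urllib.parse import unquote


def unquote_with_encoding(url_part, encoding="utf-8"):
    """Percent-decode url_part, decoding the resulting bytes with `encoding`.

    Hypothetical helper: `encoding` is assumed to be the charset of the
    document that contained the link (e.g. as detected by the HTML parser).
    """
    # unquote() decodes as UTF-8 by default with errors="replace", so bytes
    # that are not valid UTF-8 (such as 0xFF) silently become U+FFFD.
    return unquote(url_part, encoding=encoding, errors="replace")
```

With the default UTF-8, `unquote_with_encoding("caf%E9")` yields `'caf\ufffd'`. Passing `"latin1"` yields `'café'`.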
|
2019-10-05 19:38:57 +01:00 |
|
Chris Mayo
|
153e53ba03
|
Reuse soup object used for detecting encoding in the HTML parser
|
2019-10-05 19:38:57 +01:00 |
|
Chris Mayo
|
4f8c2954cf
|
Don't set parser.encoding
Read-only property with new Beautiful Soup parser.
|
2019-10-05 19:38:57 +01:00 |
|
Marius Gedminas
|
84dbb5d603
|
Fix TypeError: string arg required in find_links()
Fixes #317.
|
2019-10-21 17:47:46 +03:00 |
|
Chris Mayo
|
e01ea0d9f0
|
Safari bookmark parser requires bytes
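The following is a rough sketch of why bytes matter here. It uses the standard library's `plistlib`, which is an assumption: the commit does not say which plist reader is involved. `plistlib.loads()` accepts only bytes and detects XML vs. binary plists by itself. The `URLString`/`Children` layout and both helper names are illustrative assumptions:

```python
import plistlib


def iter_safari_urls(node):
    """Yield bookmark URLs from a parsed Safari bookmarks plist tree.

    Assumption: leaf bookmarks are dicts with a "URLString" key, and
    folders keep their entries in a "Children" list.
    """
    if isinstance(node, dict):
        if "URLString" in node:
            yield node["URLString"]
        for child in node.get("Children", []):
            yield from iter_safari_urls(child)


def parse_safari_bookmarks(data: bytes):
    # plistlib.loads() wraps its argument in io.BytesIO, so passing a str
    # instead of bytes raises TypeError.
    return list(iter_safari_urls(plistlib.loads(data)))
```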
|
2019-09-30 19:46:24 +01:00 |
|
Chris Mayo
|
0c90c718bf
|
Revert "Python3: fix bytes mark in parser/__init__.py"
This reverts commit aec8243348.
|
2019-09-30 19:46:24 +01:00 |
|
Petr Dlouhý
|
aec8243348
|
Python3: fix bytes mark in parser/__init__.py
|
2019-04-09 20:09:35 +01:00 |
|
Yaroslav Halchenko
|
7ed7919692
|
RF: place parser.flush() under mutex as well
Just a safety measure; not yet proven to be required, but it makes
sense overall.
|
2018-11-06 10:58:10 -05:00 |
|
Yaroslav Halchenko
|
ee27e178ec
|
BF: place a mutex around apparently thread-unsafe parser.feed invocation
This fixes the anchor analysis and probably other issues, such as
a fluctuating number of found URLs.
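The following is a minimal sketch of the locking pattern. It uses the standard library's `html.parser.HTMLParser` as a stand-in for the parser of that time. The class `LinkCollector` and the function `parse_document` are hypothetical. A single shared parser is serialized with a `threading.Lock`, so documents fed from different threads cannot interleave:

```python
import threading
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical parser that records href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# One shared instance: HTMLParser keeps per-document state (buffered
# input, position), so unsynchronized feed() calls could mix documents.
parser = LinkCollector()
parser_lock = threading.Lock()


def parse_document(html):
    """Parse one whole document under the lock and return its links."""
    with parser_lock:
        parser.reset()      # drop state left over from the previous document
        parser.links = []
        parser.feed(html)
        parser.close()      # process any data still buffered
        return list(parser.links)
```

Commit f5ae90e824 later dropped the lock once Beautiful Soup replaced this parser.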
|
2018-11-01 11:10:01 -04:00 |
|
Bastian Kleineidam
|
ee4545399d
|
Support itms-services: URLs. #532
|
2014-09-05 21:06:10 +02:00 |
|
Bastian Kleineidam
|
82dd76b0d7
|
Add PDF link parsing.
|
2014-04-28 18:13:45 +02:00 |
|
Bastian Kleineidam
|
b6b5c7a12e
|
Simpler link parsing routine.
|
2014-03-27 19:49:17 +01:00 |
|
Bastian Kleineidam
|
fab2c2da98
|
Improve content type setting.
|
2014-03-05 20:12:19 +01:00 |
|
Bastian Kleineidam
|
ef13a3fce1
|
Implement sitemap and sitemap index parsing.
|
2014-03-05 19:26:37 +01:00 |
|
Bastian Kleineidam
|
00bd549c0c
|
Remove duplicate content type map.
|
2014-03-05 19:24:58 +01:00 |
|
Bastian Kleineidam
|
f9bf831804
|
Remove some empty lines
|
2014-03-01 12:02:00 +01:00 |
|
Bastian Kleineidam
|
9d0255e156
|
Fix bookmark imports
|
2014-03-01 10:16:29 +01:00 |
|
Bastian Kleineidam
|
7b34be590b
|
Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.
|
2014-03-01 00:12:34 +01:00 |
|
calvin
|
3bbfac47c7
|
removed
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1353 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2004-07-06 20:34:00 +00:00 |
|
calvin
|
bde88f9715
|
added string utils to parser, and sync with webcleaner
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1350 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2004-07-02 18:25:00 +00:00 |
|
calvin
|
1b148b0b4e
|
sorted dict
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1293 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2004-04-04 08:30:01 +00:00 |
|
calvin
|
66ecc466b7
|
resolve entities
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1202 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2004-01-28 22:48:50 +00:00 |
|
calvin
|
fef96392d6
|
updated copyright
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1150 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2004-01-03 14:59:33 +00:00 |
|
calvin
|
308ceb45c5
|
add coding line
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@933 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2003-07-04 14:24:44 +00:00 |
|
calvin
|
bd1e7c158a
|
parser added
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@615 e7d03fd6-7b0d-0410-9947-9c21f3af8025
|
2002-11-23 23:09:11 +00:00 |
|