Chris Mayo
a92a684ac4
Run black on linkcheck/
2020-05-30 17:01:36 +01:00
Chris Mayo
488e72c81f
Ignore imports providing aliases in subpackages
2020-05-26 19:49:59 +01:00
Chris Mayo
7257e5e1a0
Remove unused imports in parser/__init__.py
2020-05-25 19:50:57 +01:00
Chris Mayo
1663e10fe7
Remove spaces after names in function definitions
...
This is a PEP 8 convention, E211.
2020-05-16 20:19:42 +01:00
Chris Mayo
ed13a926d3
Remove setting Python 2 xmlparser.returns_unicode
2020-05-16 17:02:00 +01:00
Chris Mayo
736c893707
Merge pull request #377 from cjmayo/tidyten3
...
Remove u string prefixes
2020-05-13 19:36:54 +01:00
Chris Mayo
44e81d27dd
Remove inheriting object
...
All Python 3 classes are new-style.
2020-05-08 10:45:31 +01:00
Chris Mayo
b0ea72e8c1
Remove # -*- coding: lines
...
Except for tests that include non-unicode characters:
tests/test_po.py
tests/test_strformat.py
tests/test_url.py
tests/checker/test_error.py
tests/checker/test_news.py
2020-05-08 10:45:31 +01:00
Chris Mayo
4d3e5abcfa
Remove u string prefixes
2020-04-30 20:11:59 +01:00
Chris Mayo
9eed070a73
Stop using HTML handlers
...
LinkFinder is the only remaining HTML handler therefore no need for
htmlsoup.process_soup() as an independent function or TagFinder as a
base class.
2020-04-29 20:07:00 +01:00
Marius Gedminas
680783b1ff
SWF files are binary data
...
Should fix #372 .
2020-04-27 11:25:37 +03:00
Chris Mayo
d189445a8e
LinkFinder does not raise StopParse
2020-04-18 20:30:46 +01:00
Chris Mayo
ee6628a831
Move HtmlParser/htmlsax.py to htmlutil/htmlsoup.py
...
Remove one subpackage and some import lines where htmlutil.linkparse is
also being used.
2020-04-18 20:30:45 +01:00
Chris Mayo
0795e3c1b4
Replace Parser class using BeautifulSoup.find_all()
2020-04-10 13:51:09 +01:00
Chris Mayo
02e1c389b2
Remove parser flush() and reset()
...
Remnants of the feed() interface.
2020-04-08 20:03:35 +01:00
Chris Mayo
9d8d251d06
Replace Parser lineno() and column() methods
...
Stop storing this data in Parser object state.
2020-04-08 20:03:35 +01:00
Chris Mayo
f5ae90e824
Parser threading lock no longer required with Beautiful Soup
2020-03-22 19:54:37 +00:00
Chris Mayo
646e138166
Pass encoding when unquoting
...
Else non-UTF-8 codes are misinterpreted:
>>> from urllib import parse
>>> parse.unquote("%FF")
'�'
>>> parse.unquote("%FF", "latin1")
'ÿ'
2019-10-05 19:38:57 +01:00
Chris Mayo
153e53ba03
Reuse soup object used for detecting encoding in the HTML parser
2019-10-05 19:38:57 +01:00
Chris Mayo
4f8c2954cf
Don't set parser.encoding
...
Read-only property with new Beautiful Soup parser.
2019-10-05 19:38:57 +01:00
Chris Mayo
ec8b6e09f0
Fix XmlTagUrlParser and make Python 3 compatible
...
URLs within a sitemap file were not being captured.
2019-10-28 19:20:05 +00:00
Marius Gedminas
a1af1e9717
Fix sitemap parser
...
PyExpat wants bytes on Python 2. See #323 .
2019-10-23 17:23:23 +03:00
Marius Gedminas
84dbb5d603
Fix TypeError: string arg required in find_links()
...
Fixes #317 .
2019-10-21 17:47:46 +03:00
Chris Mayo
e01ea0d9f0
Safari bookmark parser requires bytes
2019-09-30 19:46:24 +01:00
Chris Mayo
0c90c718bf
Revert "Python3: fix bytes mark in parser/__init__.py"
...
This reverts commit aec8243348 .
2019-09-30 19:46:24 +01:00
Petr Dlouhý
aec8243348
Python3: fix bytes mark in parser/__init__.py
2019-04-09 20:09:35 +01:00
Yaroslav Halchenko
7ed7919692
RF: place parser.flush() under mutex as well
...
Just a safety measure, not yet proven to be required but overall
makes sense
2018-11-06 10:58:10 -05:00
Yaroslav Halchenko
ee27e178ec
BF: place a mutex around apparently thread-unsafe parser.feed invocation
...
That leads to fix up of anchors analysis and probably other issues
such as floating number of found urls etc
2018-11-01 11:10:01 -04:00
Bastian Kleineidam
ee4545399d
Support itms-services: URLs. #532
2014-09-05 21:06:10 +02:00
Bastian Kleineidam
ad8eb424f3
Merge Mark-Hetherington-xml-parse-warn with slight modifications.
2014-06-13 20:50:37 +02:00
Bastian Kleineidam
82dd76b0d7
Add PDF link parsing.
2014-04-28 18:13:45 +02:00
Bastian Kleineidam
b6b5c7a12e
Simpler link parsing routine.
2014-03-27 19:49:17 +01:00
Bastian Kleineidam
fab2c2da98
Improve content type setting.
2014-03-05 20:12:19 +01:00
Bastian Kleineidam
ef13a3fce1
Implement sitemap and sitemap index parsing.
2014-03-05 19:26:37 +01:00
Bastian Kleineidam
00bd549c0c
Remove duplicate content type map.
2014-03-05 19:24:58 +01:00
Bastian Kleineidam
f9bf831804
Remove some empty lines
2014-03-01 12:02:00 +01:00
Bastian Kleineidam
9d0255e156
Fix bookmark imports
2014-03-01 10:16:29 +01:00
Bastian Kleineidam
7b34be590b
Introduce check plugins, use Python requests for http/s connections, and some code cleanups and improvements.
2014-03-01 00:12:34 +01:00
calvin
5a644b35b3
removed
...
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1354 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-07-06 22:08:05 +00:00
calvin
3bbfac47c7
removed
...
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1353 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-07-06 20:34:00 +00:00
calvin
bde88f9715
added string utils to parser, and sync with webcleaner
...
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1350 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-07-02 18:25:00 +00:00
calvin
04e0a9448d
updated
...
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1325 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-05-27 00:30:37 +00:00
calvin
fa46757bd7
fix import
...
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1298 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-04-04 08:34:21 +00:00
calvin
68451e65dd
O3
...
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1297 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-04-04 08:31:57 +00:00
calvin
93253954a8
updated
...
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1296 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-04-04 08:30:48 +00:00
calvin
672e118d9b
use sorted dict
...
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1295 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-04-04 08:30:38 +00:00
calvin
8e4e92dddd
minor improvements
...
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1294 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-04-04 08:30:21 +00:00
calvin
1b148b0b4e
sorted dict
...
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1293 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-04-04 08:30:01 +00:00
calvin
e183ac84dc
handle missing startquotes
...
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1292 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-04-04 08:29:31 +00:00
calvin
4df200a2d2
merged from webcleaner
...
git-svn-id: https://linkchecker.svn.sourceforge.net/svnroot/linkchecker/trunk/linkchecker@1205 e7d03fd6-7b0d-0410-9947-9c21f3af8025
2004-01-28 23:38:00 +00:00