lychee/fixtures
Matthias 166c86c30e
Use tokenizer for extraction; add benchmark (#424)
This avoids creating a DOM tree for link extraction and instead uses a `TokenSink` for on-the-fly extraction. In hyperfine benchmarks it was about 10-25% faster than `master`.

Old: 4.557 s ± 0.404 s
New: 3.832 s ± 0.131 s

The performance fluctuates a little less as well.

Some missing element/attribute pairs that can contain links according to the HTML spec were also added. These occur very rarely, but it's good to parse them for completeness' sake.

Furthermore, this cleans up a lot of papercuts around our types. We now differentiate between a `RawUri` (a stringly typed value) and a `Uri`, which is a properly parsed URI type.
The extractor now only deals with extracting `RawUri`s while the collector creates the request objects.
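
The split between extractor and collector described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not lychee's actual code: the struct fields, `ParseError`, `Request`, `parse_uri`, `extract_plaintext` and `collect` are hypothetical names, and the scheme check is a simplified stand-in for a real URI parser.

```rust
/// Link text exactly as it was found in a document (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct RawUri {
    text: String,
}

/// A validated URI. Stand-in for a properly parsed type; only the scheme is checked here.
#[derive(Debug, Clone, PartialEq)]
struct Uri {
    scheme: String,
    rest: String,
}

/// Reasons the simplified parser below rejects a `RawUri` (hypothetical).
#[derive(Debug, PartialEq)]
enum ParseError {
    MissingScheme,
    InvalidScheme,
}

/// What the collector hands on for checking (hypothetical).
#[derive(Debug)]
struct Request {
    uri: Uri,
}

/// Turn a stringly typed `RawUri` into a `Uri`, or report why it is not one.
fn parse_uri(raw: &RawUri) -> Result<Uri, ParseError> {
    let (scheme, rest) = raw
        .text
        .trim()
        .split_once(':')
        .ok_or(ParseError::MissingScheme)?;
    // A scheme starts with a letter, followed by letters, digits, '+', '-' or '.'.
    let mut chars = scheme.chars();
    let valid = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic())
        && chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'));
    if !valid {
        return Err(ParseError::InvalidScheme);
    }
    Ok(Uri {
        scheme: scheme.to_ascii_lowercase(),
        rest: rest.to_string(),
    })
}

/// Extractor: only finds candidate strings and never validates them.
/// Assumption: plaintext input where any word containing ':' is a candidate.
fn extract_plaintext(input: &str) -> Vec<RawUri> {
    input
        .split_whitespace()
        .filter(|word| word.contains(':'))
        .map(|word| RawUri { text: word.to_string() })
        .collect()
}

/// Collector: parses the raw candidates and builds requests, dropping invalid ones.
fn collect(raws: &[RawUri]) -> Vec<Request> {
    raws.iter()
        .filter_map(|raw| parse_uri(raw).ok())
        .map(|uri| Request { uri })
        .collect()
}
```

With this shape, `collect(&extract_plaintext(text))` yields one `Request` per candidate that parses, and the extractor never needs to know what a valid URI looks like.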
2021-12-16 18:45:52 +01:00
ignore Add support for .lycheeignore file #308 (#402) 2021-11-23 01:39:53 +01:00
offline Add failing test 2021-09-09 01:17:56 +02:00
elvis.html Use tokenizer for extraction; add benchmark (#424) 2021-12-16 18:45:52 +01:00
TEST.html feat: Support relative URLs (#15) 2020-10-21 01:31:06 +02:00
TEST.md Check real link to file 2021-09-06 15:19:09 +02:00
TEST_ALL_PRIVATE.md feat: Support relative URLs (#15) 2020-10-21 01:31:06 +02:00
TEST_EMAIL.md Fix exclude mail, add tests 2021-03-29 23:28:17 +02:00
TEST_EXCLUDE_1.txt Added support for --exclude-file. 2021-09-03 16:29:57 +02:00
TEST_EXCLUDE_2.txt Added support for --exclude-file. 2021-09-03 16:29:57 +02:00
TEST_GITHUB.md Make GITHUB_TOKEN optional (#22) 2020-10-26 23:31:31 +01:00
TEST_GITHUB_404.md Make GITHUB_TOKEN optional (#22) 2020-10-26 23:31:31 +01:00
TEST_HTML5.html example.com -> example.org 2021-02-21 16:33:33 +01:00
TEST_HTML5_CUSTOM_ELEMENTS.html example.com -> example.org 2021-02-21 16:33:33 +01:00
TEST_HTML5_LOWERCASE_DOCTYPE.html example.com -> example.org 2021-02-21 16:33:33 +01:00
TEST_HTML5_MALFORMED_LINKS.html example.com -> example.org 2021-02-21 16:33:33 +01:00
TEST_HTML5_MINIFIED.html example.com -> example.org 2021-02-21 16:33:33 +01:00
TEST_HTTP.html Add new flag --require-https (#195) 2021-09-04 03:21:54 +02:00
TEST_QUIRKS.txt Skip unsupported schemes (#236) 2021-04-26 17:16:58 +02:00
TEST_REPETITION_1.txt Remove cache from collector and remove custom reqwest client pool 2021-10-07 18:07:18 +02:00
TEST_REPETITION_2.txt Remove cache from collector and remove custom reqwest client pool 2021-10-07 18:07:18 +02:00
TEST_SCHEMES.md Add support for multiple schemes (#237) 2021-04-26 18:24:54 +02:00
TEST_SCHEMES.txt explicit naming 2021-09-06 15:19:09 +02:00