mirror of https://github.com/Hopiu/lychee.git synced 2026-03-16 20:50:25 +00:00

Use tokenizer for extraction; add benchmark (#424)

This avoids creating a DOM tree for link extraction and instead uses a `TokenSink` to extract links on the fly. In hyperfine benchmarks it was about 10-25% faster than master.

Old: 4.557 s ± 0.404 s
New: 3.832 s ± 0.131 s

Run-to-run performance also fluctuates less (standard deviation of 0.131 s vs. 0.404 s).

Some missing element/attribute pairs that contain links according to the HTML spec were also added. These occur very rarely, but it's good to parse them for completeness' sake.
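
To illustrate the idea of extracting links from a token stream instead of a DOM tree, here is a minimal sketch using only the standard library. The `Token` enum, `LinkSink`, and `extract_links` are hypothetical stand-ins invented for this example, not the tokenizer's real `TokenSink` API, and the element/attribute table is only an illustrative subset of the pairs named in the HTML spec.

```rust
// Hypothetical, simplified token type. A real HTML tokenizer emits richer
// tokens (end tags, comments, doctype, ...); two variants suffice here.
#[derive(Debug)]
enum Token {
    StartTag {
        name: String,
        attrs: Vec<(String, String)>,
    },
    Text(String),
}

/// Collects link-carrying attribute values as tokens stream past,
/// without ever building a tree of nodes.
#[derive(Default)]
struct LinkSink {
    links: Vec<String>,
}

impl LinkSink {
    /// Illustrative subset of element/attribute pairs that may hold a URL.
    fn is_link_attr(element: &str, attr: &str) -> bool {
        matches!(
            (element, attr),
            ("a", "href")
                | ("link", "href")
                | ("img", "src")
                | ("script", "src")
                | ("form", "action")
                | ("blockquote", "cite")
                | ("video", "poster")
        )
    }

    /// Called once per token; only start tags can contribute links.
    fn process_token(&mut self, token: &Token) {
        if let Token::StartTag { name, attrs } = token {
            for (attr, value) in attrs {
                if Self::is_link_attr(name, attr) {
                    self.links.push(value.clone());
                }
            }
        }
    }
}

/// Feeds every token through the sink and returns the collected links.
fn extract_links(tokens: &[Token]) -> Vec<String> {
    let mut sink = LinkSink::default();
    for token in tokens {
        sink.process_token(token);
    }
    sink.links
}
```

Because the sink only keeps a flat list of strings, memory use in this sketch is bounded by the number of links rather than by the size of the document.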

This change also cleans up a lot of papercuts around our types. We now differentiate between a `RawUri` (a stringly-typed value) and a `Uri`, which is a properly parsed URI type.
The extractor now only deals with extracting `RawUri`s, while the collector creates the request objects.
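
The split between raw and parsed URIs can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the field layout of `RawUri`, `Uri`, and `Request`, the naive `Uri::parse`, and the `collect` function are all hypothetical and do not reproduce lychee's actual definitions.

```rust
/// Hypothetical "stringy" link exactly as it appeared in the input.
/// Nothing about it has been validated yet.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RawUri {
    text: String,
}

/// Hypothetical parsed URI. Values are only created through `Uri::parse`,
/// so holding a `Uri` implies the text passed validation.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Uri {
    scheme: String,
    rest: String,
}

impl Uri {
    /// Deliberately naive parsing that requires the form `scheme:rest`.
    /// A real implementation would delegate to a proper URL parser.
    fn parse(raw: &RawUri) -> Option<Uri> {
        let (scheme, rest) = raw.text.trim().split_once(':')?;
        // Scheme grammar: a letter followed by letters, digits, '+', '-' or '.'.
        let valid_scheme = scheme.chars().next()?.is_ascii_alphabetic()
            && scheme
                .chars()
                .all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'));
        if !valid_scheme || rest.is_empty() {
            return None;
        }
        Some(Uri {
            scheme: scheme.to_ascii_lowercase(),
            rest: rest.to_string(),
        })
    }
}

/// Hypothetical request object built by the collector.
struct Request {
    uri: Uri,
}

/// Collector side: turns raw URIs into requests and skips any raw
/// value that does not parse.
fn collect(raw: Vec<RawUri>) -> Vec<Request> {
    raw.iter()
        .filter_map(Uri::parse)
        .map(|uri| Request { uri })
        .collect()
}
```

Keeping parsing out of the extractor means extraction itself cannot fail on malformed links; deciding what to do with them is left to the collecting step.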

2021-12-16 18:45:52 +01:00

Benchmarks

Testing critical sections of lychee for performance.
Run with:

cargo bench -p benches
