Commit graph

5 commits

Author SHA1 Message Date
Matthias
36450621fa
Update dependencies (#454) 2022-01-10 22:35:37 +01:00
Matthias
0645177b84
Bump version (#450) 2022-01-10 01:38:46 +01:00
Matthias
01393b34a2
Upgrade to Rust 2021 (#427) 2021-12-17 01:32:13 +01:00
dependabot[bot]
58785311f0
Bump tokio from 1.14.0 to 1.15.0 (#425)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.14.0 to 1.15.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.14.0...tokio-1.15.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-12-16 19:32:51 +01:00
Matthias
166c86c30e
Use tokenizer for extraction; add benchmark (#424)
This avoids creating a DOM tree for link extraction and instead uses a `TokenSink` for on-the-fly extraction. In hyperfine benchmarks it was about 10-25% faster than the master.

Old: 4.557 s ± 0.404 s
New: 3.832 s ± 0.131 s

The performance fluctuates a little less as well.

Some missing element/attribute pairs were also added, which contain links according to the HTML spec. These occur very rarely, but it's good to parse them for completeness' sake.

Furthermore tried to clean up a lot of papercuts around our types. We now differentiate between a `RawUri` (stringy-types) and a Uri, which is a properly parsed `URI` type.
The extractor now only deals with extracting `RawUri`s while the collector creates the request objects.
2021-12-16 18:45:52 +01:00