Commit graph

144 commits

Author SHA1 Message Date
Matthias
ba276cd51b
Error cleanup (#510)
* Add more fine-grained error types; remove generic IO error
* Update error message for missing file
* Remove missing `Error` suffix
* Rename ErrorKind::Github to ErrorKind::GithubRequest for consistency with NetworkRequest
2022-02-19 01:44:00 +01:00
dependabot[bot]
e2d303b493
Bump check-if-email-exists from 0.8.26 to 0.8.28 (#516)
Bumps [check-if-email-exists](https://github.com/reacherhq/check-if-email-exists) from 0.8.26 to 0.8.28.
- [Release notes](https://github.com/reacherhq/check-if-email-exists/releases)
- [Changelog](https://github.com/reacherhq/check-if-email-exists/blob/master/CHANGELOG.md)
- [Commits](https://github.com/reacherhq/check-if-email-exists/compare/v0.8.26...v0.8.28)

---
updated-dependencies:
- dependency-name: check-if-email-exists
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-18 17:09:52 +01:00
dependabot[bot]
6836deac79
Bump par-stream from 0.10.0 to 0.10.2 (#518)
Bumps [par-stream](https://github.com/jerry73204/par-stream) from 0.10.0 to 0.10.2.
- [Release notes](https://github.com/jerry73204/par-stream/releases)
- [Commits](https://github.com/jerry73204/par-stream/commits)

---
updated-dependencies:
- dependency-name: par-stream
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-18 17:09:30 +01:00
dependabot[bot]
98535f7bd3
Bump typed-builder from 0.9.1 to 0.10.0 (#512)
Bumps [typed-builder](https://github.com/idanarye/rust-typed-builder) from 0.9.1 to 0.10.0.
- [Release notes](https://github.com/idanarye/rust-typed-builder/releases)
- [Changelog](https://github.com/idanarye/rust-typed-builder/blob/master/CHANGELOG.md)
- [Commits](https://github.com/idanarye/rust-typed-builder/commits)

---
updated-dependencies:
- dependency-name: typed-builder
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-18 11:00:30 +01:00
dependabot[bot]
2114406235
Bump futures from 0.3.19 to 0.3.21 (#493)
Bumps [futures](https://github.com/rust-lang/futures-rs) from 0.3.19 to 0.3.21.
- [Release notes](https://github.com/rust-lang/futures-rs/releases)
- [Changelog](https://github.com/rust-lang/futures-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/futures-rs/compare/0.3.19...0.3.21)

---
updated-dependencies:
- dependency-name: futures
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-18 10:58:01 +01:00
Matthias
812663d832
Prevent flaky tests (#514)
Move from example.org to example.com, which seems to be more permissive for testing
2022-02-18 10:29:49 +01:00
Lucius Hu
6d56c6b55c
Replace plain String with SecretString for GitHub token (#509)
This commit changed the type of `lychee-lib::ClientBuilder::github_token` from
`String` to `secrecy::SecretString` to fortify the secret management within our
program.

Note that this won't affect TOML configuration of `lychee-bin` because
`serde::Deserialize` is still implemented for `SecretString`.
2022-02-13 13:53:46 +01:00
Matthias
47df7780fe
Use captured identifiers in format strings (#507)
Makes for arguably cleaner-looking code.
The downside is that the MSRV is 1.58
https://blog.rust-lang.org/2022/01/13/Rust-1.58.0.html

Given that nobody uses lychee as a library yet
and we have precompiled binaries, it's an acceptable
tradeoff.
My little research revealed that this is a much-liked
feature: https://twitter.com/matthiasendler/status/1483895557621960715
2022-02-12 10:51:52 +01:00
Lucius Hu
53c41b03d8
replace hubcaps by octocrab (#502)
This commit replaced `hubcaps` by `octocrab`, which has more downloads per month
and receives more frequent release updates.

The caveats are:

1. When instantiating the API client, `octocrab` doesn't offer you a way to
specify custom user-agent. But I would argue that, at least presently, this
doesn't seem to cause issues.
2. `octocrab` doesn't export as much details of its error types as `hubcaps`
does. So we will have fewer control on the display of the error message. But I
would also argue that this is not really important. Though we should do more
tests to make sure the error looks good enough.

* hide implementation details in error message

Co-authored-by: Lucius Hu <lebensterben@users.noreply.github.com>
2022-02-11 23:43:47 +01:00
Lucius Hu
476a048350
lychee-lib::client reworked (#500)
This commit mainly added or improved documentation for `lychee-lib::client`
module.

But it also contains a few API changes:

- `ClientBuilder::client()` now consumes itself instead of taking a reference.
  This helps to avoid a few unnecessary clones.
- `ClientBuilder::build_filter()` was a private function and is inlined to avoid
  unnecessary clones.
- Added a new crate-scoped function `Uri::set_scheme()`.

* added notes on deprecated site-local network

Co-authored-by: Lucius Hu <lebensterben@users.noreply.github.com>
2022-02-10 00:04:48 +01:00
Lucius Hu
5921fd248a
Update license files (#497)
- The date in MIT license files have been updated to 2022
- Each of the benchmark and example crates are theoretically
  a separate package in Cargo's sense. So license files are
  added for them as well.

Co-authored-by: Lucius Hu <lebensterben@users.noreply.github.com>
2022-02-08 10:59:54 +01:00
Markus Unterwaditzer
68d09f7e5b
Add html5gum as alternative link extractor (#480)
html5gum is a HTML parser that offers lower-level control over which tokens actually get created and are tracked. As such, the extractor doesn't allocate anything tokens it doesn't care about. On some benchmarks it provides a substantial performance boost. The old parser, html5ever is still available by setting the `LYCHEE_USE_HTML5EVER=1` env var.
2022-02-07 22:54:47 +01:00
dependabot[bot]
1f3abce671
Bump pretty_assertions from 1.0.0 to 1.1.0 (#487)
Bumps [pretty_assertions](https://github.com/colin-kiegel/rust-pretty-assertions) from 1.0.0 to 1.1.0.
- [Release notes](https://github.com/colin-kiegel/rust-pretty-assertions/releases)
- [Changelog](https://github.com/colin-kiegel/rust-pretty-assertions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/colin-kiegel/rust-pretty-assertions/compare/v1.0.0...v1.1.0)

---
updated-dependencies:
- dependency-name: pretty_assertions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-01 13:42:58 +01:00
dependabot[bot]
a8d1359df4
Bump tokio from 1.15.0 to 1.16.1 (#482)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.15.0 to 1.16.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.15.0...tokio-1.16.1)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-28 13:23:02 +01:00
Matthias
6635863746
Add Alpine page for benchmark; refactor code (#481) 2022-01-27 23:42:06 +01:00
dependabot[bot]
8e31b234d3
Bump check-if-email-exists from 0.8.25 to 0.8.26 (#479)
Bumps [check-if-email-exists](https://github.com/reacherhq/check-if-email-exists) from 0.8.25 to 0.8.26.
- [Release notes](https://github.com/reacherhq/check-if-email-exists/releases)
- [Changelog](https://github.com/reacherhq/check-if-email-exists/blob/master/CHANGELOG.md)
- [Commits](https://github.com/reacherhq/check-if-email-exists/compare/v0.8.25...v0.8.26)

---
updated-dependencies:
- dependency-name: check-if-email-exists
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-27 15:02:05 +01:00
Matthias
97b06230fc
Add missing Github exclusions; sort entries (#473) 2022-01-21 23:54:59 +01:00
dependabot[bot]
84eff209ff
Bump cached from 0.29.0 to 0.30.0 (#472)
Bumps [cached](https://github.com/jaemk/cached) from 0.29.0 to 0.30.0.
- [Release notes](https://github.com/jaemk/cached/releases)
- [Changelog](https://github.com/jaemk/cached/blob/master/CHANGELOG.md)
- [Commits](https://github.com/jaemk/cached/commits)

---
updated-dependencies:
- dependency-name: cached
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-21 21:50:20 +01:00
dependabot[bot]
9082865e24
Bump pulldown-cmark from 0.9.0 to 0.9.1 (#468)
Bumps [pulldown-cmark](https://github.com/raphlinus/pulldown-cmark) from 0.9.0 to 0.9.1.
- [Release notes](https://github.com/raphlinus/pulldown-cmark/releases)
- [Commits](https://github.com/raphlinus/pulldown-cmark/compare/v0.9.0...v0.9.1)

---
updated-dependencies:
- dependency-name: pulldown-cmark
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-18 14:44:45 +01:00
Matthias
5802ae912c
Fix bugs in extractor; reduce allocs (#464)
When URLs couldn't be extracted from a tag,
we ran a plaintext search, but never added the
newly found urls to the vec of extracted urls.

Also tried to make the code a little more idiomatic
2022-01-16 02:13:38 +01:00
Matthias
6e757fa20e
Add more information about mail errors (#463) 2022-01-14 22:22:53 +01:00
Matthias
994aadf6a1
Simplify error messages (#462)
Using pattern matching to make the hubcaps and reqwest error messages a little shorter and (subjectively) more readable.
2022-01-14 15:26:13 +01:00
Matthias
ac490f9c53
Add caching functionality (v2) (#443)
A while ago, caching was removed due to some issues (see #349).
This is a new implementation with the following improvements:

 * Architecture: The new implementation is decoupled from the collector, which was a major issue in the last version.    Now the collector has a single responsibility: collecting links. This also avoids race-conditions when running multiple collect_links instances, which probably was an issue before.
* Performance: Uses DashMap under the hood, which was noticeably faster than Mutex<HashMap> in my tests.
* Simplicity: The cache format is a CSV file with two columns: URI and status. I decided to create a new struct called CacheStatus for serialization, because trying to serialize the error kinds in Status turned out to be a bit of a nightmare and at this point I don't think it's worth the pain (and probably isn't idiomatic either).

This is an optional feature. Caching only gets used if the `--cache` flag is set.
2022-01-14 15:25:51 +01:00
dependabot[bot]
80fb20cca5 Bump cached from 0.28.0 to 0.29.0
Bumps [cached](https://github.com/jaemk/cached) from 0.28.0 to 0.29.0.
- [Release notes](https://github.com/jaemk/cached/releases)
- [Changelog](https://github.com/jaemk/cached/blob/master/CHANGELOG.md)
- [Commits](https://github.com/jaemk/cached/commits)

---
updated-dependencies:
- dependency-name: cached
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-01-13 13:38:48 +01:00
dependabot[bot]
a0d34a04f5 Bump cached from 0.26.2 to 0.28.0
Bumps [cached](https://github.com/jaemk/cached) from 0.26.2 to 0.28.0.
- [Release notes](https://github.com/jaemk/cached/releases)
- [Changelog](https://github.com/jaemk/cached/blob/master/CHANGELOG.md)
- [Commits](https://github.com/jaemk/cached/commits)

---
updated-dependencies:
- dependency-name: cached
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-01-12 13:25:43 +01:00
Matthias
1e76e82811 Add test for nonexistent Github file 2022-01-12 09:25:12 +01:00
Matthias
48c8153e11 Refactor Github checking; add docs 2022-01-12 09:25:12 +01:00
Matthias
50d7b05736 Conditionally compile constructors for GithubUri for tests 2022-01-12 09:25:12 +01:00
Matthias
8d445a3a4b Be more permissive around private GH repos
The Github API doesn't handle checking individual files inside repos or
paths like `github.com/org/repo/issues`, so we are more
permissive and only check for repo existence. This is the
only way to get a basic check for private repos. Public repos are not affected and should work
with a normal check.
2022-01-12 09:25:12 +01:00
Matthias
e91c0c60f0 Only accept two path segments (org/repo) for Github API check 2022-01-12 09:25:12 +01:00
Matthias
7667842bb6 Strip .git suffix from Github URLs (#384) 2022-01-12 09:25:12 +01:00
dependabot[bot]
5a5ed00ba4
Bump reqwest from 0.11.8 to 0.11.9 (#455)
Bumps [reqwest](https://github.com/seanmonstar/reqwest) from 0.11.8 to 0.11.9.
- [Release notes](https://github.com/seanmonstar/reqwest/releases)
- [Changelog](https://github.com/seanmonstar/reqwest/blob/master/CHANGELOG.md)
- [Commits](https://github.com/seanmonstar/reqwest/compare/v0.11.8...v0.11.9)

---
updated-dependencies:
- dependency-name: reqwest
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-11 13:43:35 +01:00
Matthias
36450621fa
Update dependencies (#454) 2022-01-10 22:35:37 +01:00
dependabot[bot]
6b7671b97c
Bump wiremock from 0.5.9 to 0.5.10 (#451)
Bumps [wiremock](https://github.com/LukeMathWalker/wiremock-rs) from 0.5.9 to 0.5.10.
- [Release notes](https://github.com/LukeMathWalker/wiremock-rs/releases)
- [Changelog](https://github.com/LukeMathWalker/wiremock-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/LukeMathWalker/wiremock-rs/compare/v0.5.9...v0.5.10)

---
updated-dependencies:
- dependency-name: wiremock
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-10 18:15:58 +01:00
Matthias
0645177b84
Bump version (#450) 2022-01-10 01:38:46 +01:00
Matthias
21f3160b71
Make retries configurable; align constants (#446)
Using the same default values for the library and the
binary now but tweaked the values a bit for slightly faster performance.
2022-01-07 01:03:10 +01:00
Matthias
388bbbe7b0
Exclude known false-positives from Github API check (#445)
Fixes https://github.com/lycheeverse/lychee/issues/431
2022-01-06 00:33:53 +01:00
dependabot[bot]
f515d096db
Bump wiremock from 0.5.8 to 0.5.9 (#442)
Bumps [wiremock](https://github.com/LukeMathWalker/wiremock-rs) from 0.5.8 to 0.5.9.
- [Release notes](https://github.com/LukeMathWalker/wiremock-rs/releases)
- [Changelog](https://github.com/LukeMathWalker/wiremock-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/LukeMathWalker/wiremock-rs/compare/v0.5.8...v0.5.9)

---
updated-dependencies:
- dependency-name: wiremock
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-05 15:18:08 +01:00
Matthias
dd48466d9a
Add missing test for local links in plaintext files (#444) 2022-01-05 12:51:14 +01:00
dependabot[bot]
7a4de16138
Bump http from 0.2.5 to 0.2.6 (#438)
Bumps [http](https://github.com/hyperium/http) from 0.2.5 to 0.2.6.
- [Release notes](https://github.com/hyperium/http/releases)
- [Changelog](https://github.com/hyperium/http/blob/master/CHANGELOG.md)
- [Commits](https://github.com/hyperium/http/compare/v0.2.5...v0.2.6)

---
updated-dependencies:
- dependency-name: http
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-12-31 18:33:22 +01:00
dependabot[bot]
c0b7205a71
Bump pulldown-cmark from 0.8.0 to 0.9.0 (#433)
Bumps [pulldown-cmark](https://github.com/raphlinus/pulldown-cmark) from 0.8.0 to 0.9.0.
- [Release notes](https://github.com/raphlinus/pulldown-cmark/releases)
- [Commits](https://github.com/raphlinus/pulldown-cmark/compare/v0.8.0...v0.9.0)

---
updated-dependencies:
- dependency-name: pulldown-cmark
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-12-23 13:49:54 +01:00
dependabot[bot]
147fa8de87
Bump reqwest from 0.11.7 to 0.11.8 (#432)
Bumps [reqwest](https://github.com/seanmonstar/reqwest) from 0.11.7 to 0.11.8.
- [Release notes](https://github.com/seanmonstar/reqwest/releases)
- [Changelog](https://github.com/seanmonstar/reqwest/blob/master/CHANGELOG.md)
- [Commits](https://github.com/seanmonstar/reqwest/compare/v0.11.7...v0.11.8)

---
updated-dependencies:
- dependency-name: reqwest
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-12-21 16:28:22 +01:00
dependabot[bot]
b2bc0e7eac
Bump futures from 0.3.18 to 0.3.19 (#430)
Bumps [futures](https://github.com/rust-lang/futures-rs) from 0.3.18 to 0.3.19.
- [Release notes](https://github.com/rust-lang/futures-rs/releases)
- [Changelog](https://github.com/rust-lang/futures-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/futures-rs/compare/0.3.18...0.3.19)

---
updated-dependencies:
- dependency-name: futures
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-12-20 15:37:18 +01:00
Matthias
01393b34a2
Upgrade to Rust 2021 (#427) 2021-12-17 01:32:13 +01:00
Matthias
83182c29ca
Fix JSON serialization (#426)
We recently removed the custom serialization for InputSource.
This causes the JSON formatter to fail
with "key must be a string".
Add it back and add a comment on
why this is needed.
2021-12-16 23:55:04 +01:00
dependabot[bot]
58785311f0
Bump tokio from 1.14.0 to 1.15.0 (#425)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.14.0 to 1.15.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.14.0...tokio-1.15.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-12-16 19:32:51 +01:00
Matthias
166c86c30e
Use tokenizer for extraction; add benchmark (#424)
This avoids creating a DOM tree for link extraction and instead uses a `TokenSink` for on-the-fly extraction. In hyperfine benchmarks it was about 10-25% faster than the master.

Old: 4.557 s ± 0.404 s
New: 3.832 s ± 0.131 s

The performance fluctuates a little less as well.

Some missing element/attribute pairs were also added, which contain links according to the HTML spec. These occur very rarely, but it's good to parse them for completeness' sake.

Furthermore tried to clean up a lot of papercuts around our types. We now differentiate between a `RawUri` (stringy-types) and a Uri, which is a properly parsed `URI` type.
The extractor now only deals with extracting `RawUri`s while the collector creates the request objects.
2021-12-16 18:45:52 +01:00
dependabot[bot]
c97ff95575
Bump once_cell from 1.8.0 to 1.9.0 (#423)
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.8.0 to 1.9.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.8.0...v1.9.0)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-12-15 13:45:19 +01:00
dependabot[bot]
eac9d5b9a0
Bump openssl-sys from 0.9.71 to 0.9.72 (#421)
Bumps [openssl-sys](https://github.com/sfackler/rust-openssl) from 0.9.71 to 0.9.72.
- [Release notes](https://github.com/sfackler/rust-openssl/releases)
- [Commits](https://github.com/sfackler/rust-openssl/compare/openssl-sys-v0.9.71...openssl-sys-v0.9.72)

---
updated-dependencies:
- dependency-name: openssl-sys
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-12-13 13:26:15 +01:00
Matthias
c41ba64a69
Max concurrency moved to check (#419)
Concurrency is defined by the channel size consuming
from the request stream in  `check`
2021-12-07 11:52:40 +01:00