This is a very conservative and limited implementation of cookie support.
The goal is to ship an MVP that covers 80% of the use cases.
When you run lychee with `--cookie-jar cookies.json`, all cookies are stored in `cookies.json`, one cookie per line.
This makes cookies easy to edit by hand if needed, although this is an advanced use case and the file format is not guaranteed to be stable.
Fixes: #645, #715
Partially fixes: #1108
* Add support for basic auth per domain
* Move URI matching to link collection phase
* Allow AsRef for BasicAuthExtractor::new to avoid clone
* Add tests
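The per-domain idea and the `AsRef` change can be illustrated with a minimal sketch. Only the `AsRef<[...]>` bound on `BasicAuthExtractor::new` mirrors the change listed above. `BasicAuthSelector`, `BasicAuthCredentials`, `matches`, and the exact-host matching are hypothetical stand-ins, not lychee's actual types or matching rules.

```rust
/// Hypothetical credentials attached to one domain.
#[derive(Debug, Clone, PartialEq)]
pub struct BasicAuthCredentials {
    pub username: String,
    pub password: String,
}

/// Hypothetical selector: applies `credentials` to URIs whose host is `host`.
#[derive(Debug, Clone)]
pub struct BasicAuthSelector {
    pub host: String,
    pub credentials: BasicAuthCredentials,
}

pub struct BasicAuthExtractor {
    selectors: Vec<(String, BasicAuthCredentials)>,
}

impl BasicAuthExtractor {
    /// Accepts a `Vec`, an array, or a borrowed slice of selectors, so callers
    /// can pass `&selectors` instead of cloning their collection.
    pub fn new<T: AsRef<[BasicAuthSelector]>>(selectors: T) -> Self {
        let selectors = selectors
            .as_ref()
            .iter()
            .map(|s| (s.host.clone(), s.credentials.clone()))
            .collect();
        Self { selectors }
    }

    /// Returns the credentials for `uri` if its host has a selector.
    /// Host extraction is deliberately naive (no userinfo, IPv6, etc.).
    pub fn matches(&self, uri: &str) -> Option<&BasicAuthCredentials> {
        let after_scheme = uri.split_once("://").map_or(uri, |(_, rest)| rest);
        let host = after_scheme.split(|c: char| c == '/' || c == ':').next()?;
        self.selectors
            .iter()
            .find(|(h, _)| h == host)
            .map(|(_, credentials)| credentials)
    }
}
```

Because `new` accepts anything that is `AsRef<[BasicAuthSelector]>`, a caller can pass `&selectors` and keep using its `Vec` afterwards rather than cloning it.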
---------
Co-authored-by: Matthias Endler <matthias@endler.dev>
* Add optional Rustls support
This commit adds a non-default feature flag to use Rustls instead of OpenSSL.
My personal motivation is to use Lychee on OpenBSD -current, where the
`openssl` crate frequently fails to link against the unreleased system
LibreSSL. Using the `vendored-openssl` feature helps with compilation, but
segfaults at runtime.
The commit adds three feature flags to the library, binary, benchmark, and all
examples:
- The `native-tls` feature flag toggles the `openssl` crate.
- The `rustls-tls` feature flag toggles the `rustls` crate.
- The `email-check` feature flag toggles the `check-if-email-exists` crate,
which is the only existing functionality currently incompatible with Rustls.
By default, `native-tls` and `email-check` are enabled. Thus, Lychee (bin and
lib) can be used as before unless default features are disabled.
To use the Rustls feature, pass `--no-default-features --features rustls-tls`
to cargo check/build/test/..., e.g.,

    $ cargo clippy --workspace --all-targets --no-default-features \
        --features rustls-tls -- --deny warnings
Checking email addresses requires both `native-tls` and `email-check` to be
enabled. Otherwise, email addresses are excluded.
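The rule that email checks need both features can be expressed with Rust's `cfg!` macro. This is a minimal sketch: `EMAIL_CHECK_ENABLED`, `Action`, and `action_for` are hypothetical names, and treating every `mailto:` URI as an email address is a simplification.

```rust
/// True only when both features are enabled for this build.
/// Outside a crate that declares these features, this is simply `false`.
const EMAIL_CHECK_ENABLED: bool =
    cfg!(all(feature = "native-tls", feature = "email-check"));

/// Hypothetical decision for a collected link.
#[derive(Debug, PartialEq)]
enum Action {
    CheckEmail,
    Exclude,
    CheckUrl,
}

fn action_for(uri: &str) -> Action {
    if uri.starts_with("mailto:") {
        if EMAIL_CHECK_ENABLED {
            Action::CheckEmail
        } else {
            // Email addresses are excluded when either feature is missing.
            Action::Exclude
        }
    } else {
        Action::CheckUrl
    }
}
```

`cfg!` only yields a `bool` at compile time; to keep `check-if-email-exists` out of the build entirely, the code using it would instead be gated with `#[cfg(...)]` attributes.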
The `email-check` feature flag is technically not necessary. I preferred it
over `not(rustls-tls)` because it's clearer and it addresses the AGPL license
issue #594. As far as I understand, a Lychee binary compiled without the
`email-check` feature could be distributed with file-based copyleft for the
MPL-licensed dependencies only. But that's out of scope here.
The benchmark shows a performance regression varying between 2% and 4.4% when
using Rustls instead of OpenSSL on my machine.
PS: The `ring` crate needs to be patched on OpenBSD 7.3 and later until the new
xonly patches have been upstreamed, see the `rust-ring` port.
* Use platform native certificates with Rustls
By default, reqwest uses the webpki-roots crate with Rustls, effectively
bundling Mozilla's root certificates.
This commit uses the rustls-native-certs crate instead, so that locally
installed root certificates are used, minimizing the difference between the
native-tls and rustls-tls features.
* Document feature flags
Unknown status codes should be skipped and not cached by default. The reason is that we don't know whether they are valid, and even if they are invalid now, they might become valid in the future.
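A minimal sketch of that default policy, under stated assumptions: the `Status` enum and both functions are hypothetical, and treating 100 to 599 as "known" with only 2xx as success is a simplification, not lychee's actual classification.

```rust
/// Hypothetical, simplified check outcome for an HTTP response.
#[derive(Debug, PartialEq)]
enum Status {
    Ok(u16),
    Error(u16),
    UnknownStatusCode(u16),
}

/// Assumption: codes outside 100..=599 (the classes defined by RFC 9110)
/// are unknown; 2xx is success and every other known code is an error.
fn classify(code: u16) -> Status {
    match code {
        200..=299 => Status::Ok(code),
        100..=599 => Status::Error(code),
        _ => Status::UnknownStatusCode(code),
    }
}

/// Unknown codes are never cached, since a later run may see a different result.
fn should_cache(status: &Status) -> bool {
    !matches!(status, Status::UnknownStatusCode(_))
}
```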
`lychee_lib::client`:
- Improved documentation.
- Added a log message in `ClientBuilder::client()` when the provided user-agent
overrides the one defined in the provided custom headers.
- Removed unnecessary error handling in `Client::check()` when setting the HTTPS
scheme, because any failure would already have occurred when this URL was
first checked.
- Removed unnecessary error handling in `Client::remap()`, since
`lychee_lib::remap::Remaps::remap()` no longer returns a `Result`.
- Fixed potential integer overflow in `Client::check_website()` when the wait
time between retries doubles, by using `std::time::Duration::saturating_mul`
instead.
- Renamed `invalid()` to `validate_url()`.
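The overflow fix in `Client::check_website()` comes down to doubling the wait with `saturating_mul` instead of `*`. The helper below is hypothetical; it only shows how the waits evolve and that, near `Duration::MAX`, they saturate instead of panicking as `Duration * u32` does on overflow.

```rust
use std::time::Duration;

/// Hypothetical helper: the waits before each of `retries` attempts,
/// doubling every time. `saturating_mul` clamps at `Duration::MAX`
/// where `wait * 2` would panic on overflow.
fn retry_waits(initial: Duration, retries: u32) -> Vec<Duration> {
    let mut waits = Vec::with_capacity(retries as usize);
    let mut wait = initial;
    for _ in 0..retries {
        waits.push(wait);
        wait = wait.saturating_mul(2);
    }
    waits
}
```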
`lychee_lib::remap`:
- Improved documentation; in particular, clarified in the comments that URLs,
not URIs, are being remapped.
- Changed `Remaps::remap()` so that it takes `&mut Url` instead of `Uri` as its
argument and no longer returns a `Result`.
- Using `Url` instead of `Uri` because it aligns with the concept of
remapping locations rather than identifiers.
- Mutating the URL directly instead of returning a new one, because it's more
straightforward.
- There is no error handling because we no longer convert from URL to URI.
Furthermore, that conversion always succeeded in the first place, so error
handling was never needed.
- Added an implementation of `IntoIterator` for `&'a Remaps` and a convenience
method `Remaps::iter`. (Their mutable and consuming counterparts are deliberately
omitted because we don't want library users to modify or consume the
remapping rules after instantiation.)
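The shape of the new `Remaps` API (in-place remapping plus iteration by shared reference only) can be sketched as follows. This is a simplified illustration, not lychee's code: a `String` stands in for `url::Url`, plain prefix replacement stands in for the real matching rules, and `new` and the rule representation are assumptions.

```rust
/// Hypothetical, simplified remapping rules: each rule is a
/// `(prefix, replacement)` pair, and `String` stands in for `url::Url`.
pub struct Remaps(Vec<(String, String)>);

impl Remaps {
    pub fn new(rules: Vec<(String, String)>) -> Self {
        Self(rules)
    }

    /// Rewrites `url` in place with the first matching rule; no `Result`
    /// is needed because nothing here can fail.
    pub fn remap(&self, url: &mut String) {
        for (prefix, replacement) in self {
            if let Some(rest) = url.strip_prefix(prefix.as_str()) {
                let remapped = format!("{replacement}{rest}");
                *url = remapped;
                return;
            }
        }
    }

    /// Convenience method equivalent to `(&remaps).into_iter()`.
    pub fn iter(&self) -> std::slice::Iter<'_, (String, String)> {
        self.0.iter()
    }
}

/// Only iteration by shared reference is provided: there is no `iter_mut`
/// and no consuming `IntoIterator for Remaps`.
impl<'a> IntoIterator for &'a Remaps {
    type Item = &'a (String, String);
    type IntoIter = std::slice::Iter<'a, (String, String)>;

    fn into_iter(self) -> Self::IntoIter {
        self.0.iter()
    }
}
```

With the tuple field private to its module, code outside that module can inspect the rules but has no way to modify or consume them after construction.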
`lychee_lib::error`:
- Renamed `ErrorKind::InvalidUriRemap` to `InvalidUrlRemap` and improved
its error message.
Changes to other modules are minor and only serve to accompany the
aforementioned changes.
* Bump indicatif from 0.16.2 to 0.17.0
Bumps [indicatif](https://github.com/console-rs/indicatif) from 0.16.2 to 0.17.0.
- [Release notes](https://github.com/console-rs/indicatif/releases)
- [Commits](https://github.com/console-rs/indicatif/compare/0.16.2...0.17.0)
---
updated-dependencies:
- dependency-name: indicatif
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* Update progress bar setup
* Change progress bar style
* Use pink for spinner
* Show ETA instead of elapsed
* Dim progress bar and adjust its size to the terminal width
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Previously, the cache age was formatted with nanosecond resolution,
which is too fine-grained even for Rustaceans.
Now the format is limited to days, hours, minutes, and seconds,
which makes the cache age easier for humans to read.
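A minimal standard-library sketch of such a formatter. `format_age` and its `1d 2h 3m 4s` output style are assumptions rather than lychee's exact format; the point is that `Duration::as_secs` drops the sub-second part before the value is split into days, hours, minutes, and seconds.

```rust
use std::time::Duration;

/// Hypothetical formatter: renders an age as days, hours, minutes and
/// seconds, skipping zero components. Sub-second precision is dropped
/// because `as_secs` truncates the fractional part.
fn format_age(age: Duration) -> String {
    let total = age.as_secs();
    let (days, hours) = (total / 86_400, total % 86_400 / 3_600);
    let (minutes, seconds) = (total % 3_600 / 60, total % 60);

    let mut parts = Vec::new();
    if days > 0 {
        parts.push(format!("{days}d"));
    }
    if hours > 0 {
        parts.push(format!("{hours}h"));
    }
    if minutes > 0 {
        parts.push(format!("{minutes}m"));
    }
    // Always show something, even for ages under one second.
    if seconds > 0 || parts.is_empty() {
        parts.push(format!("{seconds}s"));
    }
    parts.join(" ")
}
```

For example, an age of 93,784 seconds is formatted as `1d 2h 3m 4s`.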
As discussed in https://github.com/lycheeverse/lychee/issues/647#issuecomment-1170773449,
it does not make much sense to cache unsupported and excluded URLs.
Unsupported URLs might be supported in the future, and caching them
would mean they would not get checked even then. Excluded URLs were
excluded for a reason and should not appear in the cache.
Furthermore, they might not be excluded in a subsequent run,
leading to a false positive.
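The filtering step can be sketched as follows. The `Status` variants and `cacheable_entries` are hypothetical; the idea is simply to drop excluded and unsupported results before the cache is written.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified check result.
#[derive(Debug, PartialEq)]
enum Status {
    Ok(u16),
    Error(u16),
    Excluded,
    Unsupported,
}

impl Status {
    /// Excluded and unsupported results are never written to the cache.
    fn is_cacheable(&self) -> bool {
        !matches!(self, Status::Excluded | Status::Unsupported)
    }
}

/// Keeps only the results that may be persisted, keyed by URL.
fn cacheable_entries(results: Vec<(String, Status)>) -> HashMap<String, Status> {
    results.into_iter().filter(|(_, s)| s.is_cacheable()).collect()
}
```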