Commit graph

265 commits

Author SHA1 Message Date
Thomas Zahner
e3a236b257 Adjust documentation 2024-04-22 11:03:26 +02:00
Matthias
9b4fd8d0fc Extend docs around clone_unwrap 2024-04-22 11:03:26 +02:00
Matthias
9ed97213a1 Add documentation to chain module
Also make `Chainable` and  `ChainResult` public to support external plugins/handlers.
2024-04-22 11:03:26 +02:00
Thomas Zahner
d5b9b84db6 Extract function and add SAFETY note 2024-04-22 11:03:26 +02:00
Thomas Zahner
a3184190b8 Small tweaks & extract method 2024-04-22 11:03:26 +02:00
Thomas Zahner
93afae54bb Create ClientRequestChain helper structure to combine multiple chains 2024-04-22 11:03:26 +02:00
Thomas Zahner
41e7f88da4 Add credentials to chain 2024-04-22 11:03:26 +02:00
Thomas Zahner
3ec3a8228c Make checker part of the request chain 2024-04-22 11:03:26 +02:00
Matthias Endler
0f012f3035 Use async_trait to fix issues with Chain type inference 2024-04-22 11:03:26 +02:00
Thomas Zahner
d92d3ba733 Extract checking functionality & make chain async 2024-04-22 11:03:26 +02:00
Thomas Zahner
17e2911700 Move Arc and Mutex inside of Chain struct 2024-04-22 11:03:26 +02:00
Thomas Zahner
377aceed60 Apply clippy suggestions 2024-04-22 11:03:26 +02:00
Thomas Zahner
c41cd5d6b9 Apply suggestions 2024-04-22 11:03:26 +02:00
Thomas Zahner
1fe6f2f1be Small improvements 2024-04-22 11:03:26 +02:00
Thomas Zahner
a90a35c329 Add doc comment 2024-04-22 11:03:26 +02:00
Thomas Zahner
402482ca01 Update RequestChain & add chain to client 2024-04-22 11:03:26 +02:00
Thomas Zahner
667105e13e Move chain into check_website function 2024-04-22 11:03:26 +02:00
Thomas Zahner
db1dc19c0a Implement Chainable directly for BasicAuthCredentials 2024-04-22 11:03:26 +02:00
Thomas Zahner
8da4592e2a Introduce early exit in chain 2024-04-22 11:03:26 +02:00
Thomas Zahner
7783cdfe46 Pass down request_chain instead of credentials & add test 2024-04-22 11:03:26 +02:00
Thomas Zahner
237f690f18 Add quirks to request chain 2024-04-22 11:03:26 +02:00
Thomas Zahner
4d8a21c2dd Test chain 2024-04-22 11:03:26 +02:00
Thomas Zahner
0c4bea8074 Create request chain with basic authentication 2024-04-22 11:03:26 +02:00
Denis
0d5601ce9e
fix: Treat sites with 403 status codes as broken links (#1377) 2024-03-19 12:55:21 +01:00
Hugo McNally
9ff4a838ce
Fixed fragment generation for headings with inline code (#1370)
* Added code headings to fragment cli test

* Fixed fragment generation for headings with inline code
2024-02-05 01:07:56 +01:00
Nai Hao Cheng
a7c36b5255
Set URI to https for ErrorKind::InsecureURL (#1369) 2024-02-01 12:03:30 +01:00
Norbert Kamiński
2a95944ef5
status.rs: Make json output more verbose (#1367)
* status.rs: Make json output more verbose

Currently if the status response has no status code, json output
contains only a text field which gives no real information about
the cause of the problem. The patch adds field with more detailed
information when the status response contains some details.

Signed-off-by: Norbert Kamiński <norbert.kaminski@3mdeb.com>

* cli.rs: Test parsing of error details in JSON format

Some network error such as SSL has no status code but it can be
identified by error status details. This patch adds a test case to
verify if the error details are parsed properly in the json format.

Signed-off-by: Norbert Kamiński <norbert.kaminski@3mdeb.com>

---------

Signed-off-by: Norbert Kamiński <norbert.kaminski@3mdeb.com>
2024-01-30 23:58:18 +01:00
Techassi
1ab05efdcb
feat: Expand serde deserialize impl (#1345) 2024-01-10 00:07:39 +01:00
Matthias Endler
63ba63f7c9
Exclude example TLDs from RFC 2606 (#1335)
Fixes https://github.com/lycheeverse/lychee/issues/1283
2024-01-05 18:48:15 +01:00
Hugo McNally
c9b707ea74
Decode percent escapes in fragments (#1275)
* Added test to check a fragment with a utf8 character
2024-01-05 15:46:09 +01:00
Matthias Endler
d3d0cd513d
Better TOML parsing error message (#1332)
The error handling for config loading was pretty poor.
That's because we didn't use the correct syntax to show the entire context with `anhow`.
See ["Display representations"](https://docs.rs/anyhow/latest/anyhow/struct.Error.html#display-representations).
2024-01-04 22:17:14 +01:00
Matthias Endler
ef4e19268a
Fix false-positive example domains (#1316) 2023-12-04 01:55:14 +01:00
Thomas Zahner
46f0ae908e
Address warnings of the new clippy lints (#1310) 2023-12-01 14:21:49 +01:00
A.J. Stein
2109470dc3
Exclude URLs ending with xmlrpc.php (#1262)
* Add to xmlrpc.php to exclude patterns for lycheeverse/lychee#1210

* Fix addition to array of exclude pattern array for #1210 for lint checks
2023-10-06 21:31:44 +02:00
Techassi
1b1fd0c707
feat: Add support for ranges in the --accept option / config field (#1167)
Adds support for accept ranges discussed in #1157. This allows the user to specify custom HTTP status codes accepted during checking and thus will report as valid (not broken). The accept option only supports specifying status codes as a comma-separated list. With this PR, the option will accept a list of status code ranges formatted like this:

```toml
accept = ["100..=103", "200..=299", "403"]
```

These combinations will be supported: `..<end>`, ` ..=<end>`, `<start>..<end>` and `<start>..=<end>`.
The behavior is copied from the Rust Range like concepts:

```
    ..<end>, includes 0 to <end> (exclusive)
    ..=<end>, includes 0 to <end> (inclusive)
    <start>..<end>, includes <start> to <end> (exclusive)
    <start>..=<end>, includes <start> to <end> (inclusive)
```


- Foundation and enhancements for accept ranges, including support for comma-separated strings and integration into the CLI.
- Implementations and updates for AcceptSelector, including Default, Display, and serde defaults.
- Address and fix various errors: clippy, cargo fmt, and tests.
- Add more tests, address edge cases, and enhance error messaging, especially for TOML config parsing.
- Update dependencies.
2023-09-17 21:39:01 +02:00
dependabot[bot]
fbb77d7f0e
Bump the dependencies group with 10 updates (#1249)
* Bump the dependencies group with 10 updates

Bumps the dependencies group with 10 updates:

| Package | From | To |
| --- | --- | --- |
| [clap](https://github.com/clap-rs/clap) | `4.3.23` | `4.4.2` |
| [dashmap](https://github.com/xacrimon/dashmap) | `5.5.1` | `5.5.3` |
| [openssl-sys](https://github.com/sfackler/rust-openssl) | `0.9.91` | `0.9.92` |
| [regex](https://github.com/rust-lang/regex) | `1.9.3` | `1.9.5` |
| [reqwest](https://github.com/seanmonstar/reqwest) | `0.11.19` | `0.11.20` |
| [octocrab](https://github.com/XAMPPRocky/octocrab) | `0.29.3` | `0.30.1` |
| [thiserror](https://github.com/dtolnay/thiserror) | `1.0.47` | `1.0.48` |
| [typed-builder](https://github.com/idanarye/rust-typed-builder) | `0.15.2` | `0.16.0` |
| [url](https://github.com/servo/rust-url) | `2.4.0` | `2.4.1` |
| [criterion](https://github.com/bheisler/criterion.rs) | ``4c19e91`` | ``180f4b4`` |


Updates `clap` from 4.3.23 to 4.4.2
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v4.3.23...v4.4.2)

Updates `dashmap` from 5.5.1 to 5.5.3
- [Release notes](https://github.com/xacrimon/dashmap/releases)
- [Commits](https://github.com/xacrimon/dashmap/compare/v5.5.1...v.5.5.3)

Updates `openssl-sys` from 0.9.91 to 0.9.92
- [Release notes](https://github.com/sfackler/rust-openssl/releases)
- [Commits](https://github.com/sfackler/rust-openssl/compare/openssl-sys-v0.9.91...openssl-sys-v0.9.92)

Updates `regex` from 1.9.3 to 1.9.5
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/regex/compare/1.9.3...1.9.5)

Updates `reqwest` from 0.11.19 to 0.11.20
- [Release notes](https://github.com/seanmonstar/reqwest/releases)
- [Changelog](https://github.com/seanmonstar/reqwest/blob/master/CHANGELOG.md)
- [Commits](https://github.com/seanmonstar/reqwest/compare/v0.11.19...v0.11.20)

Updates `octocrab` from 0.29.3 to 0.30.1
- [Release notes](https://github.com/XAMPPRocky/octocrab/releases)
- [Changelog](https://github.com/XAMPPRocky/octocrab/blob/main/CHANGELOG.md)
- [Commits](https://github.com/XAMPPRocky/octocrab/compare/v0.29.3...v0.30.1)

Updates `thiserror` from 1.0.47 to 1.0.48
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.47...1.0.48)

Updates `typed-builder` from 0.15.2 to 0.16.0
- [Changelog](https://github.com/idanarye/rust-typed-builder/blob/master/CHANGELOG.md)
- [Commits](https://github.com/idanarye/rust-typed-builder/commits)

Updates `url` from 2.4.0 to 2.4.1
- [Release notes](https://github.com/servo/rust-url/releases)
- [Commits](https://github.com/servo/rust-url/compare/v2.4.0...v2.4.1)

Updates `criterion` from `4c19e91` to `180f4b4`
- [Commits](4c19e913b8...180f4b4896)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: dependencies
- dependency-name: dashmap
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: openssl-sys
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: regex
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: reqwest
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: octocrab
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: dependencies
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: typed-builder
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: dependencies
- dependency-name: url
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: criterion
  dependency-type: direct:production
  dependency-group: dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>

* Format

* Fix new clippy lints

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
2023-09-05 10:50:45 +02:00
Hugo McNally
f59aa61ee3
Check fragments in HTML files (#1198)
* Added html5gum based fragment extractor
* Markdown fragment extractor now extracts fragments from inline html
* Added fragment checks for html file
* Added inline html and html document to fragment checks test
* Improved some comments
* Improved documentation of markdown's fragment extractor.
2023-08-22 16:44:45 +02:00
Viktor Szépe
07466c0bfd
Fix typos (#1231) 2023-08-19 14:49:36 +02:00
Stefan Kreutz
b1b32e7717
Fix rustls-tls feature (#1194)
* Fix rustls-tls feature

Commit 14e74879 (cookie support #1146) re-introduced an unconditional
dependency on the openssl-sys crate. That is, building Lychee with the
Rustls TLS backend now requires OpenSSL. I suppose this change was
unintended, maybe due to automatic conflict resolution. If not, please
let me know.

You can review the re-introduced dependency like so:

```
cargo tree --no-default-features --features rustls-tls -i openssl-sys
```

This commit puts the OpenSSL dependency behind the native-tls feature
flag again.

You can check the TLS features like so:

```
cargo check --workspace --all-targets --features vendored-openssl

cargo check --workspace --all-targets --all-features

cargo check --workspace --all-targets --no-default-features --features rustls-tls
```

Maybe this should be added to CI. But I don't want to waste anybody's
time.

* Check feature flags during CI

Adds a new CI job 'check-feature-flags' to verify the following:

- Lychee with rustls-tls feature only doesn't depend on OpenSSL
- Cargo check passes with default features
- Cargo check passes with all features
- Cargo check passes with rustls-tls feature only
2023-08-04 15:11:29 +02:00
Hugo McNally
8e6369377c
Introduce fragment checking for links to markdown files. (#1126)
- Implemented enhancements to include fragments in file links
- Checked links to markdown files with fragments, generating unique kebab case and heading attributes.
- Made code more idiomatic and added an integration test.
- Updated documentation.
- Fixed issues with heading attributes fragments and ensured proper handling of file errors.
2023-07-31 16:04:00 +02:00
Matthias Endler
cead4ce826
Improve srcset parsing (#1160)
Our current `srcset` parsing is pretty basic.

We split on comma and then on whitespace and take the first part, which is the image source URL.
However, we don't handle URLs containing unencoded commas like
</cdn-cgi/image/format=webp,width=640/https://img.youtube.com/vi/hVBl8_pgQf0/maxresdefault.jpg>, which leads to false-positives.

According to the spec, commas in strings should be encoded, but in practice, there are some websites which don't do that. To handle these cases, too, I propose to extend the `srcset` parsing to make use of a small "state machine", which detects if a comma is within the image source or outside of it while parsing.

This is part of an effort to reduce false-positives during link checking.

---------
Co-authored-by: Hugo McNally <45573837+HU90m@users.noreply.github.com>
2023-07-29 17:06:44 +02:00
Markus Unterwaditzer
434baa8420
bump html5gum to 0.5.7 (#1182)
more performance improvements, also i accidentally pushed a breaking
change in 0.5.7 and want to avoid 0.5.5 to circulate around too much
because of that
2023-07-27 14:25:10 +02:00
Markus Unterwaditzer
566fca4c3d
bump html5gum to 0.5.5 (#1181)
there's some perf improvements there, but one requires manual
configuration.

custom should_emit_errors, when inlined, eliminates some useless code
that just wastes time when we don't care about errors.
2023-07-26 01:31:44 +02:00
Matthias Endler
04887ee293
Make checking email addresses optional (#1171)
E-Mail checks cause too many false-postives,
so we put them behind a flag.

* `--exclude-mail` is deprecated (to be removed in 1.0)
* `--include-mail` is the new flag

This PR also removes the obsolete tests for `--exclude-file`, which was superseded by `.lycheeignore`.

Fixes #1089
2023-07-19 19:58:38 +02:00
Techassi
f53619a455
feat: Add support for --dump-inputs (#1159)
* Add support for --dump-inputs
* Add integration tests
* Fix usage guide in README
2023-07-16 18:08:14 +02:00
Matthias
961575cdc7 fix typos 2023-07-13 21:48:46 +02:00
Matthias Endler
14e748793e
Cookie Support (#1146)
This is a very conservative and limited implementation of cookie support.

The goal is to ship an MVP, which covers 80% of the use-cases.
When you run lychee with --cookie-jar cookies.json, all cookies will be stored in cookies.json, one cookie per line.
This makes cookies easy to edit by hand if needed, although this is an advanced use-case and the API for the format is not guaranteed to be stable.

Fixes: #645, #715
Partially fixes: #1108
2023-07-13 17:32:41 +02:00
Matthias Endler
40ba18794d
Don't check Twitter URLs (#1147)
Twitter completely locked down and requires
a login to read tweets. (Temporarily) disable all
Twitter URLs to avoid false-positives.

For context:
https://github.com/zedeus/nitter/issues/919
https://news.ycombinator.com/item?id=36540957
https://techcrunch.com/2023/06/30/twitter-now-requires-an-account-to-view-tweets/

Fixes https://github.com/lycheeverse/lychee/issues/1108
2023-07-13 17:31:59 +02:00
Matthias Endler
97573123ef
Extend remap feature (#1133)
* wip

* Extend support for remapping

This adds supports for partial remaps and
capture groups to the remap feature.

Fixes #1129
2023-07-05 15:05:19 +02:00
Matthias Endler
15e420b8ad
Avoid false positives when checking email addresses in HTML input (#1123)
Skip email addresses outside href attributes in HTML
2023-07-01 00:12:11 +02:00
Techassi
67af7ef6d3
feat: add support for basic auth per URI (#1110)
* Add support for basic auth per domain
* Move URI matching to link collection phase
* Allow AsRef for BasicAuthExtractor::new to avoid clone
* Add tests

---------

Co-authored-by: Matthias Endler <matthias@endler.dev>
2023-06-26 12:06:24 +02:00
Matthias Endler
58c07e495e
Update false-positive patterns (#1120) 2023-06-25 15:09:09 +02:00
Matthias Endler
f0af985aac
Log redirects in verbose mode (-vv) (#1117)
This adds a custom redirect policy,
which logs redirects as debug messages.
It can help with troubleshooting, e.g. in situations like
https://github.com/lycheeverse/lychee/issues/1115
2023-06-23 15:49:05 +02:00
Stefan Kreutz
7dd84f6b7c
Add optional Rustls support (#1099)
* Add optional Rustls support

This commit adds a non-default feature flag to use Rustls instead of OpenSSL.

My personal motivation is to use Lychee on OpenBSD -current, where the
`openssl` crate frequently fails to link against the unreleased system
LibreSSL. Using the `vendored-openssl` feature helps with compilation, but
segfaults at runtime.

The commit adds three feature flags to the library, binary, benchmark, and all
examples:

- The `native-tls` feature flag toggles the `openssl` crate.
- The `rustls-tls` feature flag toggles the `rustls` crate.
- The `email-check` feature flag toggles the `check-if-email-exists` crate,
  which is the only existing functionality currently incompatible with Rustls.

By default, `native-tls` and `email-check` are enabled. Thus, Lychee (bin and
lib) can be used as before unless default features are disabled.

To use the Rustls feature, pass `--no-default-features --features rustls` to
cargo check/build/test/..., e.g.,

    $ cargo clippy --workspace --all-targets --no-default-features \ --features
    rustls-tls -- --deny warnings

Checking email addresses requires both, `native-tls` and `email-check`, to be
enabled. Otherwise, email addresses are excluded.

The `email-check` feature flag is technically not necessary. I preferred it
over `not(rustls-tls)` because it's clearer and it addresses the AGPL license
issue #594. As far as I understand, a Lychee binary compiled without the
`email-check` feature could be distributed with file-based copyleft for the
MPL-licensed dependencies only. But that's out of scope here.

The benchmark shows a performance regression varying between 2% and 4.4% when
using Rustls instead of OpenSSL on my machine.

PS: The `ring` crate needs to be patched on OpenBSD 7.3 and later until the new
xonly patches have been upstreamed, see the `rust-ring` port.

* Use platform native certificates with Rustls

By default, reqwest uses the webpki-roots crate with Rustls, effectively
bundling Mozilla's root certificates.

This commit uses the rustls-native-certs crate instead to use locally
installed root certificates, to minimize the difference between the
native-tls and rustls-tls features.

* Document feature flags
2023-06-16 02:21:57 +02:00
Matthias Endler
5ce77e1202
Don't cache unknown status codes (#1090)
Unknown status codes should be skipped and not cached by default. The reason is that we don't know if they are valid or not and even if they are invalid, we don't know if they will be valid in the future.
2023-06-02 02:46:20 +02:00
Levi Zim
9b0a06e1a9
test(client): make exponential_backoff better (#1079)
This test is still flaky on riscv64 boards after #1049.

It turns out that building the client might take 59ms,
which should not be counted.
2023-05-26 13:32:28 +02:00
Matthias Endler
fe24ba783a
Add check duration (in seconds) to report (#1064) 2023-05-06 00:47:32 +02:00
Levi Zim
436a235f4b
perform a warm up request in test_exponential_backoff (#1049)
Perform a warm-up request to ensure the lazy regexes in
`lychee-lib/src/quirks/mod.rs` are compiled.
On some platforms, this can take some time(approx. 110ms),
which should not be counted in the test.
2023-04-21 22:27:38 +02:00
dependabot[bot]
6a72f81535
Bump octocrab from 0.19.0 to 0.20.0 (#1045)
* Bump octocrab from 0.19.0 to 0.20.0

Bumps [octocrab](https://github.com/XAMPPRocky/octocrab) from 0.19.0 to 0.20.0.
- [Release notes](https://github.com/XAMPPRocky/octocrab/releases)
- [Changelog](https://github.com/XAMPPRocky/octocrab/blob/main/CHANGELOG.md)
- [Commits](https://github.com/XAMPPRocky/octocrab/compare/octocrab@0.19.0...v0.20.0)

---
updated-dependencies:
- dependency-name: octocrab
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* impl RetryExt for http::Error

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
2023-04-17 23:14:24 +02:00
Matthias
e47577708b formatting 2023-04-11 16:36:32 +02:00
Matthias
0c04f0371d Improve is_url function and tests 2023-04-11 01:00:25 +02:00
Matthias
e96d4114a9 helpers -> utils 2023-04-11 00:43:57 +02:00
Matthias
102480502c Lazy-load quirk patterns 2023-04-11 00:41:12 +02:00
Matthias
bbc0bfab9c format errors 2023-04-11 00:00:14 +02:00
Benny Joe Villiger
250f7a8f0a
Status codes in maps (#1014) 2023-03-27 12:29:12 +02:00
Matthias Endler
55797071b0
Fix nested URL extraction in verbatim elements (#988)
Skipping URLs in verbatim elements didn't take nested
elements into consideration, which were not verbatim.

For instance, the following HTML snippet would yield
`https://example.com` in non-verbatim mode, even if
it is nested inside a verbatim `<pre>` element:

```html
<pre><a href="https://example.com">link</a></pre>
```

This commit fixes the behavior for both `html5gum` and
`html5ever`.

Note that nested verbatim elements of the same kind
still are not handled correctly.

For instance,  the following HTML snippet would still yield
`https://example.com`:

```html
<pre>
  <pre></pre>
  <a href="https://example.com">link</a>
</pre>
```

The reason is that we currently only keep track of a single
verbatim element and not a stack of elements, which we
would need to unwind and resolve the situation.

Fixes https://github.com/lycheeverse/lychee/issues/986.
2023-03-11 15:18:25 +01:00
Matthias Endler
2255ad9286
Better retry handling (#981)
Previously, lychee would blindly retry all requests,
no matter if the request error was transient or fatal.

Taking a lesson from https://github.com/TrueLayer/reqwest-middleware,
we can be more granular about the error behavior.
This PR adds their retry logic to lychee, reducing the number of
unnecessary requests significantly.

I also made some ergonomic changes to the client, which should not
affect its behavior.
2023-03-10 22:36:45 +01:00
Matthias Endler
30e2a2b62b
Fix --max-redirects (#987)
Having more than the max number of redirects
caused lychee to abort the requests, but did not
lead to an error.

Related: https://github.com/lycheeverse/lychee-action/issues/164
2023-03-10 15:15:37 +01:00
Matthias
59ddc1e27d Fix url input handling without scheme 2023-03-03 12:13:09 +01:00
Matthias Endler
7874195bbb
Customize verbosity (#956) 2023-02-24 23:53:09 +01:00
Matthias Endler
b653a0a1ec
Fix cached 200 status code handling (#958)
* Fix cached 200 status code handling

Assert that code 200 never needs to be explicitly accepted for cached response
to match the behavior of uncached checks

* Bump version to v0.11.1
2023-02-23 00:25:53 +01:00
Matthias
5558531bab Fix lint 2023-02-22 21:05:49 +01:00
Kian-Meng Ang
9fa1d732f7
Fix typos (#944)
Found via `codespell -S fixtures -L crate,reacher,t`
2023-02-09 15:32:16 +01:00
dependabot[bot]
0a2cd324d5
Bump typed-builder from 0.11.0 to 0.12.0 (#934)
* Bump typed-builder from 0.11.0 to 0.12.0

Bumps [typed-builder](https://github.com/idanarye/rust-typed-builder) from 0.11.0 to 0.12.0.
- [Release notes](https://github.com/idanarye/rust-typed-builder/releases)
- [Changelog](https://github.com/idanarye/rust-typed-builder/blob/master/CHANGELOG.md)
- [Commits](https://github.com/idanarye/rust-typed-builder/commits)

---
updated-dependencies:
- dependency-name: typed-builder
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Remove custom builder method docs.

We use the default again, which offers the same amount of information.

* Add `make` target to show docs

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
Co-authored-by: Matthias Endler <matthias@endler.dev>
2023-01-30 15:12:20 +01:00
Matthias Endler
9837699b79
Introduce new let...else syntax (#936) 2023-01-30 14:25:30 +01:00
Lucius Hu
e2406089ad
chore!: improve client and remap modules (#913)
`lychee_lib::client`:

- Improved documentation.
- Added an log message in `ClientBuilder::client()` when provied user-agent
  overrides the one defined in provied custom header.
- Removed unnecessary error handling in `Client::check()` when setting HTTPS
  scheme because all failure cases should occur when checking this URL the first
  time already.
- Removed unnecessary error handling in `Client::remap()` since
  `lychee-lib::remap::Remaps::remap()` doesn't returns a `Result` anymore.
- Fixed potential integer overflow in `Client::check_website()` when the wait
  time between retries doubles, by using `std::time::Duration::saturating_mul`
  instead.
- Renamed `invalid()` to `validate_url()`.

`lychee_lib::remap`:

- Improved documentation, in particular, clarified (in the comment) that it's
  URLs not URIs being remapped.
- Changed `Remaps::remap()` so it takes `&mut Url` instead of `Uri` as its
  argument, and doesn't return a `Result` as a result.
    - Using `Url` instead of `Uri` because it aligns with the concept of
      remapping locations rather than identifiers.
    - Mutating the URL directly instead of returning a new one for it's more
      straightforward.
    - There is no error handling because we don't convert from URL to URI
      anymore. Furthermore, this always succeed in the first place so we never
      needed error handling.
- Added implementation of `IntoIterator` for `&'a Remaps` and convenience method
  of `Remaps::iter`. (Their mutable or moving counterparts are deliberately
  avoided because we don't want library users to modify all consume the
  remapping rules after its instantiation.)

`lychee_lib::error`:

- Renamed `ErrorKind::InvalidUriRemap` to `InvalidUrlRemap` and improved
  its error message.

Changes to other modules are minor and only serves to accompany aforementioned
changes.
2023-01-16 19:14:09 +01:00
Matthias Endler
b620fc99f7
Properly handle youtu.be shortlinks (#908)
Previously those were not correctly rewritten to thumbnail URLs.
This should be fixed now by splitting up the logic
for normal YouTube links and shortlinks.

Fixes #906
2023-01-06 18:25:09 +01:00
Matthias Endler
4a3bfb99fb
Remove address from verbatim elements (#901) 2023-01-05 14:55:53 +01:00
Matthias Endler
5654b7c317
Harden URL detection and extend verbatim elements (#899)
Previously remote URLs were incorrectly detected because the
string representation of a path is different than the path itself,
causing the `http` prefix match to be insufficient.

This resulted in unexpected side-effects, such as the
incorrect detection of verbatim mode for remote URLs.

The check now got improved and unit tests were added to avoid
future breakage. On top of that, missing verbatim elements were added
2023-01-04 00:38:19 +01:00
Matthias Endler
da46734c54
Extend response stats in verbose mode (#882) 2022-12-20 10:43:01 +01:00
Matthias Endler
6df1c378ec
Fix Rust 1.66 clippy lints (#879) 2022-12-19 14:28:10 +01:00
Matthias
7d435f2155
Add more markdown extensions (#866) 2022-12-12 18:26:42 +01:00
Matthias
ef391cea50
Recursively skip verbatim elements (#847) 2022-12-12 01:06:45 +01:00
Matthias
9eeea250cd
Exclude <script> tags by default (#848)
This is a naive approach to exclude script tags from
getting checked. The reason is that the tag leads to
a lot of false-positives (e.g. `//unpkg.com/docsify-edit-on-github@1`
within a script block gets detected as an e-mail address).

A more thorough approach would be the use of a tree-builder in
html5gum and html5ever, but this could have a negative performance
impact.

I also did not want to add a new flag (e.g. `--include-scripts`) for this
setting because the current set of flags around exclusion/inclusion is
already quite long.

Fixes #821.
2022-11-29 00:38:43 +01:00
Matthias
982d978e47
Add different verbosity levels (#824)
More granular verbosity levels have been asked
for repeatedly.
To enable that we're moving to [env_logger] and [clap-verbosity-flag]
to provide more flexible verbosity settings.

Also tackles #661, #709
Lays the groundwork for tackling #268

https://github.com/rust-cli/env_logger
https://github.com/clap-rs/clap-verbosity-flag
2022-11-28 23:25:33 +01:00
Matthias
b479a5810e
Allow overriding accepted status codes for cached URIs (#843)
Fixes #840
2022-11-28 12:23:07 +01:00
Matthias
765f7adb12
Don't check example mail addresses by default (#815)
This was an oversight so far that became apparent after our
recent fix for email addreses with query params
(e.g. `test@example.com?subject=test`).
The parsing of email addresses has improved and so we detect
more mail addresses, but we didn't check if they belonged
to an example domain, causing false-positive checks.
2022-11-08 23:46:32 +01:00
Matthias
d61105edbb
Fix parsing error of email addresses with query params (#809)
Email addresses with query parameters often get used in
contact forms on websites. They can also be found in
other documents like Markdown.

A common use-case is to add a subject line to the email
as a parameter e.g. `mailto:mail@example.com?subject="Hello"`.

Previously we handled such cases incorrectly by recognizing
them as files. The reason was that our email parsing was too strict
to allow for that use-case.
With `email_address` we switched to a more permissive parser.

Note that this does not affect the actual address email checking,
as this is still done `check-if-email-exists`, which has more strict
check functionality.
2022-11-05 23:40:33 +01:00
Matthias
94dda21326 Fix clippy lints 2022-09-27 18:17:37 +02:00
dependabot[bot]
226546091b
Bump check-if-email-exists from 0.8.31 to 0.9.0 (#735)
* Bump check-if-email-exists from 0.8.31 to 0.9.0

Bumps [check-if-email-exists](https://github.com/reacherhq/check-if-email-exists) from 0.8.31 to 0.9.0.
- [Release notes](https://github.com/reacherhq/check-if-email-exists/releases)
- [Changelog](https://github.com/reacherhq/check-if-email-exists/blob/master/CHANGELOG.md)
- [Commits](https://github.com/reacherhq/check-if-email-exists/compare/v0.8.31...v0.9.0)

---
updated-dependencies:
- dependency-name: check-if-email-exists
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update usage

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
2022-08-16 12:35:34 +02:00
Matthias
6a49cedc16
Check Twitter URLs using nitter.net (#731)
Use an alternative Twitter frontend, which works more reliably than
using Twitter directly.
2022-08-12 22:46:35 +02:00
Matthias
69f387c1bd
Markdown-status (#729)
* Fix typos

* Add status code description to markdown output
2022-08-11 22:08:05 +02:00
Walter Beller-Morales
6d40a2ab7b
Update to gracefully handle nonexistent relative paths (#691)
* Update Input::new to gracefully handle nonexistent relative paths
* Add test checking Input::new can handle real relative paths
* Add better pre-conditions to Input::new tests
* Add integration tests for handling relative paths in lychee-bin
* Update lychee-lib/src/types/input.rs
2022-07-22 17:15:55 +02:00
Matthias
6fae93f2da
Skip caching unsupported and excluded URLs (#692)
As discussed in https://github.com/lycheeverse/lychee/issues/647#issuecomment-1170773449, it does not make much sense to cache unsupported
and excluded URLs.
Unsupported URLs might be supported in the future and caching them
would mean they won't get checked then. Excluded URLs were
excluded for a reason and should not appear in the cache.
Furthermore they might not be excluded
in a consecutive run, leading to a false-positive.
2022-07-17 18:40:45 +02:00
Walter Beller-Morales
9ad53f97a2
Fix deserialize of lycheecache status codes (#685)
* Add custom deserializer for `CacheStatus` to properly classify status codes
* Add CLI integration tests to check .lycheecache behavior
* Add comment to explain conflict between cache and accept flags
2022-07-15 22:45:24 +02:00
Matthias
fb367ef43a
Add http://www.w3.org/1999/xlink to list of false positives (#664) 2022-07-01 12:11:57 +02:00
Matthias
487d88cefe
Add test for mailto address with query params (#655) 2022-06-29 10:19:17 +02:00
dependabot[bot]
231939af82
Bump html5gum from 0.4.0 to 0.5.1 (#658)
* Bump html5gum from 0.4.0 to 0.5.1

Bumps [html5gum](https://github.com/untitaker/html5gum) from 0.4.0 to 0.5.1.
- [Release notes](https://github.com/untitaker/html5gum/releases)
- [Commits](https://github.com/untitaker/html5gum/compare/0.4.0...0.5.1)

---
updated-dependencies:
- dependency-name: html5gum
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update html5gum

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
2022-06-23 00:07:28 +02:00
Markus Unterwaditzer
f1ae22da09
Replace lazy hashset with matches! (#656)
* Replace lazy hashset with matches!

llvm will typically create much faster code than accessing a hashset at
runtime

source: trust me bro

* cargo fix

* cargo fmt

* shorten docstring
2022-06-18 19:00:07 +02:00
Matthias
84de43c554
Refactor request types (#637) 2022-06-03 20:13:07 +02:00