lychee

mirror of https://github.com/Hopiu/lychee.git synced 2026-03-29 19:00:25 +00:00

Author	SHA1	Message	Date
Techassi	f53619a455	feat: Add support for --dump-inputs (#1159 ) * Add support for --dump-inputs * Add integration tests * Fix usage guide in README	2023-07-16 18:08:14 +02:00
Matthias	961575cdc7	fix typos	2023-07-13 21:48:46 +02:00
Matthias Endler	14e748793e	Cookie Support (#1146 ) This is a very conservative and limited implementation of cookie support. The goal is to ship an MVP, which covers 80% of the use-cases. When you run lychee with --cookie-jar cookies.json, all cookies will be stored in cookies.json, one cookie per line. This makes cookies easy to edit by hand if needed, although this is an advanced use-case and the API for the format is not guaranteed to be stable. Fixes: #645, #715 Partially fixes: #1108	2023-07-13 17:32:41 +02:00
Matthias Endler	40ba18794d	Don't check Twitter URLs (#1147 ) Twitter completely locked down and requires a login to read tweets. (Temporarily) disable all Twitter URLs to avoid false-positives. For context: https://github.com/zedeus/nitter/issues/919 https://news.ycombinator.com/item?id=36540957 https://techcrunch.com/2023/06/30/twitter-now-requires-an-account-to-view-tweets/ Fixes https://github.com/lycheeverse/lychee/issues/1108	2023-07-13 17:31:59 +02:00
Matthias Endler	97573123ef	Extend remap feature (#1133 ) * wip * Extend support for remapping This adds support for partial remaps and capture groups to the remap feature. Fixes #1129	2023-07-05 15:05:19 +02:00
Matthias Endler	15e420b8ad	Avoid false positives when checking email addresses in HTML input (#1123 ) Skip email addresses outside href attributes in HTML	2023-07-01 00:12:11 +02:00
Techassi	67af7ef6d3	feat: add support for basic auth per URI (#1110 ) * Add support for basic auth per domain * Move URI matching to link collection phase * Allow AsRef for BasicAuthExtractor::new to avoid clone * Add tests --------- Co-authored-by: Matthias Endler <matthias@endler.dev>	2023-06-26 12:06:24 +02:00
Matthias Endler	58c07e495e	Update false-positive patterns (#1120 )	2023-06-25 15:09:09 +02:00
Matthias Endler	f0af985aac	Log redirects in verbose mode (-vv) (#1117 ) This adds a custom redirect policy, which logs redirects as debug messages. It can help with troubleshooting, e.g. in situations like https://github.com/lycheeverse/lychee/issues/1115	2023-06-23 15:49:05 +02:00
Stefan Kreutz	7dd84f6b7c	Add optional Rustls support (#1099 ) * Add optional Rustls support This commit adds a non-default feature flag to use Rustls instead of OpenSSL. My personal motivation is to use Lychee on OpenBSD -current, where the `openssl` crate frequently fails to link against the unreleased system LibreSSL. Using the `vendored-openssl` feature helps with compilation, but segfaults at runtime. The commit adds three feature flags to the library, binary, benchmark, and all examples: - The `native-tls` feature flag toggles the `openssl` crate. - The `rustls-tls` feature flag toggles the `rustls` crate. - The `email-check` feature flag toggles the `check-if-email-exists` crate, which is the only existing functionality currently incompatible with Rustls. By default, `native-tls` and `email-check` are enabled. Thus, Lychee (bin and lib) can be used as before unless default features are disabled. To use the Rustls feature, pass `--no-default-features --features rustls` to cargo check/build/test/..., e.g., $ cargo clippy --workspace --all-targets --no-default-features \ --features rustls-tls -- --deny warnings Checking email addresses requires both, `native-tls` and `email-check`, to be enabled. Otherwise, email addresses are excluded. The `email-check` feature flag is technically not necessary. I preferred it over `not(rustls-tls)` because it's clearer and it addresses the AGPL license issue #594. As far as I understand, a Lychee binary compiled without the `email-check` feature could be distributed with file-based copyleft for the MPL-licensed dependencies only. But that's out of scope here. The benchmark shows a performance regression varying between 2% and 4.4% when using Rustls instead of OpenSSL on my machine. PS: The `ring` crate needs to be patched on OpenBSD 7.3 and later until the new xonly patches have been upstreamed, see the `rust-ring` port. 
* Use platform native certificates with Rustls By default, reqwest uses the webpki-roots crate with Rustls, effectively bundling Mozilla's root certificates. This commit uses the rustls-native-certs crate instead to use locally installed root certificates, to minimize the difference between the native-tls and rustls-tls features. * Document feature flags	2023-06-16 02:21:57 +02:00
Matthias Endler	5ce77e1202	Don't cache unknown status codes (#1090 ) Unknown status codes should be skipped and not cached by default. The reason is that we don't know if they are valid or not and even if they are invalid, we don't know if they will be valid in the future.	2023-06-02 02:46:20 +02:00
Levi Zim	9b0a06e1a9	test(client): make exponential_backoff better (#1079 ) This test is still flaky on riscv64 boards after #1049. It turns out that building the client might take 59ms, which should not be counted.	2023-05-26 13:32:28 +02:00
Matthias Endler	fe24ba783a	Add check duration (in seconds) to report (#1064 )	2023-05-06 00:47:32 +02:00
Levi Zim	436a235f4b	perform a warm up request in test_exponential_backoff (#1049 ) Perform a warm-up request to ensure the lazy regexes in `lychee-lib/src/quirks/mod.rs` are compiled. On some platforms, this can take some time (approx. 110ms), which should not be counted in the test.	2023-04-21 22:27:38 +02:00
dependabot[bot]	6a72f81535	Bump octocrab from 0.19.0 to 0.20.0 (#1045 ) * Bump octocrab from 0.19.0 to 0.20.0 Bumps [octocrab](https://github.com/XAMPPRocky/octocrab) from 0.19.0 to 0.20.0. - [Release notes](https://github.com/XAMPPRocky/octocrab/releases) - [Changelog](https://github.com/XAMPPRocky/octocrab/blob/main/CHANGELOG.md) - [Commits](https://github.com/XAMPPRocky/octocrab/compare/octocrab@0.19.0...v0.20.0) --- updated-dependencies: - dependency-name: octocrab dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * impl RetryExt for http::Error --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Matthias <matthias-endler@gmx.net>	2023-04-17 23:14:24 +02:00
Matthias	e47577708b	formatting	2023-04-11 16:36:32 +02:00
Matthias	0c04f0371d	Improve `is_url` function and tests	2023-04-11 01:00:25 +02:00
Matthias	e96d4114a9	helpers -> utils	2023-04-11 00:43:57 +02:00
Matthias	102480502c	Lazy-load quirk patterns	2023-04-11 00:41:12 +02:00
Matthias	bbc0bfab9c	format errors	2023-04-11 00:00:14 +02:00
Benny Joe Villiger	250f7a8f0a	Status codes in maps (#1014 )	2023-03-27 12:29:12 +02:00
Matthias Endler	55797071b0	Fix nested URL extraction in verbatim elements (#988 ) Skipping URLs in verbatim elements didn't take nested elements into consideration, which were not verbatim. For instance, the following HTML snippet would yield `https://example.com` in non-verbatim mode, even if it is nested inside a verbatim `<pre>` element: ```html <pre><a href="https://example.com">link</a></pre> ``` This commit fixes the behavior for both `html5gum` and `html5ever`. Note that nested verbatim elements of the same kind still are not handled correctly. For instance, the following HTML snippet would still yield `https://example.com`: ```html <pre> <pre></pre> <a href="https://example.com">link</a> </pre> ``` The reason is that we currently only keep track of a single verbatim element and not a stack of elements, which we would need to unwind and resolve the situation. Fixes https://github.com/lycheeverse/lychee/issues/986.	2023-03-11 15:18:25 +01:00
Matthias Endler	2255ad9286	Better retry handling (#981 ) Previously, lychee would blindly retry all requests, no matter if the request error was transient or fatal. Taking a lesson from https://github.com/TrueLayer/reqwest-middleware, we can be more granular about the error behavior. This PR adds their retry logic to lychee, reducing the number of unnecessary requests significantly. I also made some ergonomic changes to the client, which should not affect its behavior.	2023-03-10 22:36:45 +01:00
Matthias Endler	30e2a2b62b	Fix `--max-redirects` (#987 ) Having more than the max number of redirects caused lychee to abort the requests, but did not lead to an error. Related: https://github.com/lycheeverse/lychee-action/issues/164	2023-03-10 15:15:37 +01:00
Matthias	59ddc1e27d	Fix url input handling without scheme	2023-03-03 12:13:09 +01:00
Matthias Endler	7874195bbb	Customize verbosity (#956 )	2023-02-24 23:53:09 +01:00
Matthias Endler	b653a0a1ec	Fix cached 200 status code handling (#958 ) * Fix cached 200 status code handling Assert that code 200 never needs to be explicitly accepted for cached response to match the behavior of uncached checks * Bump version to v0.11.1	2023-02-23 00:25:53 +01:00
Matthias	5558531bab	Fix lint	2023-02-22 21:05:49 +01:00
Kian-Meng Ang	9fa1d732f7	Fix typos (#944 ) Found via `codespell -S fixtures -L crate,reacher,t`	2023-02-09 15:32:16 +01:00
dependabot[bot]	0a2cd324d5	Bump typed-builder from 0.11.0 to 0.12.0 (#934 ) * Bump typed-builder from 0.11.0 to 0.12.0 Bumps [typed-builder](https://github.com/idanarye/rust-typed-builder) from 0.11.0 to 0.12.0. - [Release notes](https://github.com/idanarye/rust-typed-builder/releases) - [Changelog](https://github.com/idanarye/rust-typed-builder/blob/master/CHANGELOG.md) - [Commits](https://github.com/idanarye/rust-typed-builder/commits) --- updated-dependencies: - dependency-name: typed-builder dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Remove custom builder method docs. We use the default again, which offers the same amount of information. * Add `make` target to show docs --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Matthias <matthias-endler@gmx.net> Co-authored-by: Matthias Endler <matthias@endler.dev>	2023-01-30 15:12:20 +01:00
Matthias Endler	9837699b79	Introduce new let...else syntax (#936 )	2023-01-30 14:25:30 +01:00
Lucius Hu	e2406089ad	chore!: improve client and remap modules (#913 ) `lychee_lib::client`: - Improved documentation. - Added a log message in `ClientBuilder::client()` when the provided user-agent overrides the one defined in the provided custom header. - Removed unnecessary error handling in `Client::check()` when setting HTTPS scheme because all failure cases should occur when checking this URL the first time already. - Removed unnecessary error handling in `Client::remap()` since `lychee-lib::remap::Remaps::remap()` doesn't return a `Result` anymore. - Fixed potential integer overflow in `Client::check_website()` when the wait time between retries doubles, by using `std::time::Duration::saturating_mul` instead. - Renamed `invalid()` to `validate_url()`. `lychee_lib::remap`: - Improved documentation, in particular, clarified (in the comment) that it's URLs not URIs being remapped. - Changed `Remaps::remap()` so it takes `&mut Url` instead of `Uri` as its argument, and doesn't return a `Result` as a result. - Using `Url` instead of `Uri` because it aligns with the concept of remapping locations rather than identifiers. - Mutating the URL directly instead of returning a new one because it's more straightforward. - There is no error handling because we don't convert from URL to URI anymore. Furthermore, this always succeeds in the first place so we never needed error handling. - Added implementation of `IntoIterator` for `&'a Remaps` and convenience method of `Remaps::iter`. (Their mutable or moving counterparts are deliberately avoided because we don't want library users to modify or consume the remapping rules after its instantiation.) `lychee_lib::error`: - Renamed `ErrorKind::InvalidUriRemap` to `InvalidUrlRemap` and improved its error message. Changes to other modules are minor and only serve to accompany aforementioned changes.	2023-01-16 19:14:09 +01:00
Matthias Endler	b620fc99f7	Properly handle youtu.be shortlinks (#908 ) Previously those were not correctly rewritten to thumbnail URLs. This should be fixed now by splitting up the logic for normal YouTube links and shortlinks. Fixes #906	2023-01-06 18:25:09 +01:00
Matthias Endler	4a3bfb99fb	Remove address from verbatim elements (#901 )	2023-01-05 14:55:53 +01:00
Matthias Endler	5654b7c317	Harden URL detection and extend verbatim elements (#899 ) Previously remote URLs were incorrectly detected because the string representation of a path is different than the path itself, causing the `http` prefix match to be insufficient. This resulted in unexpected side-effects, such as the incorrect detection of verbatim mode for remote URLs. The check now got improved and unit tests were added to avoid future breakage. On top of that, missing verbatim elements were added	2023-01-04 00:38:19 +01:00
Matthias Endler	da46734c54	Extend response stats in verbose mode (#882 )	2022-12-20 10:43:01 +01:00
Matthias Endler	6df1c378ec	Fix Rust 1.66 clippy lints (#879 )	2022-12-19 14:28:10 +01:00
Matthias	7d435f2155	Add more markdown extensions (#866 )	2022-12-12 18:26:42 +01:00
Matthias	ef391cea50	Recursively skip verbatim elements (#847 )	2022-12-12 01:06:45 +01:00
Matthias	9eeea250cd	Exclude <script> tags by default (#848 ) This is a naive approach to exclude script tags from getting checked. The reason is that the tag leads to a lot of false-positives (e.g. `//unpkg.com/docsify-edit-on-github@1` within a script block gets detected as an e-mail address). A more thorough approach would be the use of a tree-builder in html5gum and html5ever, but this could have a negative performance impact. I also did not want to add a new flag (e.g. `--include-scripts`) for this setting because the current set of flags around exclusion/inclusion is already quite long. Fixes #821.	2022-11-29 00:38:43 +01:00
Matthias	982d978e47	Add different verbosity levels (#824 ) More granular verbosity levels have been asked for repeatedly. To enable that we're moving to [env_logger] and [clap-verbosity-flag] to provide more flexible verbosity settings. Also tackles #661, #709 Lays the groundwork for tackling #268 https://github.com/rust-cli/env_logger https://github.com/clap-rs/clap-verbosity-flag	2022-11-28 23:25:33 +01:00
Matthias	b479a5810e	Allow overriding accepted status codes for cached URIs (#843 ) Fixes #840	2022-11-28 12:23:07 +01:00
Matthias	765f7adb12	Don't check example mail addresses by default (#815 ) This was an oversight so far that became apparent after our recent fix for email addresses with query params (e.g. `test@example.com?subject=test`). The parsing of email addresses has improved and so we detect more mail addresses, but we didn't check if they belonged to an example domain, causing false-positive checks.	2022-11-08 23:46:32 +01:00
Matthias	d61105edbb	Fix parsing error of email addresses with query params (#809 ) Email addresses with query parameters often get used in contact forms on websites. They can also be found in other documents like Markdown. A common use-case is to add a subject line to the email as a parameter e.g. `mailto:mail@example.com?subject="Hello"`. Previously we handled such cases incorrectly by recognizing them as files. The reason was that our email parsing was too strict to allow for that use-case. With `email_address` we switched to a more permissive parser. Note that this does not affect the actual email address checking, as this is still done by `check-if-email-exists`, which has stricter check functionality.	2022-11-05 23:40:33 +01:00
Matthias	94dda21326	Fix clippy lints	2022-09-27 18:17:37 +02:00
dependabot[bot]	226546091b	Bump check-if-email-exists from 0.8.31 to 0.9.0 (#735 ) * Bump check-if-email-exists from 0.8.31 to 0.9.0 Bumps [check-if-email-exists](https://github.com/reacherhq/check-if-email-exists) from 0.8.31 to 0.9.0. - [Release notes](https://github.com/reacherhq/check-if-email-exists/releases) - [Changelog](https://github.com/reacherhq/check-if-email-exists/blob/master/CHANGELOG.md) - [Commits](https://github.com/reacherhq/check-if-email-exists/compare/v0.8.31...v0.9.0) --- updated-dependencies: - dependency-name: check-if-email-exists dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Update usage Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Matthias <matthias-endler@gmx.net>	2022-08-16 12:35:34 +02:00
Matthias	6a49cedc16	Check Twitter URLs using nitter.net (#731 ) Use an alternative Twitter frontend, which works more reliably than using Twitter directly.	2022-08-12 22:46:35 +02:00
Matthias	69f387c1bd	Markdown-status (#729 ) * Fix typos * Add status code description to markdown output	2022-08-11 22:08:05 +02:00
Walter Beller-Morales	6d40a2ab7b	Update to gracefully handle nonexistent relative paths (#691 ) * Update Input::new to gracefully handle nonexistent relative paths * Add test checking Input::new can handle real relative paths * Add better pre-conditions to Input::new tests * Add integration tests for handling relative paths in lychee-bin * Update lychee-lib/src/types/input.rs	2022-07-22 17:15:55 +02:00
Matthias	6fae93f2da	Skip caching unsupported and excluded URLs (#692 ) As discussed in https://github.com/lycheeverse/lychee/issues/647#issuecomment-1170773449, it does not make much sense to cache unsupported and excluded URLs. Unsupported URLs might be supported in the future and caching them would mean they won't get checked then. Excluded URLs were excluded for a reason and should not appear in the cache. Furthermore they might not be excluded in a consecutive run, leading to a false-positive.	2022-07-17 18:40:45 +02:00

171 commits