156 commits

Author SHA1 Message Date
Trask Stalnaker
6d0e94c799
Introduce --root-dir (#1576)
* windows

* Introduce --root-path

* lint

* lint

* Simplification

* Add unit tests

* Add integration test

* Sync docs

* Add missing comment to make CI happy

* Revert one of the Windows-specific changes because it caused a test failure

* Support both options at the same time

* Revert a comment change that is no longer applicable

* Remove unused code

* Fix and simplification

* Integration test both at the same time

* Unit tests both at the same time

* Remove now redundant comment

* Revert windows-specific change, seems not needed after recent changes

* Use Collector::default()

* extract method and unit tests

* clippy

* clippy: &Option<A> -> Option<&A>

* Remove outdated comment

* Rename --root-path to --root-dir

* Restrict --root-dir to absolute paths for now

* Move root dir check
2024-12-13 14:36:33 +01:00
Trask Stalnaker
c9d5d0de6d
Pass along --max-retries config option (#1572) 2024-11-26 13:43:03 +01:00
Matthias Endler
9dc42176fa
Rename fail_map to error_map for improved clarity in response statistics (#1560)
Fixes #1446
2024-11-08 09:02:33 +01:00
Matthias Endler
e794b40d4d
Support excluded paths in --dump-inputs (#1556) 2024-11-07 16:32:32 +01:00
Matthias Endler
71564344de
Fix: Bring back error output for links (#1553)
With the last lychee release, we simplified the status output for links.

While this reduced the visual noise, it also accidentally caused the source of errors to not be printed anymore. This change brings back the additional error information as part of the final report output. Furthermore, it shows the error information in the progress output if verbose mode is activated.

Fixes #1487
2024-11-07 00:22:50 +01:00
Matthias Endler
812941c2aa
Fix format option in configuration file (#1547) 2024-10-27 02:17:00 +02:00
Matthias Endler
3094bbca33
Add support for relative links (#1489)
This commit introduces several improvements to the file checking process and URI handling:

- Extract file checking logic into separate `Checker` structs (`FileChecker`, `WebsiteChecker`, `MailChecker`)
- Improve handling of relative and absolute file paths
- Enhance URI parsing and creation from file paths
- Refactor `create_request` function for better clarity and error handling

These changes provide better support for resolving relative links, handling different base URLs, and working with file paths.

Fixes https://github.com/lycheeverse/lychee/issues/1296 and https://github.com/lycheeverse/lychee/issues/1480
2024-10-26 04:07:37 +02:00
Damien Mathieu
f0ebac29a2
Allow excluding cache based on status code (#1403)
This introduces an option, `--cache-exclude-status`, which specifies a range of HTTP status codes to exclude from the cache.

Closes #1400.
2024-10-14 02:41:56 +02:00
Thomas Zahner
17f62aef53
Respect timeout when retrieving archived link (#1526) 2024-10-12 21:49:50 +02:00
Matthias Endler
e2814acaa4
fix: Remove tokio console subscriber (#1524)
The console subscriber is the source of quite a few papercuts like
https://github.com/lycheeverse/lychee/issues/1513.

Since we don't use it at the moment, I decided to remove it.
2024-10-12 02:19:31 +02:00
Matthias
060e0cd55c Disable Wayback machine tests
See https://www.forbes.com/sites/daveywinder/2024/10/10/internet-hacked-wayback-machine-down-31-million-passwords-stolen/
2024-10-12 02:02:50 +02:00
Sebastiaan Speck
d8253a11f5
markdown.rs: make first line a top-level heading (#1511)
Fixes [`MD041` - First line in a file should be a top-level heading](https://github.com/DavidAnson/markdownlint/blob/v0.35.0/doc/md041.md)
2024-10-06 17:41:27 +02:00
Thomas Zahner
6075b4c87e Skip ignored and hidden files by default 2024-09-22 19:09:35 +02:00
Thomas Zahner
6444e27a84 Make gitignored files configurable and disable by default 2024-09-22 19:09:35 +02:00
Thomas Zahner
7fcf66c492
Extend compact format (#1497)
* Show unknowns and timeouts in compact format
* Clippy: make functions const
2024-09-09 18:33:18 +02:00
Matthias Endler
24d84e0045
Properly wire in tokio-console (#1482) 2024-08-07 23:09:47 +02:00
Hugo McNally
4bb8a61545
Updated pulldown-cmark dependency and fixed maths parsing (#1473)
* Update pulldown-cmark version to 0.11.0
* Fix markdown math parsing
* Fix lints
* Disable flaky wayback test

---------

Co-authored-by: Matthias <matthias@endler.dev>
2024-08-06 15:43:34 +02:00
Nabeen Tiwaree
141b5379c9
feat(clap): show help menu on no args as well (#1458) 2024-06-25 11:03:56 +02:00
Matthias Endler
dedc554eda
Add response formatter; refactor stats formatter (#1398)
This adds support for formatting responses in different ways.

For now, the options are:

* `plain`: No color, basic formatting
* `color`: Color, indented formatting (default)
* `emoji`: Fancy mode with emoji icons

Fixes #546
Related to #271
2024-06-14 19:47:52 +02:00
Johannes Schindelin
8c6eee9b5f
Add a way to handle "pretty URLs", i.e. URIs without .html extension (#1422)
In many circumstances (GitHub Pages, Apache configured with MultiViews,
etc), web servers process URIs by appending the `.html` file extension
when no file is found at the path specified by the URI but a `.html`
file corresponding to that path _is_ found.

To allow Lychee to use the fast, offline method of checking such files
locally via the `file://` scheme, let's handle this scenario gracefully
by adding the `--fallback-extensions=html` option.

Note: This new option can take a list of file extensions to use; the
first one for which a corresponding file is found is then used.
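
The lookup described above can be sketched roughly as follows. This is only an illustration of the "first matching extension wins" rule, not lychee's actual code; the function name `resolve_with_fallback` and its signature are assumptions made for this example.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: return the first existing file for `path`, trying the
/// path itself first and then the path with each fallback extension appended,
/// in the order given.
fn resolve_with_fallback(path: &Path, fallback_extensions: &[&str]) -> Option<PathBuf> {
    if path.is_file() {
        return Some(path.to_path_buf());
    }
    for ext in fallback_extensions {
        // Append ".<ext>" instead of calling `with_extension`, which would
        // replace anything after the last dot (e.g. in "release-1.2").
        let mut candidate = path.as_os_str().to_os_string();
        candidate.push(".");
        candidate.push(*ext);
        let candidate = PathBuf::from(candidate);
        if candidate.is_file() {
            return Some(candidate);
        }
    }
    None
}
```

With `&["html"]` as the extension list, a link to `docs/about` would resolve to `docs/about.html` if only the latter exists on disk.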

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-06-11 16:11:24 +02:00
Johannes Schindelin
975901d470
Fix clippy errors (#1423)
* Enclose Markdown links in brackets

The current clippy version (v0.1.78) says "you should put bare URLs
between `<`/`>` or make a proper Markdown link" and refers to
https://rust-lang.github.io/rust-clippy/master/index.html#doc_markdown

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

* Enclose documentation item in backticks

Clippy v0.1.78 complains about the IPv6 network mask, insisting that it
is missing backticks. So backticks it gets.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

* Avoid error claiming `Add(usize)` is dead code

Clippy v0.1.78 identifies this as dead code. However, further down in
the same file, there is clearly a user:

  impl Handler<Result, Result> for Add {

This might be yet another incarnation of
https://github.com/rust-lang/rust/issues/56750

Let's just mark it as intentionally dead-code, even if this is untrue,
to make clippy happy again.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

---------

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-05-13 19:07:02 +02:00
John Bampton
7be088bbfc
Fix spelling; Github -> GitHub (#1416) 2024-04-25 22:44:24 +02:00
Matthias Endler
fc85695d21
Gracefully handle invalid URIs (#1414)
With the upgrade to `reqwest` 0.12, we can finally handle a long-standing
issue, when Urls could not be parsed to Uris. Previously, we would panic, but
we can now handle that situation gracefully and return an error instead.

I've also renamed `Status::is_failure` to `Status::is_error`, because the
notion of failures no longer exists in the codebase and we use the term "error"
consistently throughout the codebase instead. This is technically a breaking
change in the API, but it's fine since we have not released a stable version
yet.

More information about the URI parsing issue:
- https://github.com/lycheeverse/lychee/issues/539
- https://github.com/seanmonstar/reqwest/issues/668
2024-04-25 15:29:36 +02:00
Thomas Zahner
e0059c4292 Fix typo 2024-04-25 08:48:11 +02:00
Thomas Zahner
730f5310b1 Disable lint with false positive 2024-04-25 08:48:11 +02:00
Thomas Zahner
e0b4c73987 Adapt to breaking changes & revert to pulldown-cmark 0.9 2024-04-25 08:48:11 +02:00
Matthias Endler
ad3ba31184
Merge missing include_mail flag into config (#1357) 2024-01-24 13:39:43 +01:00
Matthias Endler
d481c061b9
Always output valid JSON with --format=json (#1356)
Previously, when using JSON as the output format, any supplementary warnings included in the output would invalidate the JSON structure. This pull request addresses this issue by redirecting any extra warnings to `stderr`. This change guarantees that the output remains valid JSON even when additional warnings are present.
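
A minimal sketch of the idea, assuming a report that is already serialized to a JSON string: warnings go to one writer (standing in for `stderr`) and the JSON goes to another (standing in for `stdout`). The function `write_report` is hypothetical and is not taken from the lychee codebase.

```rust
use std::io::{self, Write};

/// Hypothetical helper: write warnings to `err` and the JSON document to
/// `out`, so that `out` contains nothing but the JSON.
fn write_report<O: Write, E: Write>(
    out: &mut O,
    err: &mut E,
    json: &str,
    warnings: &[&str],
) -> io::Result<()> {
    for warning in warnings {
        // Diagnostics never touch the stream that carries the JSON.
        writeln!(err, "warning: {warning}")?;
    }
    writeln!(out, "{json}")
}
```

In a real binary, `out` and `err` would be `io::stdout()` and `io::stderr()`, which keeps the output usable when piped into a JSON parser.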

Fixes https://github.com/lycheeverse/lychee/issues/1355
2024-01-24 13:12:55 +01:00
Techassi
0d0be52844
fix: Add accept option to merged config (#1344) 2024-01-09 20:55:39 +01:00
Matthias Endler
d3d0cd513d
Better TOML parsing error message (#1332)
The error handling for config loading was pretty poor.
That's because we didn't use the correct syntax to show the entire context with `anyhow`.
See ["Display representations"](https://docs.rs/anyhow/latest/anyhow/struct.Error.html#display-representations).
2024-01-04 22:17:14 +01:00
Thomas Zahner
46f0ae908e
Address warnings of the new clippy lints (#1310) 2023-12-01 14:21:49 +01:00
Techassi
1b1fd0c707
feat: Add support for ranges in the --accept option / config field (#1167)
Adds support for the accept ranges discussed in #1157. This lets the user specify custom HTTP status codes that are accepted during checking and therefore reported as valid (not broken). Previously, the accept option only supported status codes as a comma-separated list. With this PR, the option also accepts a list of status code ranges formatted like this:

```toml
accept = ["100..=103", "200..=299", "403"]
```

These combinations will be supported: `..<end>`, `..=<end>`, `<start>..<end>` and `<start>..=<end>`.
The behavior follows Rust's range syntax:

```
    ..<end>, includes 0 to <end> (exclusive)
    ..=<end>, includes 0 to <end> (inclusive)
    <start>..<end>, includes <start> to <end> (exclusive)
    <start>..=<end>, includes <start> to <end> (inclusive)
```
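
The range semantics listed above could be parsed along these lines. This is a sketch under stated assumptions, not the `AcceptSelector` implementation from the PR: the function names `parse_accept_range` and `is_accepted`, the use of `u16`, and the treatment of empty ranges as invalid are all choices made for this example.

```rust
use std::ops::RangeInclusive;

/// Hypothetical parser for one entry such as "403", "..300", "..=299",
/// "100..200" or "200..=299". Returns `None` for malformed or empty ranges.
fn parse_accept_range(s: &str) -> Option<RangeInclusive<u16>> {
    let s = s.trim();
    // Check `..=` before `..`, because the latter is a prefix of the former.
    if let Some((start, end)) = s.split_once("..=") {
        let start: u16 = if start.is_empty() { 0 } else { start.parse().ok()? };
        let end: u16 = end.parse().ok()?;
        return (start <= end).then(|| start..=end);
    }
    if let Some((start, end)) = s.split_once("..") {
        let start: u16 = if start.is_empty() { 0 } else { start.parse().ok()? };
        let end: u16 = end.parse().ok()?;
        // Exclusive upper bound: `a..b` covers a through b - 1.
        return (start < end).then(|| start..=end - 1);
    }
    // A bare number accepts exactly that status code.
    let code: u16 = s.parse().ok()?;
    Some(code..=code)
}

/// Check whether a status code falls into any of the accepted ranges.
fn is_accepted(ranges: &[RangeInclusive<u16>], status: u16) -> bool {
    ranges.iter().any(|range| range.contains(&status))
}
```

Storing every entry as an inclusive range keeps the membership check uniform, regardless of which of the four notations was used in the config.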


- Foundation and enhancements for accept ranges, including support for comma-separated strings and integration into the CLI.
- Implementations and updates for AcceptSelector, including Default, Display, and serde defaults.
- Address and fix various errors: clippy, cargo fmt, and tests.
- Add more tests, address edge cases, and enhance error messaging, especially for TOML config parsing.
- Update dependencies.
2023-09-17 21:39:01 +02:00
Matthias Endler
0711112841
Mention supported schemes (#1255)
Fixes https://github.com/lycheeverse/lycheeverse.github.io/issues/7
2023-09-15 01:27:44 +02:00
dependabot[bot]
fbb77d7f0e
Bump the dependencies group with 10 updates (#1249)
* Bump the dependencies group with 10 updates

Bumps the dependencies group with 10 updates:

| Package | From | To |
| --- | --- | --- |
| [clap](https://github.com/clap-rs/clap) | `4.3.23` | `4.4.2` |
| [dashmap](https://github.com/xacrimon/dashmap) | `5.5.1` | `5.5.3` |
| [openssl-sys](https://github.com/sfackler/rust-openssl) | `0.9.91` | `0.9.92` |
| [regex](https://github.com/rust-lang/regex) | `1.9.3` | `1.9.5` |
| [reqwest](https://github.com/seanmonstar/reqwest) | `0.11.19` | `0.11.20` |
| [octocrab](https://github.com/XAMPPRocky/octocrab) | `0.29.3` | `0.30.1` |
| [thiserror](https://github.com/dtolnay/thiserror) | `1.0.47` | `1.0.48` |
| [typed-builder](https://github.com/idanarye/rust-typed-builder) | `0.15.2` | `0.16.0` |
| [url](https://github.com/servo/rust-url) | `2.4.0` | `2.4.1` |
| [criterion](https://github.com/bheisler/criterion.rs) | ``4c19e91`` | ``180f4b4`` |


Updates `clap` from 4.3.23 to 4.4.2
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v4.3.23...v4.4.2)

Updates `dashmap` from 5.5.1 to 5.5.3
- [Release notes](https://github.com/xacrimon/dashmap/releases)
- [Commits](https://github.com/xacrimon/dashmap/compare/v5.5.1...v.5.5.3)

Updates `openssl-sys` from 0.9.91 to 0.9.92
- [Release notes](https://github.com/sfackler/rust-openssl/releases)
- [Commits](https://github.com/sfackler/rust-openssl/compare/openssl-sys-v0.9.91...openssl-sys-v0.9.92)

Updates `regex` from 1.9.3 to 1.9.5
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/regex/compare/1.9.3...1.9.5)

Updates `reqwest` from 0.11.19 to 0.11.20
- [Release notes](https://github.com/seanmonstar/reqwest/releases)
- [Changelog](https://github.com/seanmonstar/reqwest/blob/master/CHANGELOG.md)
- [Commits](https://github.com/seanmonstar/reqwest/compare/v0.11.19...v0.11.20)

Updates `octocrab` from 0.29.3 to 0.30.1
- [Release notes](https://github.com/XAMPPRocky/octocrab/releases)
- [Changelog](https://github.com/XAMPPRocky/octocrab/blob/main/CHANGELOG.md)
- [Commits](https://github.com/XAMPPRocky/octocrab/compare/v0.29.3...v0.30.1)

Updates `thiserror` from 1.0.47 to 1.0.48
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.47...1.0.48)

Updates `typed-builder` from 0.15.2 to 0.16.0
- [Changelog](https://github.com/idanarye/rust-typed-builder/blob/master/CHANGELOG.md)
- [Commits](https://github.com/idanarye/rust-typed-builder/commits)

Updates `url` from 2.4.0 to 2.4.1
- [Release notes](https://github.com/servo/rust-url/releases)
- [Commits](https://github.com/servo/rust-url/compare/v2.4.0...v2.4.1)

Updates `criterion` from `4c19e91` to `180f4b4`
- [Commits](4c19e913b8...180f4b4896)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: dependencies
- dependency-name: dashmap
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: openssl-sys
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: regex
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: reqwest
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: octocrab
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: dependencies
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: typed-builder
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: dependencies
- dependency-name: url
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: criterion
  dependency-type: direct:production
  dependency-group: dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>

* Format

* Fix new clippy lints

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
2023-09-05 10:50:45 +02:00
Matthias Endler
1bf2944c1e
Update dependencies; fix flaky tests (#1219) 2023-08-15 16:41:58 +02:00
dependabot[bot]
8f83081b03
Bump octocrab from 0.28.0 to 0.29.1 (#1193)
* Bump octocrab from 0.28.0 to 0.29.1

Bumps [octocrab](https://github.com/XAMPPRocky/octocrab) from 0.28.0 to 0.29.1.
- [Release notes](https://github.com/XAMPPRocky/octocrab/releases)
- [Changelog](https://github.com/XAMPPRocky/octocrab/blob/main/CHANGELOG.md)
- [Commits](https://github.com/XAMPPRocky/octocrab/compare/v0.28.0...v0.29.1)

---
updated-dependencies:
- dependency-name: octocrab
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Make wayback suggestion test more robust

- Retry mechanism
- Better checks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
2023-08-07 12:58:54 +02:00
Stefan Kreutz
b1b32e7717
Fix rustls-tls feature (#1194)
* Fix rustls-tls feature

Commit 14e74879 (cookie support #1146) re-introduced an unconditional
dependency on the openssl-sys crate. That is, building Lychee with the
Rustls TLS backend now requires OpenSSL. I suppose this change was
unintended, maybe due to automatic conflict resolution. If not, please
let me know.

You can review the re-introduced dependency like so:

```
cargo tree --no-default-features --features rustls-tls -i openssl-sys
```

This commit puts the OpenSSL dependency behind the native-tls feature
flag again.

You can check the TLS features like so:

```
cargo check --workspace --all-targets --features vendored-openssl

cargo check --workspace --all-targets --all-features

cargo check --workspace --all-targets --no-default-features --features rustls-tls
```

Maybe this should be added to CI. But I don't want to waste anybody's
time.

* Check feature flags during CI

Adds a new CI job 'check-feature-flags' to verify the following:

- Lychee with rustls-tls feature only doesn't depend on OpenSSL
- Cargo check passes with default features
- Cargo check passes with all features
- Cargo check passes with rustls-tls feature only
2023-08-04 15:11:29 +02:00
Hugo McNally
8e6369377c
Introduce fragment checking for links to markdown files. (#1126)
- Implemented enhancements to include fragments in file links
- Checked links to markdown files with fragments, generating unique kebab case and heading attributes.
- Made code more idiomatic and added an integration test.
- Updated documentation.
- Fixed issues with heading attributes fragments and ensured proper handling of file errors.
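
As a rough illustration of the kebab-case fragments mentioned above, the sketch below lowercases a heading, turns spaces into hyphens, drops other punctuation, and numbers repeated headings. The exact rules and the names `heading_to_fragment` and `unique_fragments` are assumptions for this example; the PR's actual algorithm, including its handling of heading attributes, may differ.

```rust
use std::collections::HashMap;

/// Hypothetical helper: derive a kebab-case fragment from a heading.
fn heading_to_fragment(heading: &str) -> String {
    heading
        .trim()
        .to_lowercase()
        .chars()
        .filter_map(|c| match c {
            ' ' => Some('-'),
            c if c.is_alphanumeric() || c == '-' || c == '_' => Some(c),
            // Any other punctuation is dropped.
            _ => None,
        })
        .collect()
}

/// Hypothetical helper: make fragments unique by appending "-1", "-2", ...
/// to repeated headings, in document order.
fn unique_fragments(headings: &[&str]) -> Vec<String> {
    let mut seen: HashMap<String, usize> = HashMap::new();
    headings
        .iter()
        .map(|heading| {
            let base = heading_to_fragment(heading);
            let count = seen.entry(base.clone()).or_insert(0);
            let fragment = if *count == 0 {
                base
            } else {
                format!("{base}-{count}")
            };
            *count += 1;
            fragment
        })
        .collect()
}
```

A link such as `README.md#getting-started` would then be checked against the set of fragments generated from the headings in `README.md`.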
2023-07-31 16:04:00 +02:00
Matthias Endler
04887ee293
Make checking email addresses optional (#1171)
Email checks cause too many false positives,
so we put them behind a flag.

* `--exclude-mail` is deprecated (to be removed in 1.0)
* `--include-mail` is the new flag

This PR also removes the obsolete tests for `--exclude-file`, which was superseded by `.lycheeignore`.

Fixes #1089
2023-07-19 19:58:38 +02:00
Techassi
f53619a455
feat: Add support for --dump-inputs (#1159)
* Add support for --dump-inputs
* Add integration tests
* Fix usage guide in README
2023-07-16 18:08:14 +02:00
Matthias Endler
14e748793e
Cookie Support (#1146)
This is a very conservative and limited implementation of cookie support.

The goal is to ship an MVP, which covers 80% of the use-cases.
When you run lychee with --cookie-jar cookies.json, all cookies will be stored in cookies.json, one cookie per line.
This makes cookies easy to edit by hand if needed, although this is an advanced use-case and the API for the format is not guaranteed to be stable.

Fixes: #645, #715
Partially fixes: #1108
2023-07-13 17:32:41 +02:00
Matthias Endler
97573123ef
Extend remap feature (#1133)
* wip

* Extend support for remapping

This adds support for partial remaps and
capture groups to the remap feature.

Fixes #1129
2023-07-05 15:05:19 +02:00
Techassi
67af7ef6d3
feat: add support for basic auth per URI (#1110)
* Add support for basic auth per domain
* Move URI matching to link collection phase
* Allow AsRef for BasicAuthExtractor::new to avoid clone
* Add tests

---------

Co-authored-by: Matthias Endler <matthias@endler.dev>
2023-06-26 12:06:24 +02:00
Stefan Kreutz
7dd84f6b7c
Add optional Rustls support (#1099)
* Add optional Rustls support

This commit adds a non-default feature flag to use Rustls instead of OpenSSL.

My personal motivation is to use Lychee on OpenBSD -current, where the
`openssl` crate frequently fails to link against the unreleased system
LibreSSL. Using the `vendored-openssl` feature helps with compilation, but
segfaults at runtime.

The commit adds three feature flags to the library, binary, benchmark, and all
examples:

- The `native-tls` feature flag toggles the `openssl` crate.
- The `rustls-tls` feature flag toggles the `rustls` crate.
- The `email-check` feature flag toggles the `check-if-email-exists` crate,
  which is the only existing functionality currently incompatible with Rustls.

By default, `native-tls` and `email-check` are enabled. Thus, Lychee (bin and
lib) can be used as before unless default features are disabled.

To use the Rustls feature, pass `--no-default-features --features rustls-tls` to
cargo check/build/test/..., e.g.,

    $ cargo clippy --workspace --all-targets --no-default-features \
        --features rustls-tls -- --deny warnings

Checking email addresses requires both `native-tls` and `email-check` to be
enabled. Otherwise, email addresses are excluded.

The `email-check` feature flag is technically not necessary. I preferred it
over `not(rustls-tls)` because it's clearer and it addresses the AGPL license
issue #594. As far as I understand, a Lychee binary compiled without the
`email-check` feature could be distributed with file-based copyleft for the
MPL-licensed dependencies only. But that's out of scope here.

The benchmark shows a performance regression varying between 2% and 4.4% when
using Rustls instead of OpenSSL on my machine.

PS: The `ring` crate needs to be patched on OpenBSD 7.3 and later until the new
xonly patches have been upstreamed, see the `rust-ring` port.

* Use platform native certificates with Rustls

By default, reqwest uses the webpki-roots crate with Rustls, effectively
bundling Mozilla's root certificates.

This commit uses the rustls-native-certs crate instead to use locally
installed root certificates, to minimize the difference between the
native-tls and rustls-tls features.

* Document feature flags
2023-06-16 02:21:57 +02:00
Matthias Endler
5ce77e1202
Don't cache unknown status codes (#1090)
Unknown status codes should be skipped and not cached by default. The reason is that we don't know if they are valid or not and even if they are invalid, we don't know if they will be valid in the future.
2023-06-02 02:46:20 +02:00
Matthias
649ab227d3 Add check duration to compact format 2023-06-01 18:31:41 +02:00
Matthias Endler
3c3051a7f0
Remove inaccurate details in compact view (#1088) 2023-06-01 16:55:30 +02:00
Matthias Endler
2b08c250be
Prettier colors and progress bar (#1069)
I've experimented a bit with the colors and these are the ones I
(currently) like best. The loader is taken from Python.
See https://stackoverflow.com/a/73724672
and 68224905f5/rich/progress_bar.py (LL70C16-L70C16)
2023-05-17 14:35:26 +02:00
Thomas Zahner
130fa21a6a
Concurrent archives (#1027) 2023-05-11 20:20:27 +02:00
Matthias Endler
fe24ba783a
Add check duration (in seconds) to report (#1064) 2023-05-06 00:47:32 +02:00