
62 commits

Author SHA1 Message Date
Matthias Endler
dedc554eda
Add response formatter; refactor stats formatter (#1398)
This adds support for formatting responses in different ways.

For now, the options are:

* `plain`: No color, basic formatting
* `color`: Color, indented formatting (default)
* `emoji`: Fancy mode with emoji icons
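
As a rough illustration of how the three modes above could be selected and applied, here is a minimal sketch. The `Mode` enum, `format_response` function, and the exact output strings are hypothetical and are not taken from the lychee code base; only the option names and the `color` default come from this commit message.

```rust
use std::str::FromStr;

/// Hypothetical formatter modes mirroring the three options listed above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Plain,
    Color,
    Emoji,
}

impl Default for Mode {
    // `color` is documented as the default mode.
    fn default() -> Self {
        Mode::Color
    }
}

impl FromStr for Mode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "plain" => Ok(Mode::Plain),
            "color" => Ok(Mode::Color),
            "emoji" => Ok(Mode::Emoji),
            other => Err(format!("unknown mode: {other}")),
        }
    }
}

/// Render one check result; the concrete layout is illustrative only.
fn format_response(mode: Mode, ok: bool, uri: &str) -> String {
    let status = if ok { "OK" } else { "ERROR" };
    match mode {
        // No color, basic formatting.
        Mode::Plain => format!("[{status}] {uri}"),
        // ANSI green (32) or red (31), indented.
        Mode::Color => {
            let code = if ok { 32 } else { 31 };
            format!("    \x1b[{code}m[{status}]\x1b[0m {uri}")
        }
        // Emoji icon instead of a textual status.
        Mode::Emoji => {
            let icon = if ok { "✅" } else { "❌" };
            format!("{icon} {uri}")
        }
    }
}
```

For example, `format_response("plain".parse().unwrap(), true, "https://example.com")` yields `[OK] https://example.com` in this sketch.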

Fixes #546
Related to #271
2024-06-14 19:47:52 +02:00
Johannes Schindelin
8c6eee9b5f
Add a way to handle "pretty URLs", i.e. URIs without .html extension (#1422)
In many circumstances (GitHub Pages, Apache configured with MultiViews,
etc), web servers process URIs by appending the `.html` file extension
when no file is found at the path specified by the URI but a `.html`
file corresponding to that path _is_ found.

To allow Lychee to use the fast, offline method of checking such files
locally via the `file://` scheme, let's handle this scenario gracefully
by adding the `--fallback-extensions=html` option.

Note: This new option can take a list of file extensions to use; the
first one for which a corresponding file is found is then used.
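
The lookup described above can be sketched roughly as follows. This is an assumption-laden illustration, not lychee's implementation: the function name `resolve_with_fallback` and the injected `exists` callback are hypothetical, chosen so the logic can be shown without touching the file system.

```rust
use std::path::{Path, PathBuf};

/// Resolve `path` to an existing file, trying each fallback extension in order.
/// `exists` is injected (hypothetical design) so the lookup is testable offline.
fn resolve_with_fallback(
    path: &Path,
    fallback_extensions: &[&str],
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    // A file that exists as given needs no fallback.
    if exists(path) {
        return Some(path.to_path_buf());
    }
    // Otherwise append each extension in turn; the first hit wins.
    fallback_extensions
        .iter()
        .map(|ext| {
            let mut candidate = path.as_os_str().to_os_string();
            candidate.push(".");
            candidate.push(*ext);
            PathBuf::from(candidate)
        })
        .find(|candidate| exists(candidate.as_path()))
}
```

With `--fallback-extensions=html`, a link to `site/about` would resolve to `site/about.html` in this sketch when only the latter exists.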

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-06-11 16:11:24 +02:00
Johannes Schindelin
975901d470
Fix clippy errors (#1423)
* Enclose Markdown links in brackets

The current clippy version (v0.1.78) says "you should put bare URLs
between `<`/`>` or make a proper Markdown link" and refers to
https://rust-lang.github.io/rust-clippy/master/index.html#doc_markdown

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

* Enclose documentation item in backticks

Clippy v0.1.78 complains about the IPv6 network mask, insisting that it
is missing backticks. So backticks it gets.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

* Avoid error claiming `Add(usize)` is dead code

Clippy v0.1.78 identifies this as dead code. However, further down in
the same file, there is clearly a user:

  impl Handler<Result, Result> for Add {

This might be yet another incarnation of
https://github.com/rust-lang/rust/issues/56750

Let's just mark it as intentionally dead-code, even if this is untrue,
to make clippy happy again.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

---------

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-05-13 19:07:02 +02:00
Matthias Endler
ad3ba31184
Merge missing include_mail flag into config (#1357) 2024-01-24 13:39:43 +01:00
Techassi
0d0be52844
fix: Add accept option to merged config (#1344) 2024-01-09 20:55:39 +01:00
Matthias Endler
d3d0cd513d
Better TOML parsing error message (#1332)
The error handling for config loading was pretty poor.
That's because we didn't use the correct syntax to show the entire context with `anyhow`.
See ["Display representations"](https://docs.rs/anyhow/latest/anyhow/struct.Error.html#display-representations).
2024-01-04 22:17:14 +01:00
Techassi
1b1fd0c707
feat: Add support for ranges in the --accept option / config field (#1167)
Adds support for accept ranges discussed in #1157. This allows the user to specify custom HTTP status codes that are accepted during checking and thus reported as valid (not broken). Previously, the accept option only supported specifying status codes as a comma-separated list. With this PR, the option accepts a list of status code ranges formatted like this:

```toml
accept = ["100..=103", "200..=299", "403"]
```

These combinations will be supported: `..<end>`, `..=<end>`, `<start>..<end>` and `<start>..=<end>`.
The behavior mirrors Rust's range-like concepts:

```
    ..<end>, includes 0 to <end> (exclusive)
    ..=<end>, includes 0 to <end> (inclusive)
    <start>..<end>, includes <start> to <end> (exclusive)
    <start>..=<end>, includes <start> to <end> (inclusive)
```
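
The range forms listed above could be parsed along these lines. This is a minimal sketch, not the `AcceptSelector` implementation from the PR: the function names `parse_accept_range` and `is_accepted` are hypothetical, and all forms are normalized to an inclusive range for simplicity.

```rust
use std::ops::RangeInclusive;

/// Parse one accept entry such as "403", "..300", "..=103", "200..300"
/// or "200..=299" into an inclusive range of status codes.
/// Returns `None` for malformed or empty ranges.
fn parse_accept_range(input: &str) -> Option<RangeInclusive<u16>> {
    let input = input.trim();
    // A bare number is a single status code.
    let Some((start, end)) = input.split_once("..") else {
        let code: u16 = input.parse().ok()?;
        return Some(code..=code);
    };
    // A missing start means "from 0".
    let start: u16 = if start.is_empty() { 0 } else { start.parse().ok()? };
    // `=` after the dots marks an inclusive end; otherwise the end is exclusive.
    let end: u16 = match end.strip_prefix('=') {
        Some(inclusive) => inclusive.parse().ok()?,
        None => end.parse::<u16>().ok()?.checked_sub(1)?,
    };
    (start <= end).then(|| start..=end)
}

/// Check whether a status code falls into any of the accepted ranges.
fn is_accepted(ranges: &[RangeInclusive<u16>], status: u16) -> bool {
    ranges.iter().any(|range| range.contains(&status))
}
```

Under these assumptions, the TOML example `["100..=103", "200..=299", "403"]` accepts 204 and 403 but not 404.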


- Foundation and enhancements for accept ranges, including support for comma-separated strings and integration into the CLI.
- Implementations and updates for AcceptSelector, including Default, Display, and serde defaults.
- Address and fix various errors: clippy, cargo fmt, and tests.
- Add more tests, address edge cases, and enhance error messaging, especially for TOML config parsing.
- Update dependencies.
2023-09-17 21:39:01 +02:00
Matthias Endler
0711112841
Mention supported schemes (#1255)
Fixes https://github.com/lycheeverse/lycheeverse.github.io/issues/7
2023-09-15 01:27:44 +02:00
Hugo McNally
8e6369377c
Introduce fragment checking for links to markdown files. (#1126)
- Implemented enhancements to include fragments in file links
- Checked links to markdown files with fragments, generating unique kebab case and heading attributes.
- Made code more idiomatic and added an integration test.
- Updated documentation.
- Fixed issues with heading attributes fragments and ensured proper handling of file errors.
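
To illustrate what "unique kebab case" fragments might look like, here is a small sketch. The exact rules lychee applies are not spelled out in this commit message; the code below assumes a GitHub-style convention (lowercase, whitespace to `-`, punctuation dropped, numeric suffixes for repeated headings), and the function names are hypothetical.

```rust
use std::collections::HashMap;

/// Turn heading text into a kebab-case fragment: lowercase, whitespace to `-`,
/// everything except alphanumerics, `-` and `_` dropped.
fn kebab_case(heading: &str) -> String {
    heading
        .trim()
        .chars()
        .filter_map(|c| {
            if c.is_alphanumeric() || c == '-' || c == '_' {
                Some(c)
            } else if c.is_whitespace() {
                Some('-')
            } else {
                None
            }
        })
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Assign a fragment to every heading; repeats get `-1`, `-2`, ... suffixes
/// (assumed convention) so that each fragment in a document is unique.
fn unique_fragments(headings: &[&str]) -> Vec<String> {
    let mut seen: HashMap<String, usize> = HashMap::new();
    headings
        .iter()
        .map(|heading| {
            let base = kebab_case(heading);
            let count = seen.entry(base.clone()).or_insert(0);
            let fragment = if *count == 0 {
                base
            } else {
                format!("{base}-{count}")
            };
            *count += 1;
            fragment
        })
        .collect()
}
```

A link such as `README.md#getting-started` would then be checked against the fragments generated from the headings of `README.md`.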
2023-07-31 16:04:00 +02:00
Matthias Endler
04887ee293
Make checking email addresses optional (#1171)
E-Mail checks cause too many false positives,
so we put them behind a flag.

* `--exclude-mail` is deprecated (to be removed in 1.0)
* `--include-mail` is the new flag

This PR also removes the obsolete tests for `--exclude-file`, which was superseded by `.lycheeignore`.

Fixes #1089
2023-07-19 19:58:38 +02:00
Techassi
f53619a455
feat: Add support for --dump-inputs (#1159)
* Add support for --dump-inputs
* Add integration tests
* Fix usage guide in README
2023-07-16 18:08:14 +02:00
Matthias Endler
14e748793e
Cookie Support (#1146)
This is a very conservative and limited implementation of cookie support.

The goal is to ship an MVP, which covers 80% of the use-cases.
When you run lychee with --cookie-jar cookies.json, all cookies will be stored in cookies.json, one cookie per line.
This makes cookies easy to edit by hand if needed, although this is an advanced use-case and the API for the format is not guaranteed to be stable.

Fixes: #645, #715
Partially fixes: #1108
2023-07-13 17:32:41 +02:00
Techassi
67af7ef6d3
feat: add support for basic auth per URI (#1110)
* Add support for basic auth per domain
* Move URI matching to link collection phase
* Allow AsRef for BasicAuthExtractor::new to avoid clone
* Add tests

---------

Co-authored-by: Matthias Endler <matthias@endler.dev>
2023-06-26 12:06:24 +02:00
Thomas
994b2852cd
Wayback integration (#1003)
Adds support for suggesting archived URLs for broken links.
Uses Wayback Machine as the archive provider.
2023-03-28 00:45:06 +02:00
Matthias Endler
30e2a2b62b
Fix --max-redirects (#987)
Having more than the max number of redirects
caused lychee to abort the requests, but did not
lead to an error.

Related: https://github.com/lycheeverse/lychee-action/issues/164
2023-03-10 15:15:37 +01:00
Matthias
9eb3149a69 Custom config handling to spot errors when passing invalid config and ignoring errors loading missing default conf 2023-03-03 12:13:09 +01:00
Matthias
6c133493e9 Revert "Don't ignore file-not-found errors when loading config"
This reverts commit 9ade4502a27cb3776c5fb39cdad7666ab854a373.
2023-03-03 12:13:09 +01:00
Matthias
387766322d Don't ignore file-not-found errors when loading config
This is no longer necessary ever since 712bdfa8cb
2023-03-03 12:13:09 +01:00
Matthias Endler
7874195bbb
Customize verbosity (#956) 2023-02-24 23:53:09 +01:00
dependabot[bot]
d8e4940dbe
Bump toml from 0.5.11 to 0.7.0 (#933)
* Bump toml from 0.5.11 to 0.7.0

Bumps [toml](https://github.com/toml-rs/toml) from 0.5.11 to 0.7.0.
- [Release notes](https://github.com/toml-rs/toml/releases)
- [Commits](https://github.com/toml-rs/toml/compare/toml-v0.5.11...toml-v0.7.0)

---
updated-dependencies:
- dependency-name: toml
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Introduce new let...else syntax

* Update config file loading for latest toml crate version

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
Co-authored-by: Matthias Endler <matthias@endler.dev>
2023-01-30 15:12:34 +01:00
Matthias
e476965bee
Fix verbosity serialization (#853)
Forgot the serde defaults, which led to problems on some terminals
2022-11-29 12:59:32 +01:00
Matthias
982d978e47
Add different verbosity levels (#824)
More granular verbosity levels have been asked
for repeatedly.
To enable that we're moving to [env_logger] and [clap-verbosity-flag]
to provide more flexible verbosity settings.

Also tackles #661, #709
Lays the groundwork for tackling #268

https://github.com/rust-cli/env_logger
https://github.com/clap-rs/clap-verbosity-flag
2022-11-28 23:25:33 +01:00
dependabot[bot]
2ce1a9ae06
Bump clap from 3.2.23 to 4.0.22 (#813)
* Bump clap from 3.2.23 to 4.0.22

Bumps [clap](https://github.com/clap-rs/clap) from 3.2.23 to 4.0.22.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v3.2.23...v4.0.22)

* The `headers` option got renamed to `header` to align with the rest
   of the options, which are singular.
* The short option for `header` (`-h`) was removed to avoid a conflict with
  help (`lychee -h`).
* Update and simplify readme check

Co-authored-by: Matthias <matthias-endler@gmx.net>
2022-11-13 21:10:32 +01:00
Matthias
264af23822 Improve wording 2022-11-05 17:25:44 +01:00
Andy Grunwald
a67b513238
Extend description of "--exclude" to also exclude email addresses, not only URLs (#801) 2022-10-23 12:17:20 +02:00
Matthias
cbd936960a
Move from structopt to clap (#732)
Structopt was subsumed by clap. See
https://github.com/clap-rs/clap/blob/master/CHANGELOG.md#migrating
2022-08-12 22:53:13 +02:00
Matthias
a557cba0b4
Add support for parsing list of status codes from config file (#636) 2022-06-02 18:53:04 +02:00
Matthias
9b4dfadffd
Fix parsing errors with config options (#632) 2022-05-31 19:43:46 +02:00
vpereira01
d48a3279a8
Improve configuration example (#631)
* Add missing parameters
* Remove deprecated `--exclude-file` parameter
* Improve TOML comments
* Add config smoketest
2022-05-31 19:05:27 +02:00
Matthias
22fecfc056
Add support for URI remapping (#620)
Remaps allow mapping from a URI pattern to a different URI.

The syntax is

```
lychee --remap 'https://example.com http://127.0.0.1'
```

Some use-cases are
- Testing URIs prior to production deployment
- Testing URIs behind a proxy

Be careful when using this feature because checking every link against a
large set of regular expressions has a performance impact. Also, there are no
constraints on the URI mapping, so the rules might contradict each
other.
Remap rules get applied in order of definition to every input URI.
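
The in-order application of remap rules can be sketched as below. Two simplifications are assumptions of this sketch, not facts about lychee: a literal prefix match stands in for the regular expressions mentioned above (to keep the example dependency-free), and each matching rule rewrites the result of the previous one. The `RemapRule` type and function names are hypothetical.

```rust
/// A hypothetical remap rule: URIs starting with `pattern` get that prefix replaced.
struct RemapRule {
    pattern: String,
    replacement: String,
}

/// Parse the `--remap` argument form "<pattern> <uri>" (two whitespace-separated parts).
fn parse_rule(arg: &str) -> Option<RemapRule> {
    let mut parts = arg.split_whitespace();
    let (pattern, replacement) = (parts.next()?, parts.next()?);
    // Anything beyond two parts is treated as malformed.
    if parts.next().is_some() {
        return None;
    }
    Some(RemapRule {
        pattern: pattern.to_string(),
        replacement: replacement.to_string(),
    })
}

/// Apply every rule in order of definition to the input URI.
fn remap(uri: &str, rules: &[RemapRule]) -> String {
    let mut current = uri.to_string();
    for rule in rules {
        let next = current
            .strip_prefix(rule.pattern.as_str())
            .map(|rest| format!("{}{}", rule.replacement, rest));
        if let Some(next) = next {
            current = next;
        }
    }
    current
}
```

Because every rule is tried for every URI, the cost grows with the number of rules, which is the performance concern noted above.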
2022-05-29 21:41:22 +02:00
Matthias
363b95fe5f
Add support for excluding paths from link checking (#623)
This change deprecates `--exclude-file` as it was ambiguous.
Instead, `--exclude-path` was introduced to support excluding paths
to files and directories that should not be checked.
Furthermore, `.lycheeignore` is now the only way
to exclude URL patterns.
2022-05-29 17:27:09 +02:00
Matthias
8c0a32d81d
Refactor response formatting (#599)
* Add support for raw formatter (no color)
* Introduce ResponseFormatter trait
* Pass the same params to every cli command
* Update dependencies
* Remove pretty_assertions dependency (latest version doesn't build)
2022-04-25 19:19:36 +02:00
Matthias
743d386252
Allow input URLs without scheme (fixes #567)
This requires `Input::new` to return a `Result`, because the URL
parsing could fail when prepending `http://`.

We use http instead of https, because curl does as well:
70ac27604a/lib/urlapi.c (L1104-L1124)
Missing files will be interpreted as URLs from the command line
and these can be invalid, but that's not seen as an error anymore.
2022-03-27 01:27:27 +01:00
Matthias
d616177a99
Implement excluding code blocks (#523)
This is done in the extractor to avoid unnecessary
allocations.
2022-03-26 10:42:56 +01:00
Matthias
05bd3817ee
Make retry wait time configurable (#525) 2022-02-24 12:24:57 +01:00
Matthias
812663d832
Prevent flaky tests (#514)
Move from example.org to example.com, which seems to be more permissive for testing
2022-02-18 10:29:49 +01:00
Lucius Hu
6d56c6b55c
Replace plain String with SecretString for GitHub token (#509)
This commit changed the type of `lychee-lib::ClientBuilder::github_token` from
`String` to `secrecy::SecretString` to fortify the secret management within our
program.

Note that this won't affect TOML configuration of `lychee-bin` because
`serde::Deserialize` is still implemented for `SecretString`.
2022-02-13 13:53:46 +01:00
Matthias
9d738fb3f5
Fix default config (#491)
The default configuration was broken since the
introduction of caching and specifically `max_cache_age`.
This fixes deserialization and config merging for
the case where this key is missing from the config.
2022-02-07 23:17:50 +01:00
Lucius Hu
6bf8c1fe39
lychee-bin: replace lazy_static by const_format (#495)
This commit replaced the use of `lazy_static` by
`const_format` in `lychee-bin`.

Currently `lazy_static` is used to generate static
String at runtime. With `const_format` we can instead
make constant String at compile time.

Co-authored-by: Lucius Hu <lebensterben@users.noreply.github.com>
2022-02-07 22:45:17 +01:00
Matthias
4630216c30 Add description for max-cache-age flag 2022-01-14 16:55:56 +01:00
Matthias
ac490f9c53
Add caching functionality (v2) (#443)
A while ago, caching was removed due to some issues (see #349).
This is a new implementation with the following improvements:

* Architecture: The new implementation is decoupled from the collector, which was a major issue in the last version. Now the collector has a single responsibility: collecting links. This also avoids race-conditions when running multiple collect_links instances, which probably was an issue before.
* Performance: Uses DashMap under the hood, which was noticeably faster than Mutex<HashMap> in my tests.
* Simplicity: The cache format is a CSV file with two columns: URI and status. I decided to create a new struct called CacheStatus for serialization, because trying to serialize the error kinds in Status turned out to be a bit of a nightmare and at this point I don't think it's worth the pain (and probably isn't idiomatic either).

This is an optional feature. Caching only gets used if the `--cache` flag is set.
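
The two-column cache format described above can be illustrated with the following sketch. It is not the actual implementation: the `CacheStatus` variants, the function names, and the use of a plain `HashMap` (instead of `DashMap`) are assumptions, and no CSV quoting is handled.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the `CacheStatus` struct mentioned above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CacheStatus {
    Ok,
    Error,
}

/// Serialize the cache as "URI,status" rows, sorted for stable output.
fn to_csv(cache: &HashMap<String, CacheStatus>) -> String {
    let mut rows: Vec<String> = cache
        .iter()
        .map(|(uri, status)| format!("{uri},{status:?}"))
        .collect();
    rows.sort();
    rows.join("\n")
}

/// Load the cache back, skipping rows that cannot be parsed.
fn from_csv(data: &str) -> HashMap<String, CacheStatus> {
    data.lines()
        .filter_map(|line| {
            // Split at the last comma, since a URI may itself contain commas.
            let (uri, status) = line.rsplit_once(',')?;
            let status = match status.trim() {
                "Ok" => CacheStatus::Ok,
                "Error" => CacheStatus::Error,
                _ => return None,
            };
            Some((uri.to_string(), status))
        })
        .collect()
}
```

Keeping a dedicated, flat status type makes the round trip trivial, which matches the motivation given for not serializing the full error kinds.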
2022-01-14 15:25:51 +01:00
Matthias
21f3160b71
Make retries configurable; align constants (#446)
Using the same default values for the library and the
binary now but tweaked the values a bit for slightly faster performance.
2022-01-07 01:03:10 +01:00
Matthias
5eb062cbec Always hide GH token in opts 2022-01-06 09:54:03 +01:00
Matthias
166c86c30e
Use tokenizer for extraction; add benchmark (#424)
This avoids creating a DOM tree for link extraction and instead uses a `TokenSink` for on-the-fly extraction. In hyperfine benchmarks it was about 10-25% faster than master.

Old: 4.557 s ± 0.404 s
New: 3.832 s ± 0.131 s

The performance fluctuates a little less as well.

Some missing element/attribute pairs were also added, which contain links according to the HTML spec. These occur very rarely, but it's good to parse them for completeness' sake.

Furthermore, this cleans up a lot of papercuts around our types. We now differentiate between a `RawUri` (a stringly typed value) and a `Uri`, which is a properly parsed URI type.
The extractor now only deals with extracting `RawUri`s while the collector creates the request objects.
2021-12-16 18:45:52 +01:00
Matthias
591cbdbebb
Add support for .lycheeignore file #308 (#402)
This is similar to files like .gitignore and .dockerignore
and gets merged into exclude_files
2021-11-23 01:39:53 +01:00
Matthias
b97fda34d0
Add support for different output formats (compact, detailed, markdown) (#375) 2021-11-18 00:44:48 +01:00
MichaIng
b648b5e914
Imply "localhost" when loopback IPs are excluded (#351)
as "localhost" is usually mapped via "hosts" file to a loopback IP address.
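
A minimal sketch of the idea, assuming a hypothetical helper `is_excluded_host` and a boolean `exclude_loopback` setting (neither name is taken from the lychee source): when loopback exclusion is on, the literal host name `localhost` is treated like a loopback IP.

```rust
use std::net::IpAddr;

/// Decide whether a host should be skipped when loopback addresses are excluded.
fn is_excluded_host(host: &str, exclude_loopback: bool) -> bool {
    if !exclude_loopback {
        return false;
    }
    // IPv6 literals appear in brackets inside URLs, e.g. `[::1]`.
    let bare = host.trim_start_matches('[').trim_end_matches(']');
    match bare.parse::<IpAddr>() {
        // 127.0.0.0/8 and ::1 are loopback addresses.
        Ok(ip) => ip.is_loopback(),
        // Not an IP: imply "localhost", which usually maps to a loopback IP.
        Err(_) => bare.eq_ignore_ascii_case("localhost"),
    }
}
```

Note that this only special-cases the name `localhost`; other host names that happen to resolve to a loopback address are not covered by this sketch.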

Resolves: https://github.com/lycheeverse/lychee/issues/319

Signed-off-by: MichaIng <micha@dietpi.com>
2021-10-06 11:33:23 +02:00
Matthias
712bdfa8cb
Make inputs required (show help if not provided) (#329) 2021-09-16 16:40:38 +02:00
Matthias
f3fe46a4d6 Merge branch 'master' of github.com:lycheeverse/lychee into local-files 2021-09-08 00:35:41 +02:00
Paweł Romanowski
8fd34a7367
Add no check (dump links only) flag (#99) 2021-09-06 16:10:48 +02:00