Commit graph

85 commits

Author SHA1 Message Date
Matthias Endler
fe24ba783a
Add check duration (in seconds) to report (#1064) 2023-05-06 00:47:32 +02:00
Matthias
0c04f0371d Improve is_url function and tests 2023-04-11 01:00:25 +02:00
Matthias
e96d4114a9 helpers -> utils 2023-04-11 00:43:57 +02:00
Matthias
bbc0bfab9c format errors 2023-04-11 00:00:14 +02:00
Benny Joe Villiger
250f7a8f0a
Status codes in maps (#1014) 2023-03-27 12:29:12 +02:00
Matthias Endler
2255ad9286
Better retry handling (#981)
Previously, lychee would blindly retry all requests,
no matter if the request error was transient or fatal.

Taking a lesson from https://github.com/TrueLayer/reqwest-middleware,
we can be more granular about the error behavior.
This PR adds their retry logic to lychee, reducing the number of
unnecessary requests significantly.

I also made some ergonomic changes to the client, which should not
affect its behavior.
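
A minimal sketch of the transient-versus-fatal distinction described above. The names `Retryable`, `classify_status`, and `should_retry` are hypothetical and only loosely modelled on the reqwest-middleware idea; the status-code grouping is an assumption, not lychee's actual rule set.

```rust
/// Hypothetical classification of a failed request (not lychee's real type).
#[derive(Debug, PartialEq, Eq)]
pub enum Retryable {
    /// The error may go away on its own, so a retry can help.
    Transient,
    /// Retrying would only repeat the same failure.
    Fatal,
}

/// Classify an HTTP status code.
/// Assumption: server errors, 408 and 429 are transient; 2xx needs no retry.
pub fn classify_status(status: u16) -> Option<Retryable> {
    match status {
        500..=599 | 408 | 429 => Some(Retryable::Transient),
        200..=299 => None,
        _ => Some(Retryable::Fatal),
    }
}

/// Retry only transient failures, and only while attempts remain.
pub fn should_retry(status: u16, attempt: u32, max_retries: u32) -> bool {
    attempt < max_retries && classify_status(status) == Some(Retryable::Transient)
}
```

With this split, a 404 is never retried, while a 503 is retried until `max_retries` is reached.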
2023-03-10 22:36:45 +01:00
Matthias Endler
30e2a2b62b
Fix --max-redirects (#987)
Having more than the max number of redirects
caused lychee to abort the requests, but did not
lead to an error.

Related: https://github.com/lycheeverse/lychee-action/issues/164
2023-03-10 15:15:37 +01:00
Matthias
59ddc1e27d Fix url input handling without scheme 2023-03-03 12:13:09 +01:00
Matthias Endler
7874195bbb
Customize verbosity (#956) 2023-02-24 23:53:09 +01:00
Matthias Endler
b653a0a1ec
Fix cached 200 status code handling (#958)
* Fix cached 200 status code handling

Assert that code 200 never needs to be explicitly accepted for cached response
to match the behavior of uncached checks

* Bump version to v0.11.1
2023-02-23 00:25:53 +01:00
Matthias
5558531bab Fix lint 2023-02-22 21:05:49 +01:00
Matthias Endler
9837699b79
Introduce new let...else syntax (#936) 2023-01-30 14:25:30 +01:00
Lucius Hu
e2406089ad
chore!: improve client and remap modules (#913)
`lychee_lib::client`:

- Improved documentation.
- Added a log message in `ClientBuilder::client()` when the provided user-agent
  overrides the one defined in the provided custom header.
- Removed unnecessary error handling in `Client::check()` when setting HTTPS
  scheme because all failure cases should occur when checking this URL the first
  time already.
- Removed unnecessary error handling in `Client::remap()` since
  `lychee-lib::remap::Remaps::remap()` doesn't return a `Result` anymore.
- Fixed potential integer overflow in `Client::check_website()` when the wait
  time between retries doubles, by using `std::time::Duration::saturating_mul`
  instead.
- Renamed `invalid()` to `validate_url()`.
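
The overflow fix mentioned above can be illustrated with a small sketch. `next_wait` is a hypothetical helper, not a function from `lychee_lib`; it only shows how `std::time::Duration::saturating_mul` clamps instead of panicking when the wait time doubles.

```rust
use std::time::Duration;

/// Double the wait time between retries.
/// `saturating_mul` clamps at `Duration::MAX` instead of overflowing.
pub fn next_wait(current: Duration) -> Duration {
    current.saturating_mul(2)
}
```

A plain `current * 2` would panic on overflow, whereas the saturating variant simply stays at the maximum.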

`lychee_lib::remap`:

- Improved documentation, in particular, clarified (in the comment) that it's
  URLs not URIs being remapped.
- Changed `Remaps::remap()` so it takes `&mut Url` instead of `Uri` as its
  argument, and no longer returns a `Result`.
    - Using `Url` instead of `Uri` because it aligns with the concept of
      remapping locations rather than identifiers.
    - Mutating the URL directly instead of returning a new one because it's
      more straightforward.
    - There is no error handling because we don't convert from URL to URI
      anymore. Furthermore, this always succeeded in the first place so we never
      needed error handling.
- Added implementation of `IntoIterator` for `&'a Remaps` and convenience method
  of `Remaps::iter`. (Their mutable or moving counterparts are deliberately
  avoided because we don't want library users to modify or consume the
  remapping rules after instantiation.)

`lychee_lib::error`:

- Renamed `ErrorKind::InvalidUriRemap` to `InvalidUrlRemap` and improved
  its error message.

Changes to other modules are minor and only serve to accompany the
aforementioned changes.
2023-01-16 19:14:09 +01:00
Matthias Endler
5654b7c317
Harden URL detection and extend verbatim elements (#899)
Previously remote URLs were incorrectly detected because the
string representation of a path is different than the path itself,
causing the `http` prefix match to be insufficient.

This resulted in unexpected side-effects, such as the
incorrect detection of verbatim mode for remote URLs.

The check has been improved and unit tests were added to avoid
future breakage. On top of that, missing verbatim elements were added.
2023-01-04 00:38:19 +01:00
Matthias Endler
da46734c54
Extend response stats in verbose mode (#882) 2022-12-20 10:43:01 +01:00
Matthias Endler
6df1c378ec
Fix Rust 1.66 clippy lints (#879) 2022-12-19 14:28:10 +01:00
Matthias
7d435f2155
Add more markdown extensions (#866) 2022-12-12 18:26:42 +01:00
Matthias
982d978e47
Add different verbosity levels (#824)
More granular verbosity levels have been asked
for repeatedly.
To enable that we're moving to [env_logger] and [clap-verbosity-flag]
to provide more flexible verbosity settings.

Also tackles #661, #709
Lays the groundwork for tackling #268

https://github.com/rust-cli/env_logger
https://github.com/clap-rs/clap-verbosity-flag
2022-11-28 23:25:33 +01:00
Matthias
b479a5810e
Allow overriding accepted status codes for cached URIs (#843)
Fixes #840
2022-11-28 12:23:07 +01:00
Matthias
765f7adb12
Don't check example mail addresses by default (#815)
This was an oversight so far that became apparent after our
recent fix for email addresses with query params
(e.g. `test@example.com?subject=test`).
The parsing of email addresses has improved and so we detect
more mail addresses, but we didn't check if they belonged
to an example domain, causing false-positive checks.
2022-11-08 23:46:32 +01:00
Matthias
d61105edbb
Fix parsing error of email addresses with query params (#809)
Email addresses with query parameters often get used in
contact forms on websites. They can also be found in
other documents like Markdown.

A common use-case is to add a subject line to the email
as a parameter e.g. `mailto:mail@example.com?subject="Hello"`.

Previously we handled such cases incorrectly by recognizing
them as files. The reason was that our email parsing was too strict
to allow for that use-case.
With `email_address` we switched to a more permissive parser.

Note that this does not affect the actual email address checking,
as this is still done by `check-if-email-exists`, which has stricter
check functionality.
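
As a rough illustration of the use-case above, the sketch below splits a `mailto:` link into the address and its optional query string. `split_mailto` is a hypothetical helper using only the standard library; it is not how the `email_address` crate parses input and it does not validate the address.

```rust
/// Split a `mailto:` link into the address and an optional query string.
/// Returns `None` if the input is not a `mailto:` link at all.
pub fn split_mailto(link: &str) -> Option<(&str, Option<&str>)> {
    let rest = link.strip_prefix("mailto:")?;
    match rest.split_once('?') {
        // Everything after the first `?` is the query, e.g. `subject=Hello`.
        Some((address, query)) => Some((address, Some(query))),
        None => Some((rest, None)),
    }
}
```

The point is that the part before the `?` is still an email address, so the whole link should not fall through to file handling.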
2022-11-05 23:40:33 +01:00
Matthias
69f387c1bd
Markdown-status (#729)
* Fix typos

* Add status code description to markdown output
2022-08-11 22:08:05 +02:00
Walter Beller-Morales
6d40a2ab7b
Update to gracefully handle nonexistent relative paths (#691)
* Update Input::new to gracefully handle nonexistent relative paths
* Add test checking Input::new can handle real relative paths
* Add better pre-conditions to Input::new tests
* Add integration tests for handling relative paths in lychee-bin
* Update lychee-lib/src/types/input.rs
2022-07-22 17:15:55 +02:00
Matthias
6fae93f2da
Skip caching unsupported and excluded URLs (#692)
As discussed in https://github.com/lycheeverse/lychee/issues/647#issuecomment-1170773449, it does not make much sense to cache unsupported
and excluded URLs.
Unsupported URLs might be supported in the future and caching them
would mean they won't get checked then. Excluded URLs were
excluded for a reason and should not appear in the cache.
Furthermore they might not be excluded
in a consecutive run, leading to a false-positive.
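
The caching rule above boils down to a small predicate. The enum `CheckOutcome` and the function `should_cache` are hypothetical names for this sketch and do not mirror lychee's actual status types.

```rust
/// Hypothetical outcome of checking a single URL.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CheckOutcome {
    Ok,
    Failed,
    Unsupported,
    Excluded,
}

/// Only results of real checks go into the cache.
/// Unsupported URLs may become checkable later, and excluded URLs
/// may no longer be excluded in the next run.
pub fn should_cache(outcome: CheckOutcome) -> bool {
    !matches!(outcome, CheckOutcome::Unsupported | CheckOutcome::Excluded)
}
```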
2022-07-17 18:40:45 +02:00
Walter Beller-Morales
9ad53f97a2
Fix deserialize of lycheecache status codes (#685)
* Add custom deserializer for `CacheStatus` to properly classify status codes
* Add CLI integration tests to check .lycheecache behavior
* Add comment to explain conflict between cache and accept flags
2022-07-15 22:45:24 +02:00
Matthias
84de43c554
Refactor request types (#637) 2022-06-03 20:13:07 +02:00
Matthias
9b4dfadffd
Fix parsing errors with config options (#632) 2022-05-31 19:43:46 +02:00
Matthias
22fecfc056
Add support for URI remapping (#620)
Remaps allow mapping from a URI pattern to a different URI.

The syntax is

```
lychee --remap 'https://example.com http://127.0.0.1'
```

Some use-cases are
- Testing URIs prior to production deployment
- Testing URIs behind a proxy

Be careful when using this feature because checking every link against a
large set of regular expressions has a performance impact. Also there are no
constraints on the URI mapping, so the rules might contradict each
other.
Remap rules get applied in order of definition to every input URI.
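
A simplified sketch of applying rules in order of definition. To stay within the standard library it matches plain prefixes rather than regular expressions, and it assumes that a rewritten URI is passed on to the following rules; `Rule`, `parse_rule`, and `remap` are hypothetical names, not lychee's API.

```rust
/// A hypothetical remap rule: a prefix pattern and its replacement.
/// (The real feature uses regular expressions; prefixes keep this sketch std-only.)
pub struct Rule {
    pub pattern: String,
    pub replacement: String,
}

/// Parse a rule in the `PATTERN REPLACEMENT` form used on the command line.
pub fn parse_rule(arg: &str) -> Option<Rule> {
    let (pattern, replacement) = arg.trim().split_once(' ')?;
    let replacement = replacement.trim();
    if pattern.is_empty() || replacement.is_empty() {
        return None;
    }
    Some(Rule {
        pattern: pattern.to_string(),
        replacement: replacement.to_string(),
    })
}

/// Apply every rule in order of definition.
/// Assumption: later rules see the output of earlier ones.
pub fn remap(rules: &[Rule], uri: &str) -> String {
    let mut current = uri.to_string();
    for rule in rules {
        let replaced = current
            .strip_prefix(rule.pattern.as_str())
            .map(|rest| format!("{}{}", rule.replacement, rest));
        if let Some(new_uri) = replaced {
            current = new_uri;
        }
    }
    current
}
```

Because every rule is tried against every URI, the cost grows with the number of rules, which is the performance caveat mentioned above.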
2022-05-29 21:41:22 +02:00
Matthias
363b95fe5f
Add support for excluding paths from link checking (#623)
This change deprecates `--exclude-file` as it was ambiguous.
Instead, `--exclude-path` was introduced to support excluding paths
to files and directories that should not be checked.
Furthermore, `.lycheeignore` is now the only way
to exclude URL patterns.
2022-05-29 17:27:09 +02:00
Matthias
8c0a32d81d
Refactor response formatting (#599)
* Add support for raw formatter (no color)
* Introduce ResponseFormatter trait
* Pass the same params to every cli command
* Update dependencies
* Remove pretty_assertions dependency (latest version doesn't build)
2022-04-25 19:19:36 +02:00
Matthias
03d28820bb
Extract more status information from reqwest (#577)
Recently we cleaned up the commandline output to trim away redundant
information like the URL, which occurred twice.
Unfortunately we also removed helpful information from reqwest, which
could support the user in troubleshooting unexpected errors.

This commit reverts that.
We now extract the meaningful information from reqwest, without being
too verbose. For that we have to depend on the string output for the
reqwest error, but it's better than hiding that information from the user.
It is fragile as it depends on the reqwest internals, but in the worst case
we simply return the full error text in case our parsing won't work.
2022-04-02 14:37:03 +02:00
Matthias
36d3195c68
Cache verbosity issue (fixes #562) 2022-03-27 14:48:09 +02:00
Matthias
743d386252
Allow input URLs without scheme (fixes #567)
This requires `Input::new` to return a `Result`, because the URL
parsing could fail when prepending `http://`.

We use http instead of https, because curl does as well:
70ac27604a/lib/urlapi.c (L1104-L1124)
Missing files will be interpreted as URLs from the command line
and these can be invalid, but that's not seen as an error anymore.
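
The scheme fallback can be sketched as follows. `with_default_scheme` is a hypothetical helper that only prepends `http://`; unlike `Input::new` it does not parse or validate the resulting URL, so it cannot fail.

```rust
/// Prepend `http://` when the input carries no scheme.
/// Assumption: any input containing `://` already has a scheme.
pub fn with_default_scheme(input: &str) -> String {
    if input.contains("://") {
        input.to_string()
    } else {
        format!("http://{input}")
    }
}
```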
2022-03-27 01:27:27 +01:00
Matthias
e1d112dbab
Remove missing_panic_doc (#561) 2022-03-22 21:02:56 +01:00
Matthias
8097bfa408
Print Github token error once at the end (#537)
Print original reqwest error for every Github link.
It contains more information about the underlying error.

Only print a message about the Github token at the
end if it's not set and there were Github errors.
2022-03-03 10:04:55 +01:00
Matthias
4c51fce22f
Fix broken pipe error on failing writes to stdout (#535)
Make sure that broken pipes (e.g. when a reader of a
pipe prematurely exits during execution) get handled gracefully.
This change also moves some error messages to stderr by using
eprintln.

More info: https://github.com/jez/as-tree/issues/15
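
One way to handle a closed pipe gracefully, sketched with the standard library only. `write_line` is a hypothetical helper, not the code from this change; it treats `ErrorKind::BrokenPipe` as a signal to stop writing instead of an error.

```rust
use std::io::{self, ErrorKind, Write};

/// Write one line to `out`.
/// Returns `Ok(false)` when the reader has gone away (broken pipe),
/// so the caller can stop quietly instead of failing.
pub fn write_line<W: Write>(out: &mut W, line: &str) -> io::Result<bool> {
    match writeln!(out, "{line}") {
        Ok(()) => Ok(true),
        Err(e) if e.kind() == ErrorKind::BrokenPipe => Ok(false),
        Err(e) => Err(e),
    }
}
```

A caller would pass `&mut std::io::stdout().lock()` and break out of its output loop once `Ok(false)` comes back, for example when the output is piped into `head`.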
2022-03-02 23:39:54 +01:00
Matthias
0fc5fc9ffe
Print errors with a different format for easier clickability (fixes #532) 2022-03-01 16:58:04 +01:00
Matthias
41b291037a
Response output overhaul (#524)
Clean up the response output.
Superfluous information was removed and the formatting was changed to make
the output more readable to humans.
2022-02-23 17:28:14 +01:00
Lucius Hu
70ebe45117
Improved IPv6 filtering support (#501)
This commit uses crate `ip_network` to determine whether an IPv6 address is
link-local or unique local.

Note that this extra dependency can be removed once rust-lang/rust#27709 is
stabilized.
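
For illustration, the two address classes can be recognised from the first 16-bit segment alone. The functions below are a hand-rolled sketch using only `std::net::Ipv6Addr`; they are not the `ip_network` implementation and their names are hypothetical.

```rust
use std::net::Ipv6Addr;

/// Link-local unicast addresses live in fe80::/10.
pub fn is_link_local(addr: &Ipv6Addr) -> bool {
    (addr.segments()[0] & 0xffc0) == 0xfe80
}

/// Unique local addresses live in fc00::/7.
pub fn is_unique_local(addr: &Ipv6Addr) -> bool {
    (addr.segments()[0] & 0xfe00) == 0xfc00
}
```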

Co-authored-by: Lucius Hu <lebensterben@users.noreply.github.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
2022-02-22 10:39:44 +01:00
Matthias
ba276cd51b
Error cleanup (#510)
* Add more fine-grained error types; remove generic IO error
* Update error message for missing file
* Remove missing `Error` suffix
* Rename ErrorKind::Github to ErrorKind::GithubRequest for consistency with NetworkRequest
2022-02-19 01:44:00 +01:00
Matthias
812663d832
Prevent flaky tests (#514)
Move from example.org to example.com, which seems to be more permissive for testing
2022-02-18 10:29:49 +01:00
Matthias
47df7780fe
Use captured identifiers in format strings (#507)
Makes for arguably cleaner-looking code.
The downside is that the MSRV is 1.58
https://blog.rust-lang.org/2022/01/13/Rust-1.58.0.html

Given that nobody uses lychee as a library yet
and we have precompiled binaries, it's an acceptable
tradeoff.
My little research revealed that this is a much-liked
feature: https://twitter.com/matthiasendler/status/1483895557621960715
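
For reference, a small comparison of the two styles. `describe` is a hypothetical function written for this sketch; both format strings produce the same text, and the second form needs Rust 1.58 or later.

```rust
/// Format the same message with positional arguments and with
/// captured identifiers.
pub fn describe(url: &str, status: u16) -> (String, String) {
    // Before Rust 1.58: arguments are passed positionally.
    let positional = format!("{} returned {}", url, status);
    // Since Rust 1.58: variables in scope can be named inside the braces.
    let captured = format!("{url} returned {status}");
    (positional, captured)
}
```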
2022-02-12 10:51:52 +01:00
Lucius Hu
53c41b03d8
replace hubcaps by octocrab (#502)
This commit replaced `hubcaps` by `octocrab`, which has more downloads per month
and receives more frequent release updates.

The caveats are:

1. When instantiating the API client, `octocrab` doesn't offer you a way to
specify custom user-agent. But I would argue that, at least presently, this
doesn't seem to cause issues.
2. `octocrab` doesn't export as many details of its error types as `hubcaps`
does. So we will have less control over the display of the error message. But I
would also argue that this is not really important. Though we should do more
tests to make sure the error looks good enough.

* hide implementation details in error message

Co-authored-by: Lucius Hu <lebensterben@users.noreply.github.com>
2022-02-11 23:43:47 +01:00
Lucius Hu
476a048350
lychee-lib::client reworked (#500)
This commit mainly added or improved documentation for `lychee-lib::client`
module.

But it also contains a few API changes:

- `ClientBuilder::client()` now consumes itself instead of taking a reference.
  This helps to avoid a few unnecessary clones.
- `ClientBuilder::build_filter()` was a private function and is inlined to avoid
  unnecessary clones.
- Added a new crate-scoped function `Uri::set_scheme()`.

* added notes on deprecated site-local network

Co-authored-by: Lucius Hu <lebensterben@users.noreply.github.com>
2022-02-10 00:04:48 +01:00
Matthias
6635863746
Add Alpine page for benchmark; refactor code (#481) 2022-01-27 23:42:06 +01:00
Matthias
97b06230fc
Add missing Github exclusions; sort entries (#473) 2022-01-21 23:54:59 +01:00
Matthias
6e757fa20e
Add more information about mail errors (#463) 2022-01-14 22:22:53 +01:00
Matthias
994aadf6a1
Simplify error messages (#462)
Using pattern matching to make the hubcaps and reqwest error messages a little shorter and (subjectively) more readable.
2022-01-14 15:26:13 +01:00
Matthias
ac490f9c53
Add caching functionality (v2) (#443)
A while ago, caching was removed due to some issues (see #349).
This is a new implementation with the following improvements:

* Architecture: The new implementation is decoupled from the collector, which was a major issue in the last version. Now the collector has a single responsibility: collecting links. This also avoids race-conditions when running multiple collect_links instances, which probably was an issue before.
* Performance: Uses DashMap under the hood, which was noticeably faster than Mutex<HashMap> in my tests.
* Simplicity: The cache format is a CSV file with two columns: URI and status. I decided to create a new struct called CacheStatus for serialization, because trying to serialize the error kinds in Status turned out to be a bit of a nightmare and at this point I don't think it's worth the pain (and probably isn't idiomatic either).

This is an optional feature. Caching only gets used if the `--cache` flag is set.
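
A rough sketch of reading the two-column cache format described above. It uses a plain `std::collections::HashMap` instead of DashMap and splits on the last comma instead of using a CSV parser, so quoting is not handled; `parse_cache` is a hypothetical name and the status is kept as a raw string rather than a `CacheStatus`.

```rust
use std::collections::HashMap;

/// Parse cache contents with one `URI,status` pair per line.
/// Splitting on the last comma keeps URIs that contain commas intact.
pub fn parse_cache(contents: &str) -> HashMap<String, String> {
    contents
        .lines()
        .filter_map(|line| line.rsplit_once(','))
        .map(|(uri, status)| (uri.trim().to_string(), status.trim().to_string()))
        .collect()
}
```

Lines without a comma are skipped silently in this sketch; a real implementation would likely report them.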
2022-01-14 15:25:51 +01:00
Matthias
48c8153e11 Refactor Github checking; add docs 2022-01-12 09:25:12 +01:00