Commit graph

86 commits

Author SHA1 Message Date
Matthias
f933656161 Add integration test for accept (int and string) 2024-01-10 00:10:22 +01:00
Levi Zim
704126eab4
fix(test_cookie_jar): use google.com/ncr (#1336)
google.com might redirect to other domains, causing cookie_jar test to fail.
2024-01-06 12:31:23 +01:00
Hugo McNally
c9b707ea74
Decode percent escapes in fragments (#1275)
* Added test to check a fragment with a UTF-8 character
2024-01-05 15:46:09 +01:00
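A minimal, std-only sketch of the decoding step. It assumes fragments are compared after decoding, so that a link like `#caf%C3%A9` matches the heading `café`. The function `decode_fragment` is hypothetical, not lychee's implementation. It rejects malformed escapes and invalid UTF-8 by returning `None`.

```rust
/// Decode `%XX` escapes in a URL fragment (hypothetical helper, std only).
/// Returns `None` for malformed escapes or if the result is not valid UTF-8.
fn decode_fragment(fragment: &str) -> Option<String> {
    let bytes = fragment.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // An escape must be followed by exactly two hex digits.
            let hex = bytes.get(i + 1..i + 3)?;
            if !hex.iter().all(u8::is_ascii_hexdigit) {
                return None;
            }
            // Both bytes are ASCII hex digits, so this conversion cannot fail.
            let hex = std::str::from_utf8(hex).ok()?;
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    // Multi-byte characters such as `é` arrive as several escaped bytes.
    String::from_utf8(out).ok()
}
```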
Matthias Endler
d3d0cd513d
Better TOML parsing error message (#1332)
Error reporting for config loading was poor,
because we didn't use the correct syntax to show the entire error context with `anyhow`.
See ["Display representations"](https://docs.rs/anyhow/latest/anyhow/struct.Error.html#display-representations).
2024-01-04 22:17:14 +01:00
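For reference, anyhow's alternate display (`{:#}`) prints the outermost error followed by its chain of causes, separated by `: `. The std-only sketch below reproduces that idea by walking `Error::source()`. `ConfigError` and `display_chain` are hypothetical names, and a `ParseIntError` stands in for a TOML parse error.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical config error that wraps a lower-level cause.
#[derive(Debug)]
struct ConfigError {
    path: String,
    source: ParseIntError, // stand-in for a TOML parse error
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Only the outermost message; the cause is exposed via `source()`.
        write!(f, "Cannot load configuration file `{}`", self.path)
    }
}

impl Error for ConfigError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Join an error with all of its causes using ": ".
fn display_chain(err: &dyn Error) -> String {
    let mut msg = err.to_string();
    let mut cause = err.source();
    while let Some(c) = cause {
        msg.push_str(": ");
        msg.push_str(&c.to_string());
        cause = c.source();
    }
    msg
}
```

Printing only `err.to_string()` drops the cause, which is the kind of context loss the commit describes.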
Thomas Zahner
46f0ae908e
Address warnings of the new clippy lints (#1310) 2023-12-01 14:21:49 +01:00
Hugo McNally
f59aa61ee3
Check fragments in HTML files (#1198)
* Added html5gum based fragment extractor
* Markdown fragment extractor now extracts fragments from inline html
* Added fragment checks for html file
* Added inline html and html document to fragment checks test
* Improved some comments
* Improved documentation of markdown's fragment extractor.
2023-08-22 16:44:45 +02:00
Matthias Endler
006ee6d3be
Make suggestion test more robust (#1229) 2023-08-17 16:54:59 +02:00
Matthias Endler
1bf2944c1e
Update dependencies; fix flaky tests (#1219) 2023-08-15 16:41:58 +02:00
Hugo McNally
8e6369377c
Introduce fragment checking for links to markdown files. (#1126)
- Implemented enhancements to include fragments in file links
- Checked links to markdown files with fragments, generating unique kebab-case fragments from headings and supporting heading attributes.
- Made code more idiomatic and added an integration test.
- Updated documentation.
- Fixed issues with heading attributes fragments and ensured proper handling of file errors.
2023-07-31 16:04:00 +02:00
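Checking fragments requires turning headings into the IDs that links point at. Below is a sketch of that step under the assumption of GitHub-style rules: lowercase the text, drop punctuation, turn spaces into hyphens, and add `-1`, `-2` suffixes to repeated headings. lychee's exact rules may differ, explicit heading attributes such as `{#id}` are not handled, and both functions are hypothetical.

```rust
use std::collections::HashMap;

/// Turn a heading into a kebab-case fragment (assumed GitHub-like rules).
fn kebab_case(heading: &str) -> String {
    heading
        .trim()
        .to_lowercase()
        .chars()
        .filter_map(|c| {
            if c.is_alphanumeric() || c == '-' || c == '_' {
                Some(c) // keep letters, digits, hyphens, underscores
            } else if c == ' ' {
                Some('-') // spaces become hyphens
            } else {
                None // drop other punctuation
            }
        })
        .collect()
}

/// Assign unique fragments: repeated slugs get a numeric suffix.
fn unique_fragments(headings: &[&str]) -> Vec<String> {
    let mut seen: HashMap<String, usize> = HashMap::new();
    headings
        .iter()
        .map(|h| {
            let slug = kebab_case(h);
            let count = seen.entry(slug.clone()).or_insert(0);
            let fragment = if *count == 0 { slug } else { format!("{slug}-{count}") };
            *count += 1;
            fragment
        })
        .collect()
}
```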
Matthias Endler
04887ee293
Make checking email addresses optional (#1171)
Email checks cause too many false positives,
so we put them behind a flag.

* `--exclude-mail` is deprecated (to be removed in 1.0)
* `--include-mail` is the new flag

This PR also removes the obsolete tests for `--exclude-file`, which was superseded by `.lycheeignore`.

Fixes #1089
2023-07-19 19:58:38 +02:00
Techassi
f53619a455
feat: Add support for --dump-inputs (#1159)
* Add support for --dump-inputs
* Add integration tests
* Fix usage guide in README
2023-07-16 18:08:14 +02:00
Matthias
961575cdc7 fix typos 2023-07-13 21:48:46 +02:00
Matthias Endler
14e748793e
Cookie Support (#1146)
This is a very conservative and limited implementation of cookie support.

The goal is to ship an MVP, which covers 80% of the use-cases.
When you run lychee with --cookie-jar cookies.json, all cookies will be stored in cookies.json, one cookie per line.
This makes cookies easy to edit by hand if needed, although this is an advanced use-case and the API for the format is not guaranteed to be stable.

Fixes: #645, #715
Partially fixes: #1108
2023-07-13 17:32:41 +02:00
Matthias Endler
40ba18794d
Don't check Twitter URLs (#1147)
Twitter has completely locked down and now requires
a login to read tweets. (Temporarily) disable all
Twitter URLs to avoid false positives.

For context:
https://github.com/zedeus/nitter/issues/919
https://news.ycombinator.com/item?id=36540957
https://techcrunch.com/2023/06/30/twitter-now-requires-an-account-to-view-tweets/

Fixes https://github.com/lycheeverse/lychee/issues/1108
2023-07-13 17:31:59 +02:00
Matthias Endler
97573123ef
Extend remap feature (#1133)
* wip

* Extend support for remapping

This adds support for partial remaps and
capture groups to the remap feature.

Fixes #1129
2023-07-05 15:05:19 +02:00
Techassi
67af7ef6d3
feat: add support for basic auth per URI (#1110)
* Add support for basic auth per domain
* Move URI matching to link collection phase
* Allow AsRef for BasicAuthExtractor::new to avoid clone
* Add tests

---------

Co-authored-by: Matthias Endler <matthias@endler.dev>
2023-06-26 12:06:24 +02:00
Matthias Endler
5ce77e1202
Don't cache unknown status codes (#1090)
Unknown status codes should be skipped and not cached by default. The reason is that we don't know if they are valid or not and even if they are invalid, we don't know if they will be valid in the future.
2023-06-02 02:46:20 +02:00
Thomas Zahner
130fa21a6a
Concurrent archives (#1027) 2023-05-11 20:20:27 +02:00
Matthias Endler
fe24ba783a
Add check duration (in seconds) to report (#1064) 2023-05-06 00:47:32 +02:00
Matthias Endler
0e97f57040
Use standard error for error output (#990)
Fixes https://github.com/lycheeverse/lychee/issues/984

From https://doc.rust-lang.org/book/ch12-06-writing-to-stderr-instead-of-stdout.html:

> Command line programs are expected to send error messages to the standard error stream so we can still see error messages on the screen even if we redirect the standard output stream to a file. Our program is not currently well-behaved: we’re about to see that it saves the error message output to a file instead!
2023-04-11 23:43:33 +02:00
Thomas
994b2852cd
Wayback integration (#1003)
Adds support for suggesting archived URLs for broken links.
Uses Wayback Machine as the archive provider.
2023-03-28 00:45:06 +02:00
Matthias Endler
55797071b0
Fix nested URL extraction in verbatim elements (#988)
Skipping URLs in verbatim elements didn't take into
account nested elements that were not verbatim.

For instance, the following HTML snippet would yield
`https://example.com` in non-verbatim mode, even though
it is nested inside a verbatim `<pre>` element:

```html
<pre><a href="https://example.com">link</a></pre>
```

This commit fixes the behavior for both `html5gum` and
`html5ever`.

Note that nested verbatim elements of the same kind
are still not handled correctly.

For instance, the following HTML snippet would still yield
`https://example.com`:

```html
<pre>
  <pre></pre>
  <a href="https://example.com">link</a>
</pre>
```

The reason is that we currently only keep track of a single
verbatim element and not a stack of elements, which we
would need to unwind and resolve the situation.

Fixes https://github.com/lycheeverse/lychee/issues/986.
2023-03-11 15:18:25 +01:00
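The commit notes that a stack of open verbatim elements would be needed to handle same-kind nesting. A minimal sketch of that stack-based approach, independent of the `html5gum` and `html5ever` APIs: `VerbatimTracker` is hypothetical, and the element list is only illustrative.

```rust
/// Illustrative set of elements whose contents are treated as verbatim.
const VERBATIM: &[&str] = &["pre", "code", "textarea", "samp", "kbd"];

/// Tracks open verbatim elements on a stack (hypothetical type).
#[derive(Default)]
struct VerbatimTracker {
    stack: Vec<String>,
}

impl VerbatimTracker {
    fn start_tag(&mut self, name: &str) {
        if VERBATIM.iter().any(|v| *v == name) {
            self.stack.push(name.to_string());
        }
    }

    fn end_tag(&mut self, name: &str) {
        // Pop only when the closing tag matches the innermost verbatim element,
        // so an inner `</pre>` does not end an outer `<pre>`.
        if self.stack.last().map(String::as_str) == Some(name) {
            self.stack.pop();
        }
    }

    /// URLs seen while any verbatim element is open should be skipped.
    fn in_verbatim(&self) -> bool {
        !self.stack.is_empty()
    }
}
```

With a single tracked element, the inner `</pre>` in the second snippet above would end verbatim mode early. With a stack, the link is still recognized as being inside the outer `<pre>`.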
Matthias Endler
2255ad9286
Better retry handling (#981)
Previously, lychee would blindly retry all requests,
regardless of whether the request error was transient or fatal.

Taking a lesson from https://github.com/TrueLayer/reqwest-middleware,
we can be more granular about the error behavior.
This PR adds their retry logic to lychee, reducing the number of
unnecessary requests significantly.

I also made some ergonomic changes to the client, which should not
affect its behavior.
2023-03-10 22:36:45 +01:00
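A std-only sketch of the transient/fatal split, modeled loosely on the retry policy commonly used with reqwest-middleware. The `Outcome` enum and the exact set of codes treated as transient (408, 429, 5xx, timeouts, connection errors) are assumptions, not copied from lychee or reqwest-retry.

```rust
/// Hypothetical outcome of a single request attempt.
enum Outcome {
    Status(u16),
    Timeout,
    ConnectionError,
    InvalidUrl,
}

#[derive(Debug, PartialEq)]
enum Retryable {
    Transient,
    Fatal,
}

/// Classify an outcome; `None` means there is nothing to retry.
fn classify(outcome: &Outcome) -> Option<Retryable> {
    match outcome {
        Outcome::Status(code) => match *code {
            200..=399 => None,                        // success or redirect
            408 | 429 => Some(Retryable::Transient),  // timeout / rate limit
            500..=599 => Some(Retryable::Transient),  // server-side errors
            _ => Some(Retryable::Fatal),              // e.g. 404: retrying won't help
        },
        Outcome::Timeout | Outcome::ConnectionError => Some(Retryable::Transient),
        Outcome::InvalidUrl => Some(Retryable::Fatal),
    }
}

/// Retry only transient failures, allowing up to `max_retries` extra attempts.
/// Returns the final outcome and the total number of attempts made.
fn check_with_retries(mut attempt: impl FnMut() -> Outcome, max_retries: u32) -> (Outcome, u32) {
    let mut tries = 0;
    loop {
        let outcome = attempt();
        tries += 1;
        let retry = classify(&outcome) == Some(Retryable::Transient);
        if !retry || tries > max_retries {
            return (outcome, tries);
        }
    }
}
```

A 404 is returned after a single attempt, while a timeout is retried until the budget runs out. Skipping retries for fatal errors is where the reduction in requests comes from.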
Matthias Endler
30e2a2b62b
Fix --max-redirects (#987)
Exceeding the maximum number of redirects caused
lychee to abort the request, but it was not
reported as an error.

Related: https://github.com/lycheeverse/lychee-action/issues/164
2023-03-10 15:15:37 +01:00
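A sketch of the intended behavior under a simplifying assumption: redirects are modeled as a lookup table from URL to `Location` rather than as real HTTP responses. Following more than `max_redirects` hops produces an explicit error instead of silently stopping. `resolve` and `CheckError` are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical error type for a link check.
#[derive(Debug, PartialEq)]
enum CheckError {
    TooManyRedirects { max: usize },
}

/// Follow redirects using a table (URL -> Location) as a stand-in for HTTP.
/// Exceeding `max_redirects` is reported as an error, not a silent stop.
fn resolve<'a>(
    url: &'a str,
    redirects: &HashMap<&'a str, &'a str>,
    max_redirects: usize,
) -> Result<String, CheckError> {
    let mut current = url;
    let mut hops = 0;
    while let Some(&next) = redirects.get(current) {
        if hops == max_redirects {
            return Err(CheckError::TooManyRedirects { max: max_redirects });
        }
        hops += 1;
        current = next;
    }
    Ok(current.to_string())
}
```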
Matthias
c9edb7f809 Split up quirks and skip twitter check
It's flaky on GitHub
2023-03-03 12:13:09 +01:00
Matthias
51628213d6 Explicit quirks output for test 2023-03-03 12:13:09 +01:00
Matthias
9eb3149a69 Custom config handling to spot errors when passing invalid config and ignoring errors loading missing default conf 2023-03-03 12:13:09 +01:00
Matthias
19976cd9e7 Add test for missing and example config file and helper methods 2023-03-03 12:13:09 +01:00
Matthias
86f13609e6 Put lycheecache tests into separate subfolders to avoid race 2023-03-03 12:13:09 +01:00
Matthias
17937537f8 Ignored URLs don't lead to failing exit code 2023-03-03 12:13:09 +01:00
Matthias
c549213bfe Fix test_skip_cache_unsupported 2023-03-03 12:13:09 +01:00
Matthias
f64c60aac0 Fix cache test 2023-03-03 12:13:09 +01:00
Matthias
09d0064e69 Split up test 2023-03-03 12:13:09 +01:00
Matthias
9b75da60a6 Ignore remap test
https://github.com/robinst/linkify/pull/58
2023-03-03 12:13:09 +01:00
Matthias
4306150e56 Excluded URLs are no longer cached
See https://github.com/lycheeverse/lychee/pull/692
2023-03-03 12:13:09 +01:00
Matthias
a7f7b989ef fix multiple_exclude_files test 2023-03-03 12:13:09 +01:00
Matthias
46d238ba93 fix url output 2023-03-03 12:13:09 +01:00
Matthias
e125d45a8e Multiple arguments get handled differently in clap than in structopt
We should document that change
2023-03-03 12:13:09 +01:00
Matthias
1f62590a02 Fix expected json output 2023-03-03 12:13:09 +01:00
Matthias
ac13e5d16e Fix CLI tests 2023-03-03 12:13:09 +01:00
Matthias Endler
b653a0a1ec
Fix cached 200 status code handling (#958)
* Fix cached 200 status code handling

Assert that code 200 never needs to be explicitly accepted for cached responses,
matching the behavior of uncached checks

* Bump version to v0.11.1
2023-02-23 00:25:53 +01:00
Matthias Endler
5654b7c317
Harden URL detection and extend verbatim elements (#899)
Previously, remote URLs were incorrectly detected because the
string representation of a path differs from the path itself,
so matching on the `http` prefix was insufficient.

This resulted in unexpected side-effects, such as the
incorrect detection of verbatim mode for remote URLs.

The check has been improved and unit tests were added to avoid
future breakage. On top of that, missing verbatim elements were added.
2023-01-04 00:38:19 +01:00
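One way to make remote-URL detection stricter, assuming the input is examined as the original string and not a path-normalized form: require an explicit `http` or `https` scheme followed by `://`. A bare `http` prefix check also matches local files like `http-notes.md`. `is_remote_url` is a hypothetical helper, not lychee's actual check.

```rust
/// Hypothetical stricter check: an explicit http(s) scheme plus `://`
/// and a non-empty remainder, instead of a bare `http` prefix.
fn is_remote_url(input: &str) -> bool {
    match input.split_once("://") {
        Some((scheme, rest)) => {
            (scheme.eq_ignore_ascii_case("http") || scheme.eq_ignore_ascii_case("https"))
                && !rest.is_empty()
        }
        None => false,
    }
}
```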
Matthias
b479a5810e
Allow overriding accepted status codes for cached URIs (#843)
Fixes #840
2022-11-28 12:23:07 +01:00
Matthias
35ccfb87c3
Add support for dumping links to file (#810) 2022-11-08 00:33:16 +01:00
Matthias
d61105edbb
Fix parsing error of email addresses with query params (#809)
Email addresses with query parameters often get used in
contact forms on websites. They can also be found in
other documents like Markdown.

A common use-case is to add a subject line to the email
as a parameter e.g. `mailto:mail@example.com?subject="Hello"`.

Previously we handled such cases incorrectly by treating
them as files, because our email parsing was too strict
to allow for that use-case.
By switching to `email_address`, we now use a more permissive parser.

Note that this does not affect the actual email address check,
which is still done by `check-if-email-exists` and applies
stricter rules.
2022-11-05 23:40:33 +01:00
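A std-only sketch of the idea: split the query string off a `mailto:` link before looking at the address, so the link is not mistaken for a file path. It does not use the `email_address` crate's API. `split_mailto` is hypothetical, and its `@` check is only a rough sanity check; real validation is left to a dedicated checker.

```rust
/// Split a `mailto:` link into the address and an optional query string.
/// Returns `None` if the input is not a plausible mailto link (hypothetical helper).
fn split_mailto(link: &str) -> Option<(&str, Option<&str>)> {
    let rest = link.strip_prefix("mailto:")?;
    let (address, query) = match rest.split_once('?') {
        Some((addr, q)) => (addr, Some(q)),
        None => (rest, None),
    };
    // Loose sanity check only: something on both sides of the `@`.
    let (local, domain) = address.split_once('@')?;
    if local.is_empty() || domain.is_empty() {
        return None;
    }
    Some((address, query))
}
```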
Matthias
a42ad4c673
Twitter quirk fixed; adjust test (#741) 2022-08-17 16:52:20 +02:00
Walter Beller-Morales
6d40a2ab7b
Update to gracefully handle nonexistent relative paths (#691)
* Update Input::new to gracefully handle nonexistent relative paths
* Add test checking Input::new can handle real relative paths
* Add better pre-conditions to Input::new tests
* Add integration tests for handling relative paths in lychee-bin
* Update lychee-lib/src/types/input.rs
2022-07-22 17:15:55 +02:00
Matthias
6fae93f2da
Skip caching unsupported and excluded URLs (#692)
As discussed in https://github.com/lycheeverse/lychee/issues/647#issuecomment-1170773449, it does not make much sense to cache unsupported
and excluded URLs.
Unsupported URLs might be supported in the future and caching them
would mean they won't get checked then. Excluded URLs were
excluded for a reason and should not appear in the cache.
Furthermore, they might not be excluded
in a subsequent run, leading to a false positive.
2022-07-17 18:40:45 +02:00
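The caching rule above can be expressed as a small predicate. This sketch assumes a simplified, hypothetical `Status` enum rather than lychee's real types: only outcomes that reflect an actual check of the URL are cached.

```rust
/// Hypothetical, simplified check outcomes.
enum Status {
    Ok(u16),
    Error(u16),
    Unsupported,
    Excluded,
}

/// Cache only real check results. Unsupported URLs may become checkable
/// later, and exclusions can change between runs, so caching either could
/// hide a real result.
fn should_cache(status: &Status) -> bool {
    matches!(status, Status::Ok(_) | Status::Error(_))
}
```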
Walter Beller-Morales
9ad53f97a2
Fix deserialize of lycheecache status codes (#685)
* Add custom deserializer for `CacheStatus` to properly classify status codes
* Add CLI integration tests to check .lycheecache behavior
* Add comment to explain conflict between cache and accept flags
2022-07-15 22:45:24 +02:00
vpereira01
d48a3279a8
Improve configuration example (#631)
* Add missing parameters
* Remove deprecated `--exclude-file` parameter
* Improve TOML comments
* Add config smoketest
2022-05-31 19:05:27 +02:00