Commit graph

662 commits

Author SHA1 Message Date
katrinafyi
71e77f6255
fix comment for ErrorKind::InvalidFragment (#1775)
The comment doesn't make sense and is identical to the one for InvalidFilePath,
which is right above it, so I assume this was a copy/paste mistake.
2025-07-27 14:17:49 +02:00
Thomas Zahner
c68e15fba3 Revert to previous behaviour: linking to directories results in Status::Ok(StatusCode::OK) 2025-07-26 17:33:02 +02:00
Thomas Zahner
8b70abc89b Make excluded_paths part of Collector instead of Input 2025-07-18 16:53:08 +02:00
Thomas Zahner
4bdf962698 Minor improvements 2025-07-18 16:53:08 +02:00
Thomas Zahner
23fbd0b0d5 Make regex field in RegexFilter private 2025-07-18 16:53:08 +02:00
Thomas Zahner
475d7f3d3a Apply clippy suggestions 2025-07-18 16:53:08 +02:00
Thomas Zahner
5036ce8388 Update flag description & clean up 2025-07-18 16:53:08 +02:00
Thomas Zahner
002fa49f29 Replace Vec<PathBuf> with dedicated PathExcludes type 2025-07-18 16:53:08 +02:00
Thomas Zahner
1de218a78a Unwrap option type 2025-07-18 16:53:08 +02:00
Keming
696a7cafc8
fix: do not check the fragment when http response err but accepted (#1763)
Signed-off-by: Keming <kemingy94@gmail.com>
2025-07-10 06:32:15 +02:00
MichaIng
92a9bca23f
feat: skip fragment checking for unsupported MIME types (#1744)
* feat: skip fragment checking for unsupported MIME types

The remote URL/website checker currently passes every URL with a fragment to the fragment checker as an HTML document, even if the response has a different or unsupported MIME type. This can cause incorrect fragment checks for Markdown documents, failures for other MIME types (especially binaries), and unnecessary traffic for large downloads, which are always downloaded in full once the fragment checker is invoked.

This commit checks the Content-Type header of the response:
- Only if it is `text/html`, it is passed to the fragment checker as HTML type.
- Only if it is `text/markdown`, or `text/plain` and the URL path ends in `.md`, it is passed to the fragment checker as Markdown type.
- In all other cases, the fragment checker is skipped and the HTTP status is returned.

To invoke the fragment checker with a variable document type, a new `FileType` argument is added to the `check_html_fragment()` function.
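
As a rough illustration of the Content-Type rules listed above, the selection could be sketched as follows. This is not the code from the commit: `fragment_file_type` is a hypothetical helper, the `FileType` enum is a two-variant stand-in, and the sketch assumes the second rule reads as "`text/markdown`, or `text/plain` with a URL path ending in `.md`". Stripping header parameters and lowercasing the media type are also assumptions of this sketch.

```rust
/// Document types the fragment checker can handle.
/// Hypothetical stand-in for the `FileType` named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileType {
    Html,
    Markdown,
}

/// Decide which document type, if any, should be handed to the fragment
/// checker. `None` means: skip the fragment check and keep the HTTP status.
///
/// `content_type` is the raw Content-Type header value (if present) and
/// `url_path` is the path component of the checked URL.
pub fn fragment_file_type(content_type: Option<&str>, url_path: &str) -> Option<FileType> {
    // Drop parameters such as "; charset=utf-8" and normalise the case,
    // since media types are case-insensitive.
    let mime = content_type?
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();

    match mime.as_str() {
        "text/html" => Some(FileType::Html),
        "text/markdown" => Some(FileType::Markdown),
        // Plain text only counts as Markdown when the path says so.
        "text/plain" if url_path.ends_with(".md") => Some(FileType::Markdown),
        // Binaries and every other type: no fragment check.
        _ => None,
    }
}
```

Under these assumptions, `fragment_file_type(Some("text/plain"), "/README.md")` selects the Markdown checker, while a response with `application/zip` yields `None`, which corresponds to the third rule: the fragment checker is skipped and only the HTTP status is returned.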

The fragment checker test and fixture are adjusted to match the expected result: checking a binary file via remote URL with fragment is now expected to succeed, since its Content-Type header does not invoke the fragment checker anymore.

Signed-off-by: MichaIng <micha@dietpi.com>

* Update fixtures/fragments/file1.md

Co-authored-by: MichaIng <micha@dietpi.com>

---------

Signed-off-by: MichaIng <micha@dietpi.com>
Co-authored-by: Matthias Endler <matthias@endler.dev>
2025-07-06 10:46:06 +02:00
Keming
02f6f5cb49
feat: add 'user-content-' prefix to support github markdown fragment (#1750) 2025-07-04 22:58:47 +02:00
ocavue
81f2605118
fix: treat a fragment in an empty directory as an error (#1756)
* fix: treat a fragment in an empty directory as an error
* test: add more fragment tests
2025-07-04 10:25:57 +02:00
ocavue
6bcb37c2dc
fix: resolve index file inside a directory (#1752) 2025-07-03 16:55:57 +02:00
Thomas Zahner
845f74bab0
Fix basic auth (#1748)
* Capture bug as failing test

* Add basic auth credentials for website extraction requests via RequestChain & remove headers from Input

* Create UrlExtractor and add back headers

* Improve UrlExtractor

* Fix bug: extend headers instead of setting them

* Clean up

* Minor adjustments

* Apply suggestions from code review

Co-authored-by: Matthias Endler <matthias@endler.dev>

* Mention in doc comment how the method might panic

* Remove use of chain for more simplicity

---------

Co-authored-by: Matthias Endler <matthias@endler.dev>
2025-07-03 13:45:30 +02:00
Thomas Zahner
8f2f746bf9
Migrate to Clippy 1.88 (#1749)
* Update flake
* Fix clippy's new suggestions
* Do not ignore tests any longer since they work by now
* Add ignore reason
2025-06-27 12:34:48 +02:00
MichaIng
b970256248
fix: skip fragment check if website URL doesn't contain fragment (#1733)
* fix: skip fragment check if website URL doesn't contain fragment

Signed-off-by: MichaIng <micha@dietpi.com>

* test: add tests for fragment checks with binary data

Signed-off-by: MichaIng <micha@dietpi.com>

* fix: skip fragment checking as well if fragment is empty

`is_some()` is also true if the fragment is present but empty, i.e. `#`. Although this is an edge case, skip the fragment checker for an empty fragment as well.
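
A minimal sketch of the empty-fragment condition, assuming the fragment is available as an `Option<&str>`. `should_check_fragment` is a hypothetical name used for illustration, not a function from the codebase.

```rust
/// Return true only when a fragment is present AND non-empty.
///
/// A plain `is_some()` would also accept `Some("")`, which is what a URL
/// ending in a bare `#` is modelled as in this sketch.
pub fn should_check_fragment(fragment: Option<&str>) -> bool {
    fragment.is_some_and(|f| !f.is_empty())
}
```

With this check, a missing fragment (`None`) and a bare `#` (`Some("")`) both skip the fragment checker, and only a value such as `Some("section")` triggers it.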

Signed-off-by: MichaIng <micha@dietpi.com>

* test: switch to lycheeverse/master remote URLs

Signed-off-by: MichaIng <micha@dietpi.com>

* fix: apply rustfmt annotation

Signed-off-by: MichaIng <micha@dietpi.com>

---------

Signed-off-by: MichaIng <micha@dietpi.com>
2025-06-20 17:47:35 +02:00
tooomm
83fe1248c4
Add xml schema found in xsd files to list of exclusions (#1735)
* Add xml schema found in xsd files
See e.g. https://www.w3schools.com/xml/schema_intro.asp
* escape dots in urls
2025-06-20 15:38:49 +02:00
Matthias Endler
3592972d64
chore: release v0.19.1 (#1726) 2025-06-16 14:56:32 +03:00
Keming
d512262ffa
fix: skip the fragment check if the uri doesn't contain fragment (#1730)
Signed-off-by: Keming <kemingy94@gmail.com>
2025-06-16 14:46:01 +03:00
Thomas Zahner
469ccd0089 Update changelog 2025-06-11 16:44:23 +02:00
Matthias Endler
639c74e392 chore: release v0.19.0 2025-06-11 16:04:34 +02:00
Thomas Zahner
f0a4b3a5a8 Add explanation 2025-06-11 11:19:51 +02:00
Thomas Zahner
4a8587665f Fix grammar
Co-authored-by: Matthias Endler <matthias@endler.dev>
2025-06-11 11:19:51 +02:00
Thomas Zahner
622bc6956f Update docs 2025-06-11 11:19:51 +02:00
Thomas Zahner
326f119e38 Extract DEFAULT_ACCEPTED_STATUS_CODES & apply clippy's suggestions 2025-06-11 11:19:51 +02:00
Thomas Zahner
74961d2470 Use StatusCodeSelector default as default accepted StatusCodes 2025-06-11 11:19:51 +02:00
Thomas Zahner
c2a0908747 Tiny improvements 2025-06-11 11:19:51 +02:00
Thomas Zahner
286ff50612 Remove dbg macro 2025-06-11 11:19:51 +02:00
Thomas Zahner
a0d078b4d8 Pass accepted values by reference 2025-06-11 11:19:51 +02:00
Thomas Zahner
3100fb2ee7 Make accepted codes non-optional 2025-06-11 11:19:51 +02:00
Thomas Zahner
d22d1888f1 Handle rejected TOO_MANY_REQUESTS 2025-06-11 11:19:51 +02:00
Thomas Zahner
a516461df6 Update Status::code 2025-06-11 11:19:51 +02:00
Thomas Zahner
54bbc080a9 Remove duplicated information from output 2025-06-11 11:19:51 +02:00
Thomas Zahner
f067b92a58 Change usage of ErrorKind::NetworkRequest, as it no longer represents rejected status codes 2025-06-11 11:19:51 +02:00
Thomas Zahner
341f75e11b Update doc comment 2025-06-11 11:19:51 +02:00
Thomas Zahner
2ca69f4407 Make error message more user-friendly 2025-06-11 11:19:51 +02:00
Thomas Zahner
4e5043a3c3 Remove hardcoded rule for handling erroneous status codes differently 2025-06-11 11:19:51 +02:00
Thomas Zahner
31b2525a8d
Move archive functionality to library (#1720)
* Bump flake 1.83.0 -> 1.87.0
* Move archive functionality into lychee-lib
* Create example, update name and docs
* Split function & update tests
* Remove trailing slashes in API calls & update tests
* Apply lint suggestions
* Rename function
* Move module
* Add cargo-nextest to devShell to support 'make test'
2025-06-06 22:24:10 +02:00
dependabot[bot]
1305bccac3
Bump the dependencies group across 1 directory with 3 updates (#1714)
* Bump the dependencies group across 1 directory with 3 updates

Bumps the dependencies group with 3 updates in the / directory: [tokio](https://github.com/tokio-rs/tokio), [uuid](https://github.com/uuid-rs/uuid) and [criterion](https://github.com/bheisler/criterion.rs).


Updates `tokio` from 1.45.0 to 1.45.1
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.45.0...tokio-1.45.1)

Updates `uuid` from 1.16.0 to 1.17.0
- [Release notes](https://github.com/uuid-rs/uuid/releases)
- [Commits](https://github.com/uuid-rs/uuid/compare/v1.16.0...v1.17.0)

Updates `criterion` from 0.5.1 to 0.6.0
- [Changelog](https://github.com/bheisler/criterion.rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/bheisler/criterion.rs/compare/0.5.1...0.6.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-version: 1.45.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: uuid
  dependency-version: 1.17.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: dependencies
- dependency-name: criterion
  dependency-version: 0.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>

* Use `std::hint::black_box`

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias@endler.dev>
2025-05-26 23:34:44 +02:00
Keming
1c97f26aa2
feat: respect the disabled property for stylesheet links (#1716)
Signed-off-by: Keming <kemingy94@gmail.com>
2025-05-25 13:13:22 +02:00
Jakob
63cdb70e6d
Upgrade to 2024 edition (#1711)
* Upgrade to 2024 edition

* Revert expr_2021 -> expr

* resolve merge conflicts

* make lint happy
2025-05-24 18:23:23 +02:00
Keming
208fa80aa6
fix: only check the fragment when it's a file (#1713)
* fix: only check the fragment when it's a file
* add dir fragment test
* Clean up unused fragment_check in Client

---------

Signed-off-by: Keming <kemingy94@gmail.com>
Co-authored-by: Matthias <matthias@endler.dev>
2025-05-23 21:50:26 +02:00
Jakob
7d9d79791a
fix: ignore gitlab table of content in wikilinks (#1710)
* fix: ignore gitlab table of content in wikilinks

* Simplify test

---------

Co-authored-by: Matthias Endler <matthias@endler.dev>
2025-05-23 15:04:24 +02:00
Matthias Endler
35610764a1
Add support for custom headers in input processing (#1561) 2025-05-23 13:37:32 +02:00
Matthias Endler
00118965bd
Fix lints (#1705) 2025-05-17 21:12:38 +02:00
Keming
1ed357fe73
feat: detect website fragments (#1675)
Signed-off-by: Keming <kemingy94@gmail.com>
2025-05-14 01:52:08 +02:00
Jakob
3a0922757e
detect wikilinks, prevent plaintext extraction from links #1650 (#1679) 2025-05-12 23:06:51 +02:00
dependabot[bot]
a5cf40cbd4 Bump the dependencies group with 2 updates
Bumps the dependencies group with 2 updates: [clap](https://github.com/clap-rs/clap) and [tempfile](https://github.com/Stebalien/tempfile).


Updates `clap` from 4.5.37 to 4.5.38
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/clap_complete-v4.5.37...clap_complete-v4.5.38)

Updates `tempfile` from 3.19.1 to 3.20.0
- [Changelog](https://github.com/Stebalien/tempfile/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Stebalien/tempfile/compare/v3.19.1...v3.20.0)

---
updated-dependencies:
- dependency-name: clap
  dependency-version: 4.5.38
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: tempfile
  dependency-version: 3.20.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-05-12 16:01:57 +02:00
Hugo McNally
fdf105c67a
Add TLS version option (#1655)
* Add a minimum TLS option
* Update help message for min tls version
2025-05-10 12:59:55 +02:00