Commit graph

99 commits

Author SHA1 Message Date
Thomas Zahner
08dabb06b2 Add regression test 2025-07-26 17:33:02 +02:00
Thomas Zahner
e743ea3f5f Improve test 2025-07-18 16:53:08 +02:00
Thomas Zahner
678acd9760 Test regex functionality in --exclude-path flag 2025-07-18 16:53:08 +02:00
Thomas Zahner
5036ce8388 Update flag description & clean up 2025-07-18 16:53:08 +02:00
Keming
696a7cafc8
fix: do not check the fragment when http response err but accepted (#1763)
Signed-off-by: Keming <kemingy94@gmail.com>
2025-07-10 06:32:15 +02:00
MichaIng
92a9bca23f
feat: skip fragment checking for unsupported MIME types (#1744)
* feat: skip fragment checking for unsupported MIME types

The remote URL/website checker currently passes all URLs with fragments to the fragment checker as HTML document, even if it is a different or unsupported MIME type. This can cause false fragment checking for Markdown documents, failures for other MIME types, especially binaries, and unnecessary traffic for large downloads, which are always finished completely, if the fragment checker is invoked.

This commit checks the Content-Type header of the response:
- Only if it is `text/html`, it is passed to the fragment checker as HTML type.
- Only if it is `text/markdown`, or `text/plain` and the URL path ends in `.md`, it is passed to the fragment checker as Markdown type.
- In all other cases, the fragment checker is skipped and the HTTP status is returned.
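
The three rules above can be sketched as a small dispatch function. This is only an illustration of the described behavior, not lychee's actual code: the `FragmentCheck` enum and the `classify` function are hypothetical names, and stripping parameters such as `; charset=utf-8` from the header value is an assumption.

```rust
/// Hypothetical outcome of inspecting the Content-Type header.
#[derive(Debug, PartialEq, Eq)]
enum FragmentCheck {
    Html,
    Markdown,
    /// Skip the fragment checker and report the HTTP status only.
    Skip,
}

/// Decide how to treat a response, given its Content-Type header value
/// and the path component of the requested URL.
fn classify(content_type: &str, url_path: &str) -> FragmentCheck {
    // Assumption: drop parameters like "; charset=utf-8" and ignore case.
    let mime = content_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();

    match mime.as_str() {
        "text/html" => FragmentCheck::Html,
        "text/markdown" => FragmentCheck::Markdown,
        // Plain text only counts as Markdown when the path ends in `.md`.
        "text/plain" if url_path.ends_with(".md") => FragmentCheck::Markdown,
        _ => FragmentCheck::Skip,
    }
}
```

With this sketch, a binary response such as `application/zip` falls through to `Skip`, which matches the adjusted test expectation described below.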

To invoke the fragment checker with a variable document type, a new `FileType` argument is added to the `check_html_fragment()` function.

The fragment checker test and fixture are adjusted to match the expected result: checking a binary file via remote URL with fragment is now expected to succeed, since its Content-Type header does not invoke the fragment checker anymore.

Signed-off-by: MichaIng <micha@dietpi.com>

* Update fixtures/fragments/file1.md

Co-authored-by: MichaIng <micha@dietpi.com>

---------

Signed-off-by: MichaIng <micha@dietpi.com>
Co-authored-by: Matthias Endler <matthias@endler.dev>
2025-07-06 10:46:06 +02:00
Keming
02f6f5cb49
feat: add 'user-content-' prefix to support github markdown fragment (#1750) 2025-07-04 22:58:47 +02:00
ocavue
81f2605118
fix: treat a fragment in an empty directory as an error (#1756)
* fix: treat a fragment in an empty directory as an error
* test: add more fragment tests
2025-07-04 10:25:57 +02:00
ocavue
6bcb37c2dc
fix: resolve index file inside a directory (#1752) 2025-07-03 16:55:57 +02:00
MichaIng
b970256248
fix: skip fragment check if website URL doesn't contain fragment (#1733)
* fix: skip fragment check if website URL doesn't contain fragment

Signed-off-by: MichaIng <micha@dietpi.com>

* test: add tests for fragment checks with binary data

Signed-off-by: MichaIng <micha@dietpi.com>

* fix: skip fragment checking as well if fragment is empty

`is_some()` is true as well if the fragment is given but empty, i.e. `#`. While it is an edge case, skip the fragment checker as well in case of an empty fragment.
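
A minimal sketch of the distinction, assuming the fragment is available as an `Option<&str>` (the function name `should_check_fragment` is hypothetical and not taken from lychee):

```rust
/// Returns true only when a fragment is present and non-empty.
/// `Some("")` corresponds to a URL that ends in a bare `#`.
fn should_check_fragment(fragment: Option<&str>) -> bool {
    matches!(fragment, Some(f) if !f.is_empty())
}
```

Checking `is_some()` alone would return true for `Some("")`, which is the edge case this commit addresses.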

Signed-off-by: MichaIng <micha@dietpi.com>

* test: switch to lycheeverse/master remote URLs

Signed-off-by: MichaIng <micha@dietpi.com>

* fix: apply rustfmt annotation

Signed-off-by: MichaIng <micha@dietpi.com>

---------

Signed-off-by: MichaIng <micha@dietpi.com>
2025-06-20 17:47:35 +02:00
Keming
b128b86a48
feat: raise error when the default config file is invalid (#1715)
Signed-off-by: Keming <kemingy94@gmail.com>
2025-05-25 13:10:58 +02:00
Keming
208fa80aa6
fix: only check the fragment when it's a file (#1713)
* fix: only check the fragment when it's a file
* add dir fragment test
* Clean up unused fragment_check in Client

---------

Signed-off-by: Keming <kemingy94@gmail.com>
Co-authored-by: Matthias <matthias@endler.dev>
2025-05-23 21:50:26 +02:00
Matthias Endler
35610764a1
Add support for custom headers in input processing (#1561) 2025-05-23 13:37:32 +02:00
Keming
1ed357fe73
feat: detect website fragments (#1675)
Signed-off-by: Keming <kemingy94@gmail.com>
2025-05-14 01:52:08 +02:00
Matthias Endler
d33b7554a1
test: add tests for URL extraction ending with a period (#1641) 2025-02-24 08:48:58 +01:00
Ben
d6bbf85145
renamed base to base_url (fixes #1607) (#1629)
* renamed `base` to `base_url` (fixes #1607)
* fixed readme
* added warning for deprecated `--base`
* Update lychee.example.toml
* Update fixtures/configs/smoketest.toml
2025-02-16 01:41:32 +01:00
MichaIng
d3d7f6a56b
fix: do not fail on empty # and #top fragments (#1609)
The empty "#" and "#top" fragments are always valid, even without a matching HTML element: the browser scrolls to the top of the page. Hence lychee must not fail on those.
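
As a rough illustration, such a check could look like the sketch below. The function name `is_top_of_page` is hypothetical, and the case-insensitive comparison follows the HTML Standard's rule for `top`; the commit message does not say whether lychee compares case-insensitively.

```rust
/// Returns true for fragments that always resolve to the top of the page.
/// The input is the fragment without the leading `#`.
fn is_top_of_page(fragment: &str) -> bool {
    // Assumption: "top" is matched ASCII case-insensitively.
    fragment.is_empty() || fragment.eq_ignore_ascii_case("top")
}
```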

Credits go to @thiru-appitap for initial attempt and helping to find missing parts of the implementation.

Solves: https://github.com/lycheeverse/lychee/issues/1599

Signed-off-by: MichaIng <micha@dietpi.com>
2025-02-06 15:09:59 +01:00
Trask Stalnaker
6d0e94c799
Introduce --root-dir (#1576)
* windows

* Introduce --root-path

* lint

* lint

* Simplification

* Add unit tests

* Add integration test

* Sync docs

* Add missing comment to make CI happy

* Revert one of the Windows-specific changes because causing a test failure

* Support both options at the same time

* Revert a comment change that is no longer applicable

* Remove unused code

* Fix and simplification

* Integration test both at the same time

* Unit tests both at the same time

* Remove now redundant comment

* Revert windows-specific change, seems not needed after recent changes

* Use Collector::default()

* extract method and unit tests

* clippy

* clippy: &Option<A> -> Option<&A>

* Remove outdated comment

* Rename --root-path to --root-dir

* Restrict --root-dir to absolute paths for now

* Move root dir check
2024-12-13 14:36:33 +01:00
Matthias Endler
71564344de
Fix: Bring back error output for links (#1553)
With the last lychee release, we simplified the status output for links.

While this reduced the visual noise, it also accidentally caused the source of errors to not be printed anymore. This change brings back the additional error information as part of the final report output. Furthermore, it shows the error information in the progress output if verbose mode is activated.

Fixes #1487
2024-11-07 00:22:50 +01:00
autoantwort
98015907f2
Ignore casing when processing markdown fragments + check for percent encoded anchors (#1535)
We must also check the fragment before it is percent-decoded as required by the HTML standard.

Fixes https://github.com/lycheeverse/lychee/issues/1467
2024-10-28 09:21:13 +01:00
Matthias Endler
812941c2aa
Fix format option in configuration file (#1547) 2024-10-27 02:17:00 +02:00
Matthias Endler
e43086c2e9
Fix skipping of email addresses in stylesheets (#1546) 2024-10-27 01:32:11 +02:00
Matthias Endler
3094bbca33
Add support for relative links (#1489)
This commit introduces several improvements to the file checking process and URI handling:

- Extract file checking logic into separate `Checker` structs (`FileChecker`, `WebsiteChecker`, `MailChecker`)
- Improve handling of relative and absolute file paths
- Enhance URI parsing and creation from file paths
- Refactor `create_request` function for better clarity and error handling

These changes provide better support for resolving relative links, handling different base URLs, and working with file paths.

Fixes https://github.com/lycheeverse/lychee/issues/1296 and https://github.com/lycheeverse/lychee/issues/1480
2024-10-26 04:07:37 +02:00
Thomas Zahner
462033a294 Test ignored files 2024-09-22 19:09:35 +02:00
Thomas Zahner
0e9b6532d2 Test hidden files 2024-09-22 19:09:35 +02:00
Johannes Schindelin
8c6eee9b5f
Add a way to handle "pretty URLs", i.e. URIs without .html extension (#1422)
In many circumstances (GitHub Pages, Apache configured with MultiViews,
etc), web servers process URIs by appending the `.html` file extension
when no file is found at the path specified by the URI but a `.html`
file corresponding to that path _is_ found.

To allow Lychee to use the fast, offline method of checking such files
locally via the `file://` scheme, let's handle this scenario gracefully
by adding the `--fallback-extensions=html` option.

Note: This new option can take a list of file extensions to use; The
first one for which a corresponding file is found is then used.
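
The lookup order described above might be sketched as follows. This is not lychee's implementation: `resolve_with_fallback` is a hypothetical name, and the `exists` callback stands in for a real file system check (for example `Path::exists`) so that the sketch stays self-contained.

```rust
use std::path::{Path, PathBuf};

/// Return the path itself if it exists, otherwise the first candidate
/// formed by appending one of the fallback extensions, in list order.
fn resolve_with_fallback(
    path: &Path,
    fallback_extensions: &[&str],
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    if exists(path) {
        return Some(path.to_path_buf());
    }
    fallback_extensions
        .iter()
        .map(|ext| {
            // Append ".ext" instead of replacing an existing extension,
            // so that "v1.2" becomes "v1.2.html" rather than "v1.html".
            let mut candidate = path.as_os_str().to_os_string();
            candidate.push(".");
            candidate.push(ext);
            PathBuf::from(candidate)
        })
        .find(|candidate| exists(candidate.as_path()))
}
```

Appending rather than calling `Path::with_extension` is a deliberate choice in this sketch, because `with_extension` would replace anything after the last dot in the file name.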

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-06-11 16:11:24 +02:00
John Bampton
0956ec6c38
Fix spelling and remove unneeded trailing whitespace (#1417) 2024-04-26 08:22:44 +02:00
Hugo McNally
9ff4a838ce
Fixed fragment generation for headings with inline code (#1370)
* Added code headings to fragment cli test

* Fixed fragment generation for headings with inline code
2024-02-05 01:07:56 +01:00
Norbert Kamiński
2a95944ef5
status.rs: Make json output more verbose (#1367)
* status.rs: Make json output more verbose

Currently if the status response has no status code, json output
contains only a text field which gives no real information about
the cause of the problem. The patch adds a field with more detailed
information when the status response contains some details.

Signed-off-by: Norbert Kamiński <norbert.kaminski@3mdeb.com>

* cli.rs: Test parsing of error details in JSON format

Some network errors, such as SSL errors, have no status code but can be
identified by their error status details. This patch adds a test case to
verify that the error details are parsed properly in the JSON format.

Signed-off-by: Norbert Kamiński <norbert.kaminski@3mdeb.com>

---------

Signed-off-by: Norbert Kamiński <norbert.kaminski@3mdeb.com>
2024-01-30 23:58:18 +01:00
Matthias
f933656161 Add integration test for accept (int and string) 2024-01-10 00:10:22 +01:00
Matthias Endler
63ba63f7c9
Exclude example TLDs from RFC 2606 (#1335)
Fixes https://github.com/lycheeverse/lychee/issues/1283
2024-01-05 18:48:15 +01:00
Hugo McNally
c9b707ea74
Decode percent escapes in fragments (#1275)
* Added test to check a fragment with a utf8 character
2024-01-05 15:46:09 +01:00
Matthias Endler
ef4e19268a
Fix false-positive example domains (#1316) 2023-12-04 01:55:14 +01:00
Techassi
1b1fd0c707
feat: Add support for ranges in the --accept option / config field (#1167)
Adds support for accept ranges discussed in #1157. This allows the user to specify custom HTTP status codes accepted during checking and thus will report as valid (not broken). The accept option only supports specifying status codes as a comma-separated list. With this PR, the option will accept a list of status code ranges formatted like this:

```toml
accept = ["100..=103", "200..=299", "403"]
```

These combinations will be supported: `..<end>`, ` ..=<end>`, `<start>..<end>` and `<start>..=<end>`.
The behavior is copied from the Rust Range like concepts:

```
    ..<end>, includes 0 to <end> (exclusive)
    ..=<end>, includes 0 to <end> (inclusive)
    <start>..<end>, includes <start> to <end> (exclusive)
    <start>..=<end>, includes <start> to <end> (inclusive)
```
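
To make the four forms concrete, here is a minimal parsing sketch. It is not the `AcceptSelector` implementation from this PR: `parse_accept` is a hypothetical helper, and the choices to use `u16` and to reject empty ranges are assumptions.

```rust
use std::ops::RangeInclusive;

/// Parse one accept entry such as "403", "..300", "..=299",
/// "100..104" or "100..=103" into an inclusive range of status codes.
/// Returns `None` for malformed or empty ranges.
fn parse_accept(spec: &str) -> Option<RangeInclusive<u16>> {
    let spec = spec.trim();

    // A plain status code without ".." accepts exactly that code.
    let Some((start, rest)) = spec.split_once("..") else {
        let code: u16 = spec.parse().ok()?;
        return Some(code..=code);
    };

    // A missing start means the range begins at 0.
    let start: u16 = if start.is_empty() { 0 } else { start.parse().ok()? };

    let end: u16 = match rest.strip_prefix('=') {
        // `..=<end>`: the end is included.
        Some(inclusive) => inclusive.parse().ok()?,
        // `..<end>`: the end is excluded, so step back by one.
        None => rest.parse::<u16>().ok()?.checked_sub(1)?,
    };

    (start <= end).then(|| start..=end)
}
```

A caller could then test a response code with something like `parse_accept("200..=299").is_some_and(|range| range.contains(&204))`.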


- Foundation and enhancements for accept ranges, including support for comma-separated strings and integration into the CLI.
- Implementations and updates for AcceptSelector, including Default, Display, and serde defaults.
- Address and fix various errors: clippy, cargo fmt, and tests.
- Add more tests, address edge cases, and enhance error messaging, especially for TOML config parsing.
- Update dependencies.
2023-09-17 21:39:01 +02:00
Matthias Endler
0711112841
Mention supported schemes (#1255)
Fixes https://github.com/lycheeverse/lycheeverse.github.io/issues/7
2023-09-15 01:27:44 +02:00
Hugo McNally
f59aa61ee3
Check fragments in HTML files (#1198)
* Added html5gum based fragment extractor
* Markdown fragment extractor now extracts fragments from inline html
* Added fragment checks for html file
* Added inline html and html document to fragment checks test
* Improved some comments
* Improved documentation of markdown's fragment extractor.
2023-08-22 16:44:45 +02:00
Hugo McNally
8e6369377c
Introduce fragment checking for links to markdown files. (#1126)
- Implemented enhancements to include fragments in file links
- Checked links to markdown files with fragments, generating unique kebab case and heading attributes.
- Made code more idiomatic and added an integration test.
- Updated documentation.
- Fixed issues with heading attributes fragments and ensured proper handling of file errors.
2023-07-31 16:04:00 +02:00
Matthias Endler
04887ee293
Make checking email addresses optional (#1171)
E-Mail checks cause too many false positives,
so we put them behind a flag.

* `--exclude-mail` is deprecated (to be removed in 1.0)
* `--include-mail` is the new flag

This PR also removes the obsolete tests for `--exclude-file`, which was superseded by `.lycheeignore`.

Fixes #1089
2023-07-19 19:58:38 +02:00
Techassi
f53619a455
feat: Add support for --dump-inputs (#1159)
* Add support for --dump-inputs
* Add integration tests
* Fix usage guide in README
2023-07-16 18:08:14 +02:00
Matthias Endler
97573123ef
Extend remap feature (#1133)
* wip

* Extend support for remapping

This adds support for partial remaps and
capture groups to the remap feature.

Fixes #1129
2023-07-05 15:05:19 +02:00
Techassi
67af7ef6d3
feat: add support for basic auth per URI (#1110)
* Add support for basic auth per domain
* Move URI matching to link collection phase
* Allow AsRef for BasicAuthExtractor::new to avoid clone
* Add tests

---------

Co-authored-by: Matthias Endler <matthias@endler.dev>
2023-06-26 12:06:24 +02:00
Thomas Zahner
130fa21a6a
Concurrent archives (#1027) 2023-05-11 20:20:27 +02:00
Matthias Endler
55797071b0
Fix nested URL extraction in verbatim elements (#988)
Skipping URLs in verbatim elements didn't take nested
elements into consideration, which were not verbatim.

For instance, the following HTML snippet would yield
`https://example.com` in non-verbatim mode, even if
it is nested inside a verbatim `<pre>` element:

```html
<pre><a href="https://example.com">link</a></pre>
```

This commit fixes the behavior for both `html5gum` and
`html5ever`.

Note that nested verbatim elements of the same kind
still are not handled correctly.

For instance,  the following HTML snippet would still yield
`https://example.com`:

```html
<pre>
  <pre></pre>
  <a href="https://example.com">link</a>
</pre>
```

The reason is that we currently only keep track of a single
verbatim element and not a stack of elements, which we
would need to unwind and resolve the situation.
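
The stack-based approach mentioned above is not what this commit implements. As a sketch of the idea only, assuming a simplified token stream instead of the real `html5gum` or `html5ever` events (the `Token` type, `is_verbatim`, and the choice of `pre` and `code` as verbatim tags are all hypothetical):

```rust
/// Simplified stand-in for parser events.
#[derive(Clone, Copy)]
enum Token<'a> {
    Open(&'a str),
    Close(&'a str),
    Link(&'a str),
}

/// Assumption: only these tags count as verbatim in this sketch.
fn is_verbatim(tag: &str) -> bool {
    matches!(tag, "pre" | "code")
}

/// Collect links that are not nested inside any verbatim element.
fn links_outside_verbatim<'a>(tokens: &[Token<'a>]) -> Vec<&'a str> {
    // One entry per currently open verbatim element.
    let mut stack: Vec<&str> = Vec::new();
    let mut links = Vec::new();

    for token in tokens.iter().copied() {
        match token {
            Token::Open(tag) if is_verbatim(tag) => stack.push(tag),
            // Only unwind when the closing tag matches the innermost one.
            Token::Close(tag) if stack.last() == Some(&tag) => {
                stack.pop();
            }
            // A link counts only when no verbatim element is open.
            Token::Link(url) if stack.is_empty() => links.push(url),
            _ => {}
        }
    }
    links
}
```

With a stack, the inner `</pre>` in the snippet above pops only the inner element, so the link that follows is still treated as verbatim.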

Fixes https://github.com/lycheeverse/lychee/issues/986.
2023-03-11 15:18:25 +01:00
Matthias
c9edb7f809 Split up quirks and skip twitter check
It's flaky on Github
2023-03-03 12:13:09 +01:00
Matthias
08466ad59b Ignore config smoketest output report file 2023-03-03 12:13:09 +01:00
Matthias
86f13609e6 Put lycheecache tests into separate subfolders to avoid race 2023-03-03 12:13:09 +01:00
Matthias
388bd20673 Fix tests after address is no longer a verbatim element 2023-03-03 12:13:09 +01:00
Matthias Endler
7874195bbb
Customize verbosity (#956) 2023-02-24 23:53:09 +01:00
Matthias Endler
5654b7c317
Harden URL detection and extend verbatim elements (#899)
Previously remote URLs were incorrectly detected because the
string representation of a path is different than the path itself,
causing the `http` prefix match to be insufficient.

This resulted in unexpected side-effects, such as the
incorrect detection of verbatim mode for remote URLs.

The check has now been improved and unit tests were added to avoid
future breakage. On top of that, missing verbatim elements were added.
2023-01-04 00:38:19 +01:00
Matthias
982d978e47
Add different verbosity levels (#824)
More granular verbosity levels have been asked
for repeatedly.
To enable that we're moving to [env_logger] and [clap-verbosity-flag]
to provide more flexible verbosity settings.

Also tackles #661, #709
Lays the groundwork for tackling #268

https://github.com/rust-cli/env_logger
https://github.com/clap-rs/clap-verbosity-flag
2022-11-28 23:25:33 +01:00