* feat: skip fragment checking for unsupported MIME types
The remote URL/website checker currently passes every URL with a fragment to the fragment checker as an HTML document, even if the response has a different or unsupported MIME type. This can cause false fragment check results for Markdown documents, failures for other MIME types (especially binaries), and unnecessary traffic for large downloads, which are always downloaded completely whenever the fragment checker is invoked.
This commit checks the Content-Type header of the response:
- Only if it is `text/html` is the response passed to the fragment checker as HTML type.
- Only if it is `text/markdown`, or `text/plain` with a URL path ending in `.md`, is the response passed to the fragment checker as Markdown type.
- In all other cases, the fragment checker is skipped and the HTTP status is returned.
To invoke the fragment checker with a variable document type, a new `FileType` argument is added to the `check_html_fragment()` function.
The fragment checker test and fixture are adjusted to match the expected result: checking a binary file via remote URL with fragment is now expected to succeed, since its Content-Type header does not invoke the fragment checker anymore.
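The Content-Type rules above can be sketched as a small dispatch function. This is only an illustration and not the code from this commit: the `FileType` enum here is a local stand-in, `fragment_file_type` is a hypothetical helper name, and stripping parameters such as `; charset=utf-8` is an assumption about how the header value is compared.

```rust
/// Local stand-in for the document type handed to the fragment checker.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum FileType {
    Html,
    Markdown,
}

/// Hypothetical helper: decide whether, and as which type, a remote response
/// should be passed to the fragment checker.
/// `None` means the fragment check is skipped and only the HTTP status counts.
fn fragment_file_type(content_type: Option<&str>, url_path: &str) -> Option<FileType> {
    // Assumption: ignore parameters like `; charset=utf-8` and compare
    // the bare MIME type case-insensitively.
    let mime = content_type?
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();

    match mime.as_str() {
        "text/html" => Some(FileType::Html),
        "text/markdown" => Some(FileType::Markdown),
        // Assumed reading of the rule: the `.md` suffix is required
        // for `text/plain` responses only.
        "text/plain" if url_path.ends_with(".md") => Some(FileType::Markdown),
        // Binaries and every other MIME type: skip the fragment checker.
        _ => None,
    }
}
```

With this sketch, a `text/plain` response for `/README.md` is treated as Markdown, while an `application/zip` response skips the fragment check entirely.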
Signed-off-by: MichaIng <micha@dietpi.com>
* Update fixtures/fragments/file1.md
Co-authored-by: MichaIng <micha@dietpi.com>
---------
Signed-off-by: MichaIng <micha@dietpi.com>
Co-authored-by: Matthias Endler <matthias@endler.dev>
* Capture bug as failing test
* Add basic auth credentials for website extraction requests via RequestChain & remove headers from Input
* Create UrlExtractor and add back headers
* Improve UrlExtractor
* Fix bug: extend headers instead of setting them
* Clean up
* Minor adjustments
* Apply suggestions from code review
Co-authored-by: Matthias Endler <matthias@endler.dev>
* Mention in doc comment how the method might panic
* Remove use of chain for more simplicity
---------
Co-authored-by: Matthias Endler <matthias@endler.dev>
* fix: skip fragment check if website URL doesn't contain fragment
Signed-off-by: MichaIng <micha@dietpi.com>
* test: add tests for fragment checks with binary data
Signed-off-by: MichaIng <micha@dietpi.com>
* fix: skip fragment checking as well if fragment is empty
`is_some()` is also true if the fragment is given but empty, i.e. `#`. While this is an edge case, skip the fragment checker for an empty fragment as well.
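A minimal sketch of the distinction, assuming the fragment is available as an `Option<&str>`; `has_checkable_fragment` is a hypothetical name used only for illustration:

```rust
/// Hypothetical helper: a fragment is worth checking only if it is
/// present AND non-empty. `Some("")` corresponds to a URL ending in `#`.
fn has_checkable_fragment(fragment: Option<&str>) -> bool {
    matches!(fragment, Some(f) if !f.is_empty())
}
```

A plain `fragment.is_some()` would return `true` for `Some("")`, which is the case this commit guards against.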
Signed-off-by: MichaIng <micha@dietpi.com>
* test: switch to lycheeverse/master remote URLs
Signed-off-by: MichaIng <micha@dietpi.com>
* fix: apply rustfmt annotation
Signed-off-by: MichaIng <micha@dietpi.com>
---------
Signed-off-by: MichaIng <micha@dietpi.com>
* fix: only check the fragment when it's a file
* add dir fragment test
* Clean up unused fragment_check in Client
---------
Signed-off-by: Keming <kemingy94@gmail.com>
Co-authored-by: Matthias <matthias@endler.dev>
The empty "#" and "#top" fragments are always valid, even without a related HTML element: the browser scrolls to the top of the page. Hence lychee must not fail on those.
Credits go to @thiru-appitap for the initial attempt and for helping to find missing parts of the implementation.
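As a rough sketch of this rule (not the actual implementation), a check for fragments that never need a matching element could look like the following. The function name is hypothetical, and treating `top` case-insensitively is an assumption based on browser behavior rather than something stated in this commit.

```rust
/// Hypothetical helper: fragments that are valid on any HTML page,
/// because the browser scrolls to the top instead of looking up an element.
fn is_always_valid_fragment(fragment: &str) -> bool {
    // Assumption: `top` is matched ASCII case-insensitively.
    fragment.is_empty() || fragment.eq_ignore_ascii_case("top")
}
```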
Solves: https://github.com/lycheeverse/lychee/issues/1599
Signed-off-by: MichaIng <micha@dietpi.com>
* windows
* Introduce --root-path
* lint
* lint
* Simplification
* Add unit tests
* Add integration test
* Sync docs
* Add missing comment to make CI happy
* Revert one of the Windows-specific changes because it caused a test failure
* Support both options at the same time
* Revert a comment change that is no longer applicable
* Remove unused code
* Fix and simplification
* Integration test both at the same time
* Unit tests both at the same time
* Remove now redundant comment
* Revert windows-specific change, seems not needed after recent changes
* Use Collector::default()
* extract method and unit tests
* clippy
* clippy: &Option<A> -> Option<&A>
* Remove outdated comment
* Rename --root-path to --root-dir
* Restrict --root-dir to absolute paths for now
* Move root dir check
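The restriction of `--root-dir` to absolute paths could be sketched as below. This is an assumption-laden illustration: `validate_root_dir` and its error message are hypothetical and do not come from the lychee code base.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical validation step: accept the root directory only if it is
/// an absolute path, otherwise report an error to the caller.
fn validate_root_dir(root_dir: &Path) -> Result<PathBuf, String> {
    if root_dir.is_absolute() {
        Ok(root_dir.to_path_buf())
    } else {
        Err(format!(
            "root dir must be an absolute path, got `{}`",
            root_dir.display()
        ))
    }
}
```

Note that `Path::is_absolute` is platform-dependent, so a path such as `/srv/site` counts as absolute on Unix but not on Windows.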
With the last lychee release, we simplified the status output for links.
While this reduced the visual noise, it also accidentally caused the source of errors to not be printed anymore. This change brings back the additional error information as part of the final report output. Furthermore, it shows the error information in the progress output if verbose mode is activated.
Fixes #1487
This commit introduces several improvements to the file checking process and URI handling:
- Extract file checking logic into separate `Checker` structs (`FileChecker`, `WebsiteChecker`, `MailChecker`)
- Improve handling of relative and absolute file paths
- Enhance URI parsing and creation from file paths
- Refactor `create_request` function for better clarity and error handling
These changes provide better support for resolving relative links, handling different base URLs, and working with file paths.
Fixes https://github.com/lycheeverse/lychee/issues/1296 and https://github.com/lycheeverse/lychee/issues/1480
This introduces an option `--cache-exclude-status`, which allows specifying a range of HTTP status codes that will be excluded from the cache.
Closes #1400.
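The idea behind `--cache-exclude-status` can be illustrated with a small sketch. The `CacheExcludeStatus` type and `should_cache` function are hypothetical names for this example, and how the option value is parsed on the command line is deliberately left out.

```rust
use std::ops::RangeInclusive;

/// Hypothetical container for the excluded status code ranges.
struct CacheExcludeStatus {
    ranges: Vec<RangeInclusive<u16>>,
}

impl CacheExcludeStatus {
    /// True if the status code falls into any excluded range.
    fn contains(&self, status: u16) -> bool {
        self.ranges.iter().any(|range| range.contains(&status))
    }
}

/// A response is written to the cache only if its status is not excluded.
fn should_cache(status: u16, excluded: &CacheExcludeStatus) -> bool {
    !excluded.contains(status)
}
```

For example, excluding `500..=599` would keep server errors out of the cache so they are re-checked on the next run, while a `200` response is still cached.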