245 commits

Author SHA1 Message Date
Matthias Endler
913cf717a4
feat: Add tests for dns-prefetch (#1522)
* Exclude `rel=dns-prefetch` links

Resolves #1499

* Add tests for dns-prefetch

---------

Co-authored-by: wackget <136205263+wackget@users.noreply.github.com>
2024-10-12 01:45:35 +02:00
wackget
e398325bb0
Exclude rel=dns-prefetch links (#1520)
Resolves #1499
2024-10-12 01:37:54 +02:00
Matthias Endler
7014765988
Improve docs for fragment checker 2024-10-08 14:55:16 +02:00
Matthias Endler
11adc09725
Don't check preconnect links (#1187)
Preconnect links are used to establish a server connection without loading a
specific resource yet. These links do not always point to a URL that should
return a 200, and they are not user-facing, i.e. they don't show up in the
final rendered version of a page.

Therefore, we should not check them at all; not even in `--include-verbatim`
mode, as they might not point to a valid resource.

This turned out to require a significant overhaul of the html5gum extractor
to handle arbitrary attribute ordering correctly. Changes to the html5gum extractor:

* Refactor HTML link extractor for improved performance and maintainability
  - Replace `Vec<u8>` with `String` for better readability and manipulation
  - Introduce `Element` struct to encapsulate element-related data
  - Use `HashMap<String, String>` for `current_attributes` for efficient lookups
  - Add `verbatim_stack` to properly handle nested verbatim elements
  - Remove unsafe code where possible, using `String::from_utf8_lossy`
  - Improve attribute handling with `HashMap` entry API and prioritize `srcset`
  - Simplify logic and consolidate verbatim element handling
  - Enhance encapsulation in `LinkExtractor` struct
  - Improve overall performance with more efficient data structures
  - Increase flexibility for future feature additions or modifications

Fixes #897
2024-10-07 22:36:16 +02:00
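
The link-filtering behaviour described in the entries above (skipping `rel=preconnect` and `rel=dns-prefetch` links) can be illustrated with a small sketch. This is not the actual html5gum extractor code. The names `SKIPPED_REL_VALUES`, `is_connection_hint` and `extract_href` are hypothetical, and treating `rel` as a whitespace-separated, case-insensitive token list is an assumption. Only the idea of keeping an element's attributes in a `HashMap<String, String>` comes from the commit message.

```rust
use std::collections::HashMap;

/// `rel` values treated as connection hints rather than user-facing links.
/// (Hypothetical constant; the two values are taken from the commit titles.)
const SKIPPED_REL_VALUES: [&str; 2] = ["preconnect", "dns-prefetch"];

/// Returns true if the element's attributes mark it as a connection hint.
/// Attributes are looked up by name, so their order in the HTML source
/// does not affect the result.
fn is_connection_hint(attributes: &HashMap<String, String>) -> bool {
    attributes
        .get("rel")
        .map(|rel| {
            // Assumption: `rel` may hold several whitespace-separated tokens.
            rel.split_ascii_whitespace().any(|token| {
                SKIPPED_REL_VALUES
                    .iter()
                    .any(|&skipped| token.eq_ignore_ascii_case(skipped))
            })
        })
        .unwrap_or(false)
}

/// Returns the `href` of an element unless the element is a connection hint.
fn extract_href(attributes: &HashMap<String, String>) -> Option<&str> {
    if is_connection_hint(attributes) {
        return None;
    }
    attributes.get("href").map(String::as_str)
}
```

Because the decision is made only after all attributes of an element have been collected, `<link rel="preconnect" href="…">` and `<link href="…" rel="preconnect">` are handled the same way in this sketch.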
dependabot[bot]
b9ad685fea
Bump the dependencies group across 1 directory with 8 updates (#1509)
* Bump the dependencies group across 1 directory with 8 updates

Bumps the dependencies group with 8 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [clap](https://github.com/clap-rs/clap) | `4.5.17` | `4.5.18` |
| [regex](https://github.com/rust-lang/regex) | `1.10.6` | `1.11.0` |
| [secrecy](https://github.com/iqlusioninc/crates) | `0.8.0` | `0.10.2` |
| [tempfile](https://github.com/Stebalien/tempfile) | `3.12.0` | `3.13.0` |
| [async-trait](https://github.com/dtolnay/async-trait) | `0.1.82` | `0.1.83` |
| [octocrab](https://github.com/XAMPPRocky/octocrab) | `0.39.0` | `0.40.0` |
| [thiserror](https://github.com/dtolnay/thiserror) | `1.0.63` | `1.0.64` |
| [rstest](https://github.com/la10736/rstest) | `0.22.0` | `0.23.0` |



Updates `clap` from 4.5.17 to 4.5.18
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/clap_complete-v4.5.17...clap_complete-v4.5.18)

Updates `regex` from 1.10.6 to 1.11.0
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/regex/compare/1.10.6...1.11.0)

Updates `secrecy` from 0.8.0 to 0.10.2
- [Commits](https://github.com/iqlusioninc/crates/commits)

Updates `tempfile` from 3.12.0 to 3.13.0
- [Changelog](https://github.com/Stebalien/tempfile/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Stebalien/tempfile/compare/v3.12.0...v3.13.0)

Updates `async-trait` from 0.1.82 to 0.1.83
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.82...0.1.83)

Updates `octocrab` from 0.39.0 to 0.40.0
- [Release notes](https://github.com/XAMPPRocky/octocrab/releases)
- [Changelog](https://github.com/XAMPPRocky/octocrab/blob/main/CHANGELOG.md)
- [Commits](https://github.com/XAMPPRocky/octocrab/compare/v0.39.0...v0.40.0)

Updates `thiserror` from 1.0.63 to 1.0.64
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.63...1.0.64)

Updates `rstest` from 0.22.0 to 0.23.0
- [Release notes](https://github.com/la10736/rstest/releases)
- [Changelog](https://github.com/la10736/rstest/blob/master/CHANGELOG.md)
- [Commits](https://github.com/la10736/rstest/compare/v0.22.0...v0.23.0)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: regex
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: dependencies
- dependency-name: secrecy
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: dependencies
- dependency-name: tempfile
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: dependencies
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: octocrab
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: dependencies
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: rstest
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>

* Refactor personal token cloning in ClientBuilder

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias@endler.dev>
2024-10-01 13:10:00 +02:00
Thomas Zahner
99148af84a Disable pedantic clippy lint 2024-09-22 19:09:35 +02:00
Thomas Zahner
6075b4c87e Skip ignored and hidden files by default 2024-09-22 19:09:35 +02:00
Thomas Zahner
cf1420817f Remove is_symlink check, because it's mutually exclusive to is_file 2024-09-22 19:09:35 +02:00
Thomas Zahner
5f7b3c42ea Handle path exclusions when walking directories 2024-09-22 19:09:35 +02:00
Thomas Zahner
bd187ba0d9 Update tests 2024-09-22 19:09:35 +02:00
Thomas Zahner
6444e27a84 Make gitignored files configurable and disable by default 2024-09-22 19:09:35 +02:00
Thomas Zahner
41c1b971c7 Replace jwalk with ignore 2024-09-22 19:09:35 +02:00
Thomas Zahner
7fcf66c492
Extend compact format (#1497)
* Show unknowns and timeouts in compact format
* Clippy: make functions const
2024-09-09 18:33:18 +02:00
dependabot[bot]
0a53e920ed
Bump the dependencies group with 6 updates (#1486)
* Bump the dependencies group with 6 updates

Bumps the dependencies group with 6 updates:

| Package | From | To |
| --- | --- | --- |
| [clap](https://github.com/clap-rs/clap) | `4.5.13` | `4.5.15` |
| [serde](https://github.com/serde-rs/serde) | `1.0.204` | `1.0.206` |
| [serde_json](https://github.com/serde-rs/json) | `1.0.122` | `1.0.124` |
| [assert_cmd](https://github.com/assert-rs/assert_cmd) | `2.0.15` | `2.0.16` |
| [tempfile](https://github.com/Stebalien/tempfile) | `3.11.0` | `3.12.0` |
| [html5ever](https://github.com/servo/html5ever) | `0.27.0` | `0.28.0` |


Updates `clap` from 4.5.13 to 4.5.15
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/clap_complete-v4.5.13...v4.5.15)

Updates `serde` from 1.0.204 to 1.0.206
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](https://github.com/serde-rs/serde/compare/v1.0.204...v1.0.206)

Updates `serde_json` from 1.0.122 to 1.0.124
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](https://github.com/serde-rs/json/compare/v1.0.122...v1.0.124)

Updates `assert_cmd` from 2.0.15 to 2.0.16
- [Changelog](https://github.com/assert-rs/assert_cmd/blob/master/CHANGELOG.md)
- [Commits](https://github.com/assert-rs/assert_cmd/compare/v2.0.15...v2.0.16)

Updates `tempfile` from 3.11.0 to 3.12.0
- [Changelog](https://github.com/Stebalien/tempfile/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Stebalien/tempfile/commits)

Updates `html5ever` from 0.27.0 to 0.28.0
- [Commits](https://github.com/servo/html5ever/commits)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: serde
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: serde_json
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: assert_cmd
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: dependencies
- dependency-name: tempfile
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: dependencies
- dependency-name: html5ever
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix compile error

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias@endler.dev>
2024-08-19 12:43:58 +02:00
Matthias Endler
138cd4dd53
test: fail when interpreting md footnote as link (#1479)
* test: fail when interpreting md footnote as link

* Fix footnote link parsing
2024-08-11 12:51:46 +02:00
Brian
8df3b99d8c
Fix: Windows drive paths misidentified as URLs (#1460)
Co-authored-by: Matthias <matthias@endler.dev>
2024-08-06 18:04:13 +02:00
Hugo McNally
4bb8a61545
Updated pulldown-cmark dependency and fixed maths parsing (#1473)
* Update pulldown-cmark version to 0.11.0
* Fix markdown math parsing
* Fix lints
* Disable flaky wayback test

---------

Co-authored-by: Matthias <matthias@endler.dev>
2024-08-06 15:43:34 +02:00
Matthias Endler
dedc554eda
Add response formatter; refactor stats formatter (#1398)
This adds support for formatting responses in different ways.

For now, the options are:

* `plain`: No color, basic formatting
* `color`: Color, indented formatting (default)
* `emoji`: Fancy mode with emoji icons

Fixes #546
Related to #271
2024-06-14 19:47:52 +02:00
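
As a rough illustration of the three response modes listed in the entry above, the sketch below models them as an enum with `color` as the default. The type `ResponseFormat`, the function `format_response`, the labels, the ANSI codes and the emoji are all assumptions made for this example. They do not reproduce the real formatter's output.

```rust
/// The three modes named in the commit message; `Color` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum ResponseFormat {
    Plain,
    #[default]
    Color,
    Emoji,
}

/// Formats a single check result for display.
/// (Hypothetical helper; the exact layout is illustrative only.)
fn format_response(format: ResponseFormat, ok: bool, url: &str) -> String {
    match format {
        // No color, basic formatting.
        ResponseFormat::Plain => {
            let label = if ok { "OK" } else { "ERROR" };
            format!("[{label}] {url}")
        }
        // Indented, with ANSI green (32) for success and red (31) for errors.
        ResponseFormat::Color => {
            let (code, label) = if ok { ("32", "OK") } else { ("31", "ERROR") };
            format!("  \x1b[{code}m[{label}]\x1b[0m {url}")
        }
        // Emoji icons instead of text labels.
        ResponseFormat::Emoji => {
            let icon = if ok { "✅" } else { "🚫" };
            format!("{icon} {url}")
        }
    }
}
```

Keeping the mode in one enum means a new output style only needs a new variant and one more `match` arm.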
Matthias Endler
cc7acfb0e0
Extend documentation for RequestChain (#1442) 2024-06-14 12:51:13 +02:00
Johannes Schindelin
8c6eee9b5f
Add a way to handle "pretty URLs", i.e. URIs without .html extension (#1422)
In many circumstances (GitHub Pages, Apache configured with MultiViews,
etc), web servers process URIs by appending the `.html` file extension
when no file is found at the path specified by the URI but a `.html`
file corresponding to that path _is_ found.

To allow Lychee to use the fast, offline method of checking such files
locally via the `file://` scheme, let's handle this scenario gracefully
by adding the `--fallback-extensions=html` option.

Note: This new option can take a list of file extensions to use; the
first one for which a corresponding file is found is then used.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-06-11 16:11:24 +02:00
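
The fallback lookup described in the entry above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `resolve_with_fallback` is a hypothetical name, and appending the extension to the full path (rather than replacing an existing one) is a design choice made for this example.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Resolves `path` to an existing file. If nothing exists at `path` itself,
/// each fallback extension is tried in order and the first match wins.
fn resolve_with_fallback(path: &Path, fallback_extensions: &[&str]) -> Option<PathBuf> {
    if path.is_file() {
        return Some(path.to_path_buf());
    }
    fallback_extensions
        .iter()
        .map(|ext| {
            // Append ".<ext>" to the whole path so that names which already
            // contain a dot (e.g. "v1.2") are not truncated.
            let mut candidate: OsString = path.as_os_str().to_os_string();
            candidate.push(".");
            candidate.push(ext);
            PathBuf::from(candidate)
        })
        .find(|candidate| candidate.is_file())
}
```

With fallback extensions `["htm", "html"]`, a link to `docs/about` would resolve to `docs/about.htm` if that file exists, otherwise to `docs/about.html`, and to `None` if neither is present.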
Thomas Zahner
255164ce25
Don't trim mailto: prefix when converting a Uri into a string (#1438) 2024-06-10 17:11:15 +02:00
n4n5
c3f7fe7ad4
Exclude tel scheme from being checked (#1429) 2024-05-19 20:31:38 +02:00
Johannes Schindelin
975901d470
Fix clippy errors (#1423)
* Enclose Markdown links in brackets

The current clippy version (v0.1.78) says "you should put bare URLs
between `<`/`>` or make a proper Markdown link" and refers to
https://rust-lang.github.io/rust-clippy/master/index.html#doc_markdown

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

* Enclose documentation item in backticks

Clippy v0.1.78 complains about the IPv6 network mask, insisting that it
is missing backticks. So backticks it gets.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

* Avoid error claiming `Add(usize)` is dead code

Clippy v0.1.78 identifies this as dead code. However, further down in
the same file, there is clearly a user:

  impl Handler<Result, Result> for Add {

This might be yet another incarnation of
https://github.com/rust-lang/rust/issues/56750

Let's just mark it as intentionally dead code, even if this is untrue,
to make clippy happy again.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

---------

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-05-13 19:07:02 +02:00
John Bampton
0956ec6c38
Fix spelling and remove unneeded trailing whitespace (#1417) 2024-04-26 08:22:44 +02:00
John Bampton
7be088bbfc
Fix spelling; Github -> GitHub (#1416) 2024-04-25 22:44:24 +02:00
Thomas Zahner
25a3eb1a3a
Chain visibility (#1415)
* Make chain public
* Make function on chain public
* Add must_use attribute
* Make RequestChain type public
* Add chain usage example
2024-04-25 15:31:03 +02:00
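
To give a feel for the chain concept that this and the older entries below refer to (handlers, `ChainResult`, early exit), here is a simplified, synchronous sketch. It is not the crate's API: the history mentions an async chain built with `async_trait` and with `Arc` and `Mutex` inside the chain struct, all of which are left out here. The variant names `Next` and `Done`, the `traverse` method and the `StopAbove` handler are assumptions made for illustration.

```rust
/// Outcome of one handler: pass a (possibly modified) value on, or stop early.
#[derive(Debug, PartialEq)]
enum ChainResult<T, R> {
    Next(T),
    Done(R),
}

/// A single step in the chain. (Simplified, synchronous stand-in.)
trait Handler<T, R> {
    fn handle(&mut self, input: T) -> ChainResult<T, R>;
}

/// An ordered list of handlers that are applied one after another.
struct Chain<T, R>(Vec<Box<dyn Handler<T, R>>>);

impl<T, R> Chain<T, R> {
    fn new(handlers: Vec<Box<dyn Handler<T, R>>>) -> Self {
        Chain(handlers)
    }

    /// Runs the handlers in order. The first `Done` ends the traversal early.
    fn traverse(&mut self, mut input: T) -> ChainResult<T, R> {
        for handler in &mut self.0 {
            match handler.handle(input) {
                ChainResult::Next(next) => input = next,
                done @ ChainResult::Done(_) => return done,
            }
        }
        ChainResult::Next(input)
    }
}

/// Example handler: adds a constant and continues.
struct Add(u32);

impl Handler<u32, String> for Add {
    fn handle(&mut self, input: u32) -> ChainResult<u32, String> {
        ChainResult::Next(input + self.0)
    }
}

/// Example handler: exits the chain once the value exceeds a limit.
struct StopAbove(u32);

impl Handler<u32, String> for StopAbove {
    fn handle(&mut self, input: u32) -> ChainResult<u32, String> {
        if input > self.0 {
            ChainResult::Done(format!("stopped at {input}"))
        } else {
            ChainResult::Next(input)
        }
    }
}
```

A chain of `Add(5)`, `StopAbove(10)`, `Add(100)` passes an input of 1 through all three handlers, while an input of 7 stops at the second handler and never reaches the third.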
Matthias Endler
fc85695d21
Gracefully handle invalid URIs (#1414)
With the upgrade to `reqwest` 0.12, we can finally handle a long-standing
issue where URLs could not be parsed into URIs. Previously, we would panic, but
we can now handle that situation gracefully and return an error instead.

I've also renamed `Status::is_failure` to `Status::is_error`, because the
notion of failures no longer exists in the codebase and we use the term "error"
consistently instead. This is technically a breaking change in the API, but
it's fine since we have not released a stable version yet.

More information about the URI parsing issue:
- https://github.com/lycheeverse/lychee/issues/539
- https://github.com/seanmonstar/reqwest/issues/668
2024-04-25 15:29:36 +02:00
Thomas Zahner
e0b4c73987 Adapt to breaking changes & revert to pulldown-cmark 0.9 2024-04-25 08:48:11 +02:00
Thomas Zahner
f2b1c29bd4 Rename chain to handle 2024-04-22 14:07:17 +02:00
Thomas Zahner
ddcca65e72 Rename Chainable to Handler 2024-04-22 11:03:26 +02:00
Thomas Zahner
e3a236b257 Adjust documentation 2024-04-22 11:03:26 +02:00
Matthias
9b4fd8d0fc Extend docs around clone_unwrap 2024-04-22 11:03:26 +02:00
Matthias
9ed97213a1 Add documentation to chain module
Also make `Chainable` and `ChainResult` public to support external plugins/handlers.
2024-04-22 11:03:26 +02:00
Thomas Zahner
d5b9b84db6 Extract function and add SAFETY note 2024-04-22 11:03:26 +02:00
Thomas Zahner
a3184190b8 Small tweaks & extract method 2024-04-22 11:03:26 +02:00
Thomas Zahner
93afae54bb Create ClientRequestChain helper structure to combine multiple chains 2024-04-22 11:03:26 +02:00
Thomas Zahner
41e7f88da4 Add credentials to chain 2024-04-22 11:03:26 +02:00
Thomas Zahner
3ec3a8228c Make checker part of the request chain 2024-04-22 11:03:26 +02:00
Matthias Endler
0f012f3035 Use async_trait to fix issues with Chain type inference 2024-04-22 11:03:26 +02:00
Thomas Zahner
d92d3ba733 Extract checking functionality & make chain async 2024-04-22 11:03:26 +02:00
Thomas Zahner
17e2911700 Move Arc and Mutex inside of Chain struct 2024-04-22 11:03:26 +02:00
Thomas Zahner
377aceed60 Apply clippy suggestions 2024-04-22 11:03:26 +02:00
Thomas Zahner
c41cd5d6b9 Apply suggestions 2024-04-22 11:03:26 +02:00
Thomas Zahner
1fe6f2f1be Small improvements 2024-04-22 11:03:26 +02:00
Thomas Zahner
a90a35c329 Add doc comment 2024-04-22 11:03:26 +02:00
Thomas Zahner
402482ca01 Update RequestChain & add chain to client 2024-04-22 11:03:26 +02:00
Thomas Zahner
667105e13e Move chain into check_website function 2024-04-22 11:03:26 +02:00
Thomas Zahner
db1dc19c0a Implement Chainable directly for BasicAuthCredentials 2024-04-22 11:03:26 +02:00
Thomas Zahner
8da4592e2a Introduce early exit in chain 2024-04-22 11:03:26 +02:00
Thomas Zahner
7783cdfe46 Pass down request_chain instead of credentials & add test 2024-04-22 11:03:26 +02:00