Commit graph

156 commits

Author SHA1 Message Date
dependabot[bot]
df115098e3
Bump tabled from 0.10.0 to 0.11.1 (#1039)
* Bump tabled from 0.10.0 to 0.11.1

Bumps [tabled](https://github.com/zhiburt/tabled) from 0.10.0 to 0.11.1.
- [Release notes](https://github.com/zhiburt/tabled/releases)
- [Changelog](https://github.com/zhiburt/tabled/blob/master/CHANGELOG.md)
- [Commits](https://github.com/zhiburt/tabled/commits)

---
updated-dependencies:
- dependency-name: tabled
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* update tabled imports

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
Co-authored-by: Matthias Endler <matthias@endler.dev>
2023-04-13 15:17:12 +02:00
Matthias Endler
0e97f57040
Use standard error for error output (#990)
Fixes https://github.com/lycheeverse/lychee/issues/984

From https://doc.rust-lang.org/book/ch12-06-writing-to-stderr-instead-of-stdout.html:

> Command line programs are expected to send error messages to the standard error stream so we can still see error messages on the screen even if we redirect the standard output stream to a file. Our program is not currently well-behaved: we’re about to see that it saves the error message output to a file instead!
2023-04-11 23:43:33 +02:00
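A minimal sketch of the convention quoted in #990, assuming a hypothetical `report` helper (not lychee's actual code): successful results go to one writer and errors to another, so redirecting standard output to a file does not swallow the error messages.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes successful results to `out` and errors to `err`.
/// In a real binary, `out` would be `io::stdout()` and `err` would be `io::stderr()`.
fn report<O: Write, E: Write>(
    out: &mut O,
    err: &mut E,
    results: &[Result<&str, &str>],
) -> io::Result<()> {
    for result in results {
        match result {
            // Regular output: survives `> file` redirection as data.
            Ok(link) => writeln!(out, "OK    {link}")?,
            // Error output: stays visible on the terminal when stdout is redirected.
            Err(msg) => writeln!(err, "ERROR {msg}")?,
        }
    }
    Ok(())
}
```

Passing the writers in as parameters keeps the sketch testable; a command-line program could call it as `report(&mut io::stdout(), &mut io::stderr(), &results)`.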
Matthias
8f6199b5b6 Don't panic on invalid response URIs 2023-04-11 00:26:43 +02:00
Matthias
649f307028 Avoid unwrap when deserializing statuscode 2023-04-11 00:23:23 +02:00
Thomas
994b2852cd
Wayback integration (#1003)
Adds support for suggesting archived URLs for broken links.
Uses Wayback Machine as the archive provider.
2023-03-28 00:45:06 +02:00
Benny Joe Villiger
250f7a8f0a
Status codes in maps (#1014) 2023-03-27 12:29:12 +02:00
Matthias
cd45f9db07 cleanup empty file 2023-03-18 14:47:21 +01:00
Matthias Endler
30e2a2b62b
Fix --max-redirects (#987)
Having more than the max number of redirects
caused lychee to abort the requests, but did not
lead to an error.

Related: https://github.com/lycheeverse/lychee-action/issues/164
2023-03-10 15:15:37 +01:00
Matthias
9eb3149a69 Custom config handling to spot errors when passing invalid config and ignoring errors loading missing default conf 2023-03-03 12:13:09 +01:00
Matthias
6c133493e9 Revert "Don't ignore file-not-found errors when loading config"
This reverts commit 9ade4502a27cb3776c5fb39cdad7666ab854a373.
2023-03-03 12:13:09 +01:00
Matthias
387766322d Don't ignore file-not-found errors when loading config
This is no longer necessary ever since 712bdfa8cb
2023-03-03 12:13:09 +01:00
Matthias
17937537f8 Ignored URLs don't lead to failing exit code 2023-03-03 12:13:09 +01:00
Matthias
7e0b9e2c68 Update verbosity docs
Thanks to @MichaIng for mentioning the issue and providing a fix.
2023-02-25 15:44:43 +01:00
Matthias Endler
7874195bbb
Customize verbosity (#956) 2023-02-24 23:53:09 +01:00
dependabot[bot]
d8e4940dbe
Bump toml from 0.5.11 to 0.7.0 (#933)
* Bump toml from 0.5.11 to 0.7.0

Bumps [toml](https://github.com/toml-rs/toml) from 0.5.11 to 0.7.0.
- [Release notes](https://github.com/toml-rs/toml/releases)
- [Commits](https://github.com/toml-rs/toml/compare/toml-v0.5.11...toml-v0.7.0)

---
updated-dependencies:
- dependency-name: toml
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Introduce new let...else syntax

* Update config file loading for latest toml crate version

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
Co-authored-by: Matthias Endler <matthias@endler.dev>
2023-01-30 15:12:34 +01:00
Lucius Hu
e2406089ad
chore!: improve client and remap modules (#913)
`lychee_lib::client`:

- Improved documentation.
- Added a log message in `ClientBuilder::client()` when the provided user-agent
  overrides the one defined in the provided custom header.
- Removed unnecessary error handling in `Client::check()` when setting HTTPS
  scheme because all failure cases should occur when checking this URL the first
  time already.
- Removed unnecessary error handling in `Client::remap()` since
  `lychee_lib::remap::Remaps::remap()` doesn't return a `Result` anymore.
- Fixed potential integer overflow in `Client::check_website()` when the wait
  time between retries doubles, by using `std::time::Duration::saturating_mul`
  instead.
- Renamed `invalid()` to `validate_url()`.

`lychee_lib::remap`:

- Improved documentation, in particular, clarified (in the comment) that it's
  URLs not URIs being remapped.
- Changed `Remaps::remap()` so it takes `&mut Url` instead of `Uri` as its
  argument, and doesn't return a `Result` as a result.
    - Using `Url` instead of `Uri` because it aligns with the concept of
      remapping locations rather than identifiers.
    - Mutating the URL directly instead of returning a new one because it's more
      straightforward.
    - There is no error handling because we don't convert from URL to URI
      anymore. Furthermore, this always succeeded in the first place so we never
      needed error handling.
- Added implementation of `IntoIterator` for `&'a Remaps` and convenience method
  of `Remaps::iter`. (Their mutable or moving counterparts are deliberately
  avoided because we don't want library users to modify or consume the
  remapping rules after their instantiation.)

`lychee_lib::error`:

- Renamed `ErrorKind::InvalidUriRemap` to `InvalidUrlRemap` and improved
  its error message.

Changes to other modules are minor and only serve to accompany the aforementioned
changes.
2023-01-16 19:14:09 +01:00
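The overflow fix mentioned in #913 can be illustrated with a small sketch. The `next_wait` helper below is hypothetical and is not taken from `Client::check_website()`; it only shows how `std::time::Duration::saturating_mul` caps the doubled wait time instead of overflowing.

```rust
use std::time::Duration;

/// Hypothetical backoff helper: doubles the wait time before the next retry.
/// `saturating_mul` clamps the result to `Duration::MAX` instead of panicking
/// on overflow, which a plain `current * 2` would do.
fn next_wait(current: Duration) -> Duration {
    current.saturating_mul(2)
}
```

With this helper, `next_wait(Duration::from_secs(1))` yields two seconds, while an already huge duration simply stays at the maximum.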
Matthias Endler
15d8024c7c
Change progress bar style (#718)
* Bump indicatif from 0.16.2 to 0.17.0

Bumps [indicatif](https://github.com/console-rs/indicatif) from 0.16.2 to 0.17.0.
- [Release notes](https://github.com/console-rs/indicatif/releases)
- [Commits](https://github.com/console-rs/indicatif/compare/0.16.2...0.17.0)

---
updated-dependencies:
- dependency-name: indicatif
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update progress bar setup

* Change progress bar style

* Use pink for spinner
* Show ETA instead of elapsed
* dim progress bar and adjust size to terminal width

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-22 15:29:45 +01:00
Matthias Endler
da46734c54
Extend response stats in verbose mode (#882) 2022-12-20 10:43:01 +01:00
Matthias Endler
6df1c378ec
Fix Rust 1.66 clippy lints (#879) 2022-12-19 14:28:10 +01:00
Matthias
96dec6984a
Refactor check function (#860) 2022-12-12 01:05:47 +01:00
Matthias
e476965bee
Fix verbosity serialization (#853)
Forgot the serde defaults, which led to problems on some terminals
2022-11-29 12:59:32 +01:00
Matthias
93a1481305
Less verbose cache age formatting (#849)
Previously the cache age was formatted with nanosecond resolution,
which is too fine-grained even for Rustaceans.
Now the format is limited to days, hours, minutes, and seconds.
With that, the cache age becomes more easily parseable by humans.
2022-11-29 00:39:49 +01:00
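As a rough sketch of the idea in #849, the hypothetical `format_age` function below drops sub-second precision and prints only days, hours, minutes, and seconds. The exact output format (`1d 2h 3m 4s`) is an assumption made for illustration, not necessarily what lychee prints.

```rust
use std::time::Duration;

/// Hypothetical formatter: renders a cache age using only days, hours,
/// minutes, and seconds. Anything below one second is discarded.
fn format_age(age: Duration) -> String {
    let total = age.as_secs(); // truncates the nanosecond part
    let (days, rem) = (total / 86_400, total % 86_400);
    let (hours, rem) = (rem / 3_600, rem % 3_600);
    let (minutes, seconds) = (rem / 60, rem % 60);

    let mut parts = Vec::new();
    if days > 0 {
        parts.push(format!("{days}d"));
    }
    if hours > 0 {
        parts.push(format!("{hours}h"));
    }
    if minutes > 0 {
        parts.push(format!("{minutes}m"));
    }
    // Always print something, even for an age of zero.
    if seconds > 0 || parts.is_empty() {
        parts.push(format!("{seconds}s"));
    }
    parts.join(" ")
}
```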
Matthias
982d978e47
Add different verbosity levels (#824)
More granular verbosity levels have been asked
for repeatedly.
To enable that we're moving to [env_logger] and [clap-verbosity-flag]
to provide more flexible verbosity settings.

Also tackles #661, #709
Lays the groundwork for tackling #268

https://github.com/rust-cli/env_logger
https://github.com/clap-rs/clap-verbosity-flag
2022-11-28 23:25:33 +01:00
Matthias
b479a5810e
Allow overriding accepted status codes for cached URIs (#843)
Fixes #840
2022-11-28 12:23:07 +01:00
dependabot[bot]
2ce1a9ae06
Bump clap from 3.2.23 to 4.0.22 (#813)
* Bump clap from 3.2.23 to 4.0.22

Bumps [clap](https://github.com/clap-rs/clap) from 3.2.23 to 4.0.22.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v3.2.23...v4.0.22)

* The `headers` option got renamed to `header` to align with the rest
   of the options, which are singular.
* The short option for `header` (`-h`) was removed to avoid a conflict with
  help (`lychee -h`).
* Update and simplify readme check

Co-authored-by: Matthias <matthias-endler@gmx.net>
2022-11-13 21:10:32 +01:00
Matthias
35ccfb87c3
Add support for dumping links to file (#810) 2022-11-08 00:33:16 +01:00
Matthias
264af23822 Improve wording 2022-11-05 17:25:44 +01:00
Andy Grunwald
a67b513238
Extend description of "--exclude" to also exclude email addresses, not only URLs (#801) 2022-10-23 12:17:20 +02:00
Matthias
cbd936960a
Move from structopt to clap (#732)
Structopt was subsumed by clap. See
https://github.com/clap-rs/clap/blob/master/CHANGELOG.md#migrating
2022-08-12 22:53:13 +02:00
Matthias
69f387c1bd
Markdown-status (#729)
* Fix typos

* Add status code description to markdown output
2022-08-11 22:08:05 +02:00
tooomm
092b8b0bf1
reorder md output (#708) 2022-08-04 00:48:45 +02:00
dependabot[bot]
960e32c55f
Bump tabled from 0.7.0 to 0.8.0 (#701)
* Bump tabled from 0.7.0 to 0.8.0

Bumps [tabled](https://github.com/zhiburt/tabled) from 0.7.0 to 0.8.0.
- [Release notes](https://github.com/zhiburt/tabled/releases)
- [Changelog](https://github.com/zhiburt/tabled/blob/master/CHANGELOG.md)
- [Commits](https://github.com/zhiburt/tabled/compare/v0.7.0...v0.8.0)

---
updated-dependencies:
- dependency-name: tabled
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update tabled formatting and tests

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
2022-08-03 23:22:08 +02:00
dependabot[bot]
7c1b2f7527
Bump indicatif from 0.16.2 to 0.17.0 (#711)
* Bump indicatif from 0.16.2 to 0.17.0

Bumps [indicatif](https://github.com/console-rs/indicatif) from 0.16.2 to 0.17.0.
- [Release notes](https://github.com/console-rs/indicatif/releases)
- [Commits](https://github.com/console-rs/indicatif/compare/0.16.2...0.17.0)

---
updated-dependencies:
- dependency-name: indicatif
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update progress bar setup

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
2022-08-03 14:20:25 +02:00
Matthias
6fae93f2da
Skip caching unsupported and excluded URLs (#692)
As discussed in https://github.com/lycheeverse/lychee/issues/647#issuecomment-1170773449, it does not make much sense to cache unsupported
and excluded URLs.
Unsupported URLs might be supported in the future and caching them
would mean they won't get checked then. Excluded URLs were
excluded for a reason and should not appear in the cache.
Furthermore, they might not be excluded
in a subsequent run, leading to a false positive.
2022-07-17 18:40:45 +02:00
Walter Beller-Morales
75a3da0b7e
Add status code in Markdown output (#677) 2022-07-05 14:43:15 +02:00
Matthias
78185d3b63 Add documentation 2022-06-21 10:03:31 +02:00
Matthias
84de43c554
Refactor request types (#637) 2022-06-03 20:13:07 +02:00
Matthias
a557cba0b4
Add support for parsing list of status codes from config file (#636) 2022-06-02 18:53:04 +02:00
Matthias
9b4dfadffd
Fix parsing errors with config options (#632) 2022-05-31 19:43:46 +02:00
vpereira01
d48a3279a8
Improve configuration example (#631)
* Add missing parameters
* Remove deprecated `--exclude-file` parameter
* Improve TOML comments
* Add config smoketest
2022-05-31 19:05:27 +02:00
Matthias
b40aacd459
Prepare for release v0.10.0 (#629) 2022-05-30 23:02:18 +02:00
Matthias
22fecfc056
Add support for URI remapping (#620)
Remaps allow mapping from a URI pattern to a different URI.

The syntax is

```
lychee --remap 'https://example.com http://127.0.0.1'
```

Some use-cases are
- Testing URIs prior to production deployment
- Testing URIs behind a proxy

Be careful when using this feature because checking every link against a
large set of regular expressions has a performance impact. Also there are no
constraints on the URI mapping, so the rules might contradict each
other.
Remap rules get applied in order of definition to every input URI.
2022-05-29 21:41:22 +02:00
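The "applied in order of definition" behaviour from #620 can be sketched as follows. This is a deliberately simplified illustration: the commit message says the real rules are regular expressions, whereas the hypothetical `apply_remaps` below uses plain prefix matching to stay dependency-free, and it assumes every matching rule is applied in sequence.

```rust
/// Hypothetical, simplified remapper: each rule is a `(prefix, replacement)`
/// pair. Rules are tried in the order they were defined, and a later rule sees
/// the output of an earlier one, which is why rules can contradict each other.
fn apply_remaps(rules: &[(&str, &str)], uri: &str) -> String {
    let mut current = uri.to_string();
    for &(prefix, replacement) in rules {
        if let Some(rest) = current.strip_prefix(prefix) {
            current = format!("{replacement}{rest}");
        }
    }
    current
}
```

Because every URI is tested against every rule, the cost grows with the number of rules, which matches the performance warning in the commit message.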
Matthias
363b95fe5f
Add support for excluding paths from link checking (#623)
This change deprecates `--exclude-file` as it was ambiguous.
Instead, `--exclude-path` was introduced to support excluding paths
to files and directories that should not be checked.
Furthermore, `.lycheeignore` is now the only way
to exclude URL patterns.
2022-05-29 17:27:09 +02:00
Matthias
b40c785b64
Also dump excluded links (#615)
This is a minimally invasive version, which allows grepping for `[excluded]`.
The reason for exclusion would require more work and it's debatable if
it adds any value, because it might make grepping harder and the source
of exclusion is easily deducible from the command-line parameters
or the `.lycheeignore` file.

Fixes #587.
2022-05-13 18:53:16 +02:00
Matthias
b0136683a9
Add support for comments in .lycheeignore (#616)
Lines starting with the comment character (`#`) inside the
.lycheeignore file will be ignored.
Whitespace at the beginning of each line will be ignored, so
even an indented comment character will work.
2022-05-13 18:51:58 +02:00
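The comment handling described in #616 could look roughly like the sketch below. The function name `parse_ignore_lines` is hypothetical, and skipping blank lines is an extra assumption that the commit message does not state.

```rust
/// Hypothetical parser for the contents of a `.lycheeignore` file.
/// Leading whitespace is stripped first, so an indented `#` still starts a
/// comment. Blank lines are skipped as well (an assumption for this sketch).
fn parse_ignore_lines(contents: &str) -> Vec<&str> {
    contents
        .lines()
        .map(str::trim_start)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .collect()
}
```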
Matthias
8c0a32d81d
Refactor response formatting (#599)
* Add support for raw formatter (no color)
* Introduce ResponseFormatter trait
* Pass the same params to every cli command
* Update dependencies
* Remove pretty_assertions dependency (latest version doesn't build)
2022-04-25 19:19:36 +02:00
dependabot[bot]
0d6f84217f
Bump tabled from 0.5.0 to 0.6.0 (#583)
* Bump tabled from 0.5.0 to 0.6.0

Bumps [tabled](https://github.com/zhiburt/tabled) from 0.5.0 to 0.6.0.
- [Release notes](https://github.com/zhiburt/tabled/releases)
- [Changelog](https://github.com/zhiburt/tabled/blob/master/CHANGELOG.md)
- [Commits](https://github.com/zhiburt/tabled/compare/v0.5.0...v0.6.0)

---
updated-dependencies:
- dependency-name: tabled
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* #[field] #[header] in Tabled macro was renamed to #[tabled].

* Fix tabled rename field

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
2022-04-06 01:02:12 +02:00
MichaIng
b338ba2abc
Enhance verbosity check (#578)
as suggested here: https://github.com/lycheeverse/lychee/pull/570#discussion_r835931903

Signed-off-by: MichaIng <micha@dietpi.com>
2022-04-04 10:31:30 +02:00
Matthias
36d3195c68
Cache verbosity issue (fixes #562) 2022-03-27 14:48:09 +02:00
Matthias
743d386252
Allow input URLs without scheme (fixes #567)
This requires `Input::new` to return a `Result`, because the URL
parsing could fail when prepending `http://`.

We use http instead of https, because curl does as well:
70ac27604a/lib/urlapi.c (L1104-L1124)
Missing files will be interpreted as URLs from the command line
and these can be invalid, but that's not seen as an error anymore.
2022-03-27 01:27:27 +01:00
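A string-level sketch of the scheme fallback for inputs like `example.com`. The helper `with_default_scheme` is hypothetical; the real `Input::new` additionally parses the result as a URL (hence its `Result` return type), which this sketch leaves out.

```rust
/// Hypothetical helper: prepends `http://` when the input has no scheme
/// separator. `http` rather than `https` mirrors the curl behaviour cited in
/// the commit message. No URL validation happens here.
fn with_default_scheme(input: &str) -> String {
    if input.contains("://") {
        input.to_string()
    } else {
        format!("http://{input}")
    }
}
```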
Matthias
d616177a99
Implement excluding code blocks (#523)
This is done in the extractor to avoid unnecessary
allocations.
2022-03-26 10:42:56 +01:00
Matthias
e1d112dbab
Remove missing_panic_doc (#561) 2022-03-22 21:02:56 +01:00
Matthias
8097bfa408
Print Github token error once at the end (#537)
Print original reqwest error for every Github link.
It contains more information about the underlying error.

Only print a message about the Github token at the
end if it's not set and there were Github errors.
2022-03-03 10:04:55 +01:00
Matthias
4c51fce22f
Fix broken pipe error on failing writes to stdout (#535)
Make sure that broken pipes (e.g. when a reader of a
pipe prematurely exits during execution) get handled gracefully.
This change also moves some error messages to stderr by using
eprintln.

More info: https://github.com/jez/as-tree/issues/15
2022-03-02 23:39:54 +01:00
Matthias
05bd3817ee
Make retry wait time configurable (#525) 2022-02-24 12:24:57 +01:00
Matthias
41b291037a
Response output overhaul (#524)
Clean up the response output.
Superfluous information was removed and the formatting was changed to make
the output more readable to humans.
2022-02-23 17:28:14 +01:00
dependabot[bot]
c4e004bdf8
Bump tabled from 0.4.2 to 0.5.0 (#505)
* Bump tabled from 0.4.2 to 0.5.0

Bumps [tabled](https://github.com/zhiburt/tabled) from 0.4.2 to 0.5.0.
- [Release notes](https://github.com/zhiburt/tabled/releases)
- [Changelog](https://github.com/zhiburt/tabled/blob/master/CHANGELOG.md)
- [Commits](https://github.com/zhiburt/tabled/compare/v0.4.2...v0.5.0)

---
updated-dependencies:
- dependency-name: tabled
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update `tabled` format; add test

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
2022-02-19 02:23:38 +01:00
Matthias
ba276cd51b
Error cleanup (#510)
* Add more fine-grained error types; remove generic IO error
* Update error message for missing file
* Remove missing `Error` suffix
* Rename ErrorKind::Github to ErrorKind::GithubRequest for consistency with NetworkRequest
2022-02-19 01:44:00 +01:00
Matthias
812663d832
Prevent flaky tests (#514)
Move from example.org to example.com, which seems to be more permissive for testing
2022-02-18 10:29:49 +01:00
Lucius Hu
6d56c6b55c
Replace plain String with SecretString for GitHub token (#509)
This commit changed the type of `lychee-lib::ClientBuilder::github_token` from
`String` to `secrecy::SecretString` to fortify the secret management within our
program.

Note that this won't affect TOML configuration of `lychee-bin` because
`serde::Deserialize` is still implemented for `SecretString`.
2022-02-13 13:53:46 +01:00
Matthias
47df7780fe
Use captured identifiers in format strings (#507)
Makes for arguably cleaner-looking code.
The downside is that the MSRV is 1.58
https://blog.rust-lang.org/2022/01/13/Rust-1.58.0.html

Given that nobody uses lychee as a library yet
and we have precompiled binaries, it's an acceptable
tradeoff.
My little research revealed that this is a much-liked
feature: https://twitter.com/matthiasendler/status/1483895557621960715
2022-02-12 10:51:52 +01:00
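For readers unfamiliar with the Rust 1.58 feature adopted in #507, here is a small illustration. The `describe` function and its message text are made up for this example.

```rust
/// Illustrative only: formats a short status line for a checked URI.
fn describe(uri: &str, status: u16) -> String {
    // Before Rust 1.58 this had to be written with positional arguments:
    //     format!("{} returned {}", uri, status)
    // Captured identifiers let the variable names appear inside the braces.
    format!("{uri} returned {status}")
}
```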
Matthias
9d738fb3f5
Fix default config (#491)
The default configuration was broken since the
introduction of caching and specifically `max_cache_age`.
This fixes deserialization and config merging for
the case where this key is missing from the config.
2022-02-07 23:17:50 +01:00
Markus Unterwaditzer
d8305f7f53
fix constant updating of progressbar (#488)
* fix constant updating of progressbar

In other issues I've already lamented how slow lychee is when used
without `-n`. This fixes an issue where without `-n`, lychee would take
1 minute instead of 4 seconds to check sentry-docs.

* fix values
2022-02-07 23:15:26 +01:00
Markus Unterwaditzer
68d09f7e5b
Add html5gum as alternative link extractor (#480)
html5gum is an HTML parser that offers lower-level control over which tokens actually get created and are tracked. As such, the extractor doesn't allocate any tokens it doesn't care about. On some benchmarks it provides a substantial performance boost. The old parser, html5ever, is still available by setting the `LYCHEE_USE_HTML5EVER=1` env var.
2022-02-07 22:54:47 +01:00
Lucius Hu
6bf8c1fe39
lychee-bin: replace lazy_static by const_format (#495)
This commit replaced the use of `lazy_static` by
`const_format` in `lychee-bin`.

Currently `lazy_static` is used to generate static
String at runtime. With `const_format` we can instead
make constant String at compile time.

Co-authored-by: Lucius Hu <lebensterben@users.noreply.github.com>
2022-02-07 22:45:17 +01:00
Matthias
4630216c30 Add description for max-cache-age flag 2022-01-14 16:55:56 +01:00
Matthias
ac490f9c53
Add caching functionality (v2) (#443)
A while ago, caching was removed due to some issues (see #349).
This is a new implementation with the following improvements:

* Architecture: The new implementation is decoupled from the collector, which was a major issue in the last version. Now the collector has a single responsibility: collecting links. This also avoids race conditions when running multiple collect_links instances, which probably was an issue before.
* Performance: Uses DashMap under the hood, which was noticeably faster than Mutex<HashMap> in my tests.
* Simplicity: The cache format is a CSV file with two columns: URI and status. I decided to create a new struct called CacheStatus for serialization, because trying to serialize the error kinds in Status turned out to be a bit of a nightmare and at this point I don't think it's worth the pain (and probably isn't idiomatic either).

This is an optional feature. Caching only gets used if the `--cache` flag is set.
2022-01-14 15:25:51 +01:00
Matthias
36450621fa
Update dependencies (#454) 2022-01-10 22:35:37 +01:00
dependabot[bot]
54b5be81c2
Bump tabled from 0.3.0 to 0.4.2 (#447)
* Bump tabled from 0.3.0 to 0.4.2

Bumps [tabled](https://github.com/zhiburt/tabled) from 0.3.0 to 0.4.2.
- [Release notes](https://github.com/zhiburt/tabled/releases)
- [Changelog](https://github.com/zhiburt/tabled/blob/master/CHANGELOG.md)
- [Commits](https://github.com/zhiburt/tabled/compare/v0.3.0...v0.4.2)

---
updated-dependencies:
- dependency-name: tabled
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
2022-01-07 23:10:39 +01:00
Matthias
21f3160b71
Make retries configurable; align constants (#446)
Using the same default values for the library and the
binary now but tweaked the values a bit for slightly faster performance.
2022-01-07 01:03:10 +01:00
Matthias
5eb062cbec Always hide GH token in opts 2022-01-06 09:54:03 +01:00
Matthias
01393b34a2
Upgrade to Rust 2021 (#427) 2021-12-17 01:32:13 +01:00
Matthias
166c86c30e
Use tokenizer for extraction; add benchmark (#424)
This avoids creating a DOM tree for link extraction and instead uses a `TokenSink` for on-the-fly extraction. In hyperfine benchmarks it was about 10-25% faster than master.

Old: 4.557 s ± 0.404 s
New: 3.832 s ± 0.131 s

The performance fluctuates a little less as well.

Some missing element/attribute pairs were also added, which contain links according to the HTML spec. These occur very rarely, but it's good to parse them for completeness' sake.

Furthermore tried to clean up a lot of papercuts around our types. We now differentiate between a `RawUri` (stringy-types) and a Uri, which is a properly parsed `URI` type.
The extractor now only deals with extracting `RawUri`s while the collector creates the request objects.
2021-12-16 18:45:52 +01:00
Matthias
c41ba64a69
Max concurrency moved to check (#419)
Concurrency is defined by the channel size consuming
from the request stream in `check`
2021-12-07 11:52:40 +01:00
Matthias
3d5135668b
Improve concurrency with streams (#330)
* Move from vec to streams

Previously we collected all inputs in one vector
before checking the links, which is not ideal.
Especially when reading many inputs (e.g. by using a glob pattern),
this could cause issues like running out of file handles.

By moving to streams we avoid that scenario. This is also the first
step towards improving performance for many inputs.

To stay as close as possible to the pre-stream behaviour, we want to stop processing
as soon as an Err value appears in the stream. This is easiest when the
stream is consumed in the main thread.
Previously, the stream was consumed in a tokio task and the main thread
waited for responses.
Now, a tokio task waits for responses (and displays them/registers
response stats) and the main thread sends links to the ClientPool.
To ensure that the main thread waits for all responses to have arrived
before finishing the ProgressBar and printing the stats, it waits for
the show_results_task to finish.


* Return collected links as Stream
* Initialize ProgressBar without length because we can't know the amount of links without blocking
* Handle stream results in main thread, not in task
* Add basic directory support using jwalk
* Add test for HTTP protocol file type (http://)
* Remove deadpool (once again): Replaced with `futures::StreamExt::for_each_concurrent`.
* Refactor main; fix tests
* Move commands into separate submodule
* Simplify input handling
* Simplify collector
* Remove unnecessary unwrap
* Simplify main
* cleanup check
* clean up dump command
* Handle requests in parallel 
* Fix formatting and lints

Co-authored-by: Timo Freiberg <self@timofreiberg.com>
2021-12-01 18:25:11 +01:00
Matthias
591cbdbebb
Add support for .lycheeignore file #308 (#402)
This is similar to files like .gitignore and .dockerignore
and gets merged into exclude_files
2021-11-23 01:39:53 +01:00
Matthias
1eb4453957
Only print source in verbose mode (#400)
This way the normal link output can be fed into
another tool without data mangling.
2021-11-21 17:22:04 +01:00
Matthias
4008c2ce38 Add missing newline 2021-11-18 00:46:20 +01:00
Matthias
b97fda34d0
Add support for different output formats (compact, detailed, markdown) (#375) 2021-11-18 00:44:48 +01:00
Derek Croote
e8bab82d76
Fix clippy lint (#383) 2021-11-05 10:22:51 +01:00
Matthias
56726f41fc
Add back connection pool (#355) 2021-10-08 13:08:44 +02:00
MichaIng
961f12e58e
Remove cache from collector and remove custom reqwest client pool
* Reqwest comes with its own request pool, so there's no need to add
another layer of indirection. This also gets rid of a lot of allocs.
* Remove cache from collector
* Improve error handling and documentation
* Add back test for request caching in single file

Signed-off-by: MichaIng <micha@dietpi.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
2021-10-07 18:07:18 +02:00
MichaIng
b648b5e914
Imply "localhost" when loopback IPs are excluded (#351)
as "localhost" is usually mapped via "hosts" file to a loopback IP address.

Resolves: https://github.com/lycheeverse/lychee/issues/319

Signed-off-by: MichaIng <micha@dietpi.com>
2021-10-06 11:33:23 +02:00
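The rule from #351 can be sketched with the standard library alone. The function `is_excluded_loopback` and its parameters are hypothetical names for this illustration and do not reflect lychee's internal filter API.

```rust
use std::net::IpAddr;

/// Hypothetical check: when loopback addresses are excluded, treat the literal
/// host name "localhost" as excluded too, since it usually resolves to a
/// loopback IP via the hosts file.
fn is_excluded_loopback(host: &str, exclude_loopback: bool) -> bool {
    if !exclude_loopback {
        return false;
    }
    if host.eq_ignore_ascii_case("localhost") {
        return true;
    }
    // Anything that is not an IP literal (e.g. a domain name) is not loopback.
    host.parse::<IpAddr>()
        .map(|ip| ip.is_loopback())
        .unwrap_or(false)
}
```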
Matthias
251332efe2
Cache absolute_path to decrease allocations (#346)
* Cache `absolute_path` to decrease allocations

While profiling local file handling, I noticed that resolving paths was taking a
significant amount of time. It also caused quite a few allocations.
By caching the path and using a constant value for the current
directory, we can reduce the number of allocs by quite a lot.
For example, when testing on the sentry documentation, we do 50.4%
fewer allocations in total now. That's just a single test case of course,
but it's probably also helping in many other cases.

* Defer to_string for attr.value to reduce allocs
* Use Tendrils instead of Strings for parsing (another ~1.5% less allocs)
* Move option parsing code into separate module
* Handle base dir more correctly
* Temporarily disable dry run
2021-10-05 01:37:43 +02:00
Matthias
f2d7abbc29
Fix broken pipe when dumping links (#339)
When piping the output of lychee's `--dump` output to another program,
we can run into issues with broken pipes as described in
https://github.com/rust-lang/rust/issues/46016
and https://gabebw.com/blog/2019/10/13/learning-rust-by-candlelight
To avoid this, we use the underlying writeln macro and check the
returned `ErrorKind`.
2021-09-20 12:12:35 +02:00
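The approach described in #339 can be sketched as below, under the assumption of a hypothetical `dump_links` function: `println!` panics when stdout is closed, whereas `writeln!` returns an `io::Result` whose `ErrorKind` can be inspected.

```rust
use std::io::{self, ErrorKind, Write};

/// Hypothetical dump routine: writes one link per line and treats a closed
/// pipe as a normal end of output rather than an error.
fn dump_links<W: Write>(writer: &mut W, links: &[&str]) -> io::Result<()> {
    for link in links {
        if let Err(e) = writeln!(writer, "{link}") {
            if e.kind() == ErrorKind::BrokenPipe {
                // The reader went away (for example `... | head -n 1`); stop quietly.
                return Ok(());
            }
            // Any other I/O error is still reported to the caller.
            return Err(e);
        }
    }
    Ok(())
}
```

In a binary this would typically be called with a locked `io::stdout()` handle as the writer.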
Matthias
712bdfa8cb
Make inputs required (show help if not provided) (#329) 2021-09-16 16:40:38 +02:00
Matthias
21ea0fd033
Add support for tokio-console (#318)
This allows troubleshooting and improving async Rust code.
It is an optional feature that is still
experimental (but can be quite helpful)
2021-09-12 18:10:23 +02:00
Matthias
a1acf7b0d0 Reintegrate master 2021-09-09 01:49:25 +02:00
Matthias
f3fe46a4d6 Merge branch 'master' of github.com:lycheeverse/lychee into local-files 2021-09-08 00:35:41 +02:00
Paweł Romanowski
8fd34a7367
Add no check (dump links only) flag (#99) 2021-09-06 16:10:48 +02:00
Matthias
87fd90f2fc cargo fmt 2021-09-06 15:20:18 +02:00
Matthias
dd3205a87c wip 2021-09-06 15:19:43 +02:00
Matthias
bfa3b1b6a1 Introduce Base type, which can be a path or URL 2021-09-06 15:15:40 +02:00
Matthias
f9bf52ef10 Add support for base_dir 2021-09-06 15:15:05 +02:00
Matthias Endler
701fbc9ada Add support for local files 2021-09-06 15:14:33 +02:00
Lucius Hu
80b8a856ac
Add new flag --require-https (#195) 2021-09-04 03:21:54 +02:00
Matthias
4b537763a5 Directly connect into Result 2021-09-03 16:29:57 +02:00
Matthias
59abd189cf Fix remaining clippy lints 2021-09-03 16:29:57 +02:00
Matthias
d959b54b56 run cargo fmt 2021-09-03 16:29:57 +02:00
dblock
dcee4a1058 Added support for --exclude-file. 2021-09-03 16:29:57 +02:00