![lychee](assets/banner.png)
![Rust](https://github.com/hello-rust/lychee/workflows/Rust/badge.svg)
[![docs.rs](https://docs.rs/lychee/badge.svg)](https://docs.rs/lychee)
A fast, async, resource-friendly link checker written in Rust. \
For GitHub links, it can optionally use a [`GITHUB_TOKEN`](#github-token)
to avoid getting blocked by GitHub's rate limiter.
Available as a CLI utility and as a GitHub Action: [lycheeverse/lychee-action](https://github.com/lycheeverse/lychee-action).
![Lychee demo](./assets/lychee.gif)
## Features
This comparison is made on a best-effort basis. Please create a PR to fix
outdated information.
| | lychee | [awesome_bot] | [muffet] | [broken-link-checker] | [linkinator] | [linkchecker] | [markdown-link-check] | [fink] |
| -------------------- | ------- | ------------- | -------- | --------------------- | ------------ | ------------- | --------------------- | ------ |
| Language | Rust | Ruby | Go | JS | TypeScript | Python | JS | PHP |
| Async/Parallel | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] |
| JSON output | ![yes] | ![no] | ![yes] | ![yes] | ![yes] | ![maybe]<sup>1</sup> | ![yes] | ![yes] |
| Static binary | ![yes] | ![no] | ![yes] | ![no] | ![no] | ![no] | ![no] | ![no] |
| Markdown files | ![yes] | ![yes] | ![no] | ![no] | ![no] | ![yes] | ![yes] | ![no] |
| HTML files | ![yes] | ![no] | ![no] | ![yes] | ![yes] | ![no] | ![yes] | ![no] |
| Text files | ![yes] | ![no] | ![no] | ![no] | ![no] | ![no] | ![no] | ![no] |
| Website support | ![yes] | ![no] | ![yes] | ![yes] | ![yes] | ![yes] | ![no] | ![yes] |
| Chunked encodings | ![yes] | ![maybe] | ![maybe] | ![maybe] | ![maybe] | ![no] | ![yes] | ![yes] |
| GZIP compression | ![yes] | ![maybe] | ![maybe] | ![yes] | ![maybe] | ![yes] | ![maybe] | ![no] |
| Basic Auth | ![yes] | ![no] | ![no] | ![yes] | ![no] | ![yes] | ![no] | ![no] |
| Custom user agent | ![yes] | ![no] | ![no] | ![yes] | ![no] | ![yes] | ![no] | ![no] |
| Relative URLs | ![yes] | ![yes] | ![no] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] |
| Skip relative URLs | ![yes] | ![no] | ![no] | ![maybe] | ![no] | ![no] | ![no] | ![no] |
| Include patterns | ![yes] | ![yes] | ![no] | ![yes] | ![no] | ![no] | ![no] | ![no] |
| Exclude patterns | ![yes] | ![no] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] |
| Handle redirects | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] |
| Ignore insecure SSL | ![yes] | ![yes] | ![yes] | ![no] | ![no] | ![yes] | ![no] | ![yes] |
| File globbing | ![yes] | ![yes] | ![no] | ![no] | ![yes] | ![no] | ![yes] | ![no] |
| Limit scheme | ![yes] | ![no] | ![no] | ![yes] | ![no] | ![yes] | ![no] | ![no] |
| [Custom headers] | ![yes] | ![no] | ![yes] | ![no] | ![no] | ![no] | ![yes] | ![yes] |
| Summary | ![yes] | ![yes] | ![yes] | ![maybe] | ![yes] | ![yes] | ![no] | ![yes] |
| `HEAD` requests | ![yes] | ![yes] | ![no] | ![yes] | ![yes] | ![yes] | ![no] | ![no] |
| Colored output | ![yes] | ![maybe] | ![yes] | ![maybe] | ![yes] | ![yes] | ![no] | ![yes] |
| [Filter status code] | ![yes] | ![yes] | ![no] | ![no] | ![no] | ![no] | ![yes] | ![no] |
| Custom timeout | ![yes] | ![yes] | ![yes] | ![no] | ![yes] | ![yes] | ![no] | ![yes] |
| E-mail links | ![yes] | ![no] | ![no] | ![no] | ![no] | ![yes] | ![no] | ![no] |
| Progress bar | ![yes] | ![yes] | ![no] | ![no] | ![no] | ![yes] | ![yes] | ![yes] |
| Retry and backoff | ![yes] | ![no] | ![no] | ![no] | ![yes] | ![no] | ![yes] | ![no] |
| Skip private domains | ![yes] | ![no] | ![no] | ![no] | ![no] | ![no] | ![no] | ![no] |
| [Use as library] | ![yes] | ![yes] | ![no] | ![yes] | ![yes] | ![no] | ![yes] | ![no] |
| Quiet mode | ![yes] | ![no] | ![no] | ![no] | ![yes] | ![yes] | ![yes] | ![yes] |
| Config file | ![yes] | ![no] | ![no] | ![no] | ![yes] | ![yes] | ![yes] | ![no] |
| Recursion | ![no] | ![no] | ![no] | ![yes] | ![yes] | ![yes] | ![yes] | ![no] |
| Amazing lychee logo | ![yes] | ![no] | ![no] | ![no] | ![no] | ![no] | ![no] | ![no] |
[awesome_bot]: https://github.com/dkhamsing/awesome_bot
[muffet]: https://github.com/raviqqe/muffet
[broken-link-checker]: https://github.com/stevenvachon/broken-link-checker
[linkinator]: https://github.com/JustinBeckwith/linkinator
[linkchecker]: https://github.com/linkchecker/linkchecker
[markdown-link-check]: https://github.com/tcort/markdown-link-check
[fink]: https://github.com/dantleech/fink
[yes]: ./assets/yes.svg
[no]: ./assets/no.svg
[maybe]: ./assets/maybe.svg
[custom headers]: https://github.com/rust-lang/crates.io/issues/788
[filter status code]: https://github.com/tcort/markdown-link-check/issues/94
[skip private domains]: https://github.com/appscodelabs/liche/blob/a5102b0bf90203b467a4f3b4597d22cd83d94f99/url_checker.go
[use as library]: https://github.com/raviqqe/liche/issues/13
<sup>1</sup> Other machine-readable formats like CSV are supported.
## Contributing to lychee
We'd be thankful for any contribution. \
We try to keep the issue-tracker up-to-date so you can quickly find a task to work on.
Try one of these links to get started:
- [good first issues](https://github.com/lycheeverse/lychee/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
- [help wanted](https://github.com/lycheeverse/lychee/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22)
## Using the Commandline Client
You can run lychee directly from the commandline.
### Installation
#### Using cargo
```sh
cargo install lychee
```
#### Using the official Docker image
```sh
docker pull lycheeverse/lychee
```
#### Using pre-built binaries
We provide binaries for Linux, macOS, and Windows for every release. \
You can download them from the [releases page](https://github.com/lycheeverse/lychee/releases).
## Commandline usage
Run it inside a repository with a `README.md`:
```sh
lychee
```
You can also specify various types of inputs:
```sh
# check links on a website:
lychee https://endler.dev/
# check links in a remote file:
lychee https://raw.githubusercontent.com/lycheeverse/lychee/master/README.md
# check links in local file(s):
lychee README.md
lychee test.html info.txt
# check links in local files (by shell glob):
lychee ~/projects/*/README.md
# check links in local files (lychee supports advanced globbing and ~ expansion):
lychee "~/projects/big_project/**/README.*"
# ignore case when globbing, show progress, and print the check result for each link:
lychee --glob-ignore-case --progress --verbose "~/projects/**/[r]eadme.*"
```
### GitHub token
Optionally, to avoid getting rate-limited while checking GitHub links, you can
set an environment variable with your GitHub token, e.g. `GITHUB_TOKEN=xxxx`,
or use the `--github-token` CLI option. The token can also be set in the config file.
You can generate a token on your
[GitHub account settings page](https://github.com/settings/tokens). A personal
token with no extra permissions is enough to check links to public repositories.
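As a rough sketch, the token can be supplied through the environment or on the command line. The helper name `github_token_status` is hypothetical (it is not part of lychee) and only illustrates checking the variable before a run; `xxxx` stands in for a real token.

```sh
# Hypothetical helper (not part of lychee): report whether GITHUB_TOKEN
# is available in the current shell.
github_token_status() {
    if [ -n "${GITHUB_TOKEN:-}" ]; then
        echo "set"
    else
        echo "unset"
    fi
}

# Typical usage (commented out so the snippet has no side effects):
#   export GITHUB_TOKEN=xxxx                 # via the environment
#   lychee README.md
#   lychee --github-token xxxx README.md     # or via the CLI option
```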
### Commandline Parameters
lychee offers an extensive set of commandline parameters to customize its behavior.
The full list follows:
```sh
USAGE:
lychee [FLAGS] [OPTIONS] [--] [inputs]...
FLAGS:
-E, --exclude-all-private Exclude all private IPs from checking. Equivalent to `--exclude-private --exclude-link-local --exclude-loopback`
--exclude-link-local Exclude link-local IP address range from checking
--exclude-loopback Exclude loopback IP address range from checking
--exclude-mail Exclude all mail addresses from checking
--exclude-private Exclude private IP address ranges from checking
--glob-ignore-case Ignore case when expanding filesystem path glob inputs
--help Prints help information
-i, --insecure Proceed for server connections considered insecure (invalid TLS)
-p, --progress Show progress
--skip-missing Skip missing input files (default is to error if they don't exist)
-V, --version Prints version information
-v, --verbose Verbose program output
OPTIONS:
-a, --accept <accept> Comma-separated list of accepted status codes for valid links
-b, --base-url <base-url> Base URL to check relative URLs
--basic-auth <basic-auth> Basic authentication support. E.g. `username:password`
-c, --config <config-file> Configuration file to use [default: ./lychee.toml]
--exclude <exclude>... Exclude URLs from checking (supports regex)
-f, --format <format> Output file format of status report (json, string) [default: string]
--github-token <github-token> GitHub API token to use when checking github.com links, to avoid rate
limiting [env: GITHUB_TOKEN=]
-h, --headers <headers>... Custom request headers
--include <include>... URLs to check (supports regex). Has preference over all excludes
--max-concurrency <max-concurrency> Maximum number of concurrent network requests [default: 128]
-m, --max-redirects <max-redirects> Maximum number of allowed redirects [default: 10]
-X, --method <method> Request method [default: get]
-o, --output <output> Output file of status report
-s, --scheme <scheme> Only test links with the given scheme (e.g. https)
-T, --threads <threads> Number of threads to utilize. Defaults to number of cores available to
the system
-t, --timeout <timeout> Website timeout from connect to response finished [default: 20]
-u, --user-agent <user-agent> User agent [default: lychee/0.6.0]
ARGS:
<inputs>... The inputs (where to get links to check from). These can be: files (e.g. `README.md`), file globs
(e.g. `"~/git/*/README.md"`), remote URLs (e.g. `https://example.com/README.md`) or standard
input (`-`). Prefix with `--` to separate inputs from options that allow multiple arguments
[default: README.md]
```
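The options above can be combined. The following is a minimal sketch, assuming a hypothetical wrapper function named `check_readme` and arbitrarily chosen values; it only uses flags that appear in the help text above.

```sh
# Hypothetical wrapper: check README.md and write a JSON report to the
# path given as the first argument (defaults to report.json).
check_readme() {
    report="${1:-report.json}"
    lychee \
        --exclude-all-private \
        --exclude-mail \
        --max-redirects 5 \
        --timeout 10 \
        --accept 200,429 \
        --format json \
        --output "$report" \
        -- README.md
}
```

The `--` before the input is optional here. It is needed when the last option before the inputs accepts multiple values, such as `--exclude` or `--headers`.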
### Exit codes
- `0` for success (all links checked successfully or excluded/skipped as configured)
- `1` for missing inputs and any unexpected runtime failures or config errors
- `2` for link check failures (if any non-excluded link failed the check)
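In scripts or CI jobs it can help to turn these codes into readable messages. The helper below is a sketch under that assumption; `describe_lychee_exit` is a hypothetical name and the messages merely paraphrase the list above.

```sh
# Hypothetical helper: translate a lychee exit code into a short message,
# following the list of exit codes above.
describe_lychee_exit() {
    case "$1" in
        0) echo "success" ;;
        1) echo "missing inputs, runtime failure, or config error" ;;
        2) echo "link check failures" ;;
        *) echo "unexpected exit code: $1" ;;
    esac
}
```

It could be called right after a run, for example `lychee README.md; describe_lychee_exit "$?"`.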
## Library usage
You can use lychee as a library for your own projects.
Here is a "hello world" example:
```rust
use std::error::Error;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let response = lychee::check("https://github.com/lycheeverse/lychee").await?;
    println!("{}", response);
    Ok(())
}
```
This is equivalent to the following snippet, in which we build our own client:
```rust
use lychee::{ClientBuilder, Status};
use std::error::Error;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let client = ClientBuilder::default().build()?;
    let response = client.check("https://github.com/lycheeverse/lychee").await?;
    assert!(matches!(response.status, Status::Ok(_)));
    Ok(())
}
```
The client builder is very customizable:
```rust,ignore
let client = lychee::ClientBuilder::default()
    .includes(includes)
    .excludes(excludes)
    .max_redirects(cfg.max_redirects)
    .user_agent(cfg.user_agent)
    .allow_insecure(cfg.insecure)
    .custom_headers(headers)
    .method(method)
    .timeout(timeout)
    .verbose(cfg.verbose)
    .github_token(cfg.github_token)
    .scheme(cfg.scheme)
    .accepted(accepted)
    .build()?;
```
All options that you set will be used for all link checks.
See the [builder documentation](https://docs.rs/lychee/latest/lychee/struct.ClientBuilder.html) for all options.
## GitHub Action usage
A GitHub Action that uses lychee is available in a separate repository, [lycheeverse/lychee-action](https://github.com/lycheeverse/lychee-action),
which includes usage instructions.
## Troubleshooting and workarounds
We collect a list of common workarounds for various websites in our [troubleshooting guide](./TROUBLESHOOTING.md).
## Users
- https://github.com/pawroman/links
- https://github.com/analysis-tools-dev/static-analysis
- https://github.com/analysis-tools-dev/dynamic-analysis
- https://github.com/mre/idiomatic-rust
- https://github.com/lycheeverse/lychee (yes, the lychee docs are checked with lychee 🤯)
If you are using lychee for your project, we'd be delighted to hear about it.
## License
lychee is licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or
http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
## Credits
The first prototype of lychee was built in [episode 10 of Hello
Rust](https://hello-rust.show/10/). Thanks to all GitHub and Patreon sponsors
for supporting the development since the beginning. Also, thanks to all the
great contributors who have since made this project more mature.