lychee/docs/TROUBLESHOOTING.md
Matthias d61105edbb
Fix parsing error of email addresses with query params (#809)
Email addresses with query parameters often get used in
contact forms on websites. They can also be found in
other documents like Markdown.

A common use-case is to add a subject line to the email
as a parameter e.g. `mailto:mail@example.com?subject="Hello"`.

Previously we handled such cases incorrectly by recognizing
them as files. The reason was that our email parsing was too strict
to allow for that use-case.
With `email_address` we switched to a more permissive parser.

Note that this does not affect the actual email address checking,
as this is still done by `check-if-email-exists`, which has stricter
checks.
2022-11-05 23:40:33 +01:00

# Troubleshooting Guide
This document describes common edge-cases and workarounds for checking links to various sites. \
Please add your own findings and send us a pull request if you can.
## GitHub Rate Limiting
GitHub has a fairly aggressive rate limiter. \
If you're seeing errors like:
```
GitHub token not specified. To check GitHub links reliably, use `--github-token` flag / `GITHUB_TOKEN` env var.
```
That means you're being rate-limited. As the message suggests, you can pass lychee \
a GitHub personal access token to get a higher rate limit.
For more details, see ["GitHub token" section in README.md](https://github.com/lycheeverse/lychee#github-token).
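
As a minimal sketch of the `GITHUB_TOKEN` route, the wrapper below sets the variable only for a single lychee run instead of exporting it for the whole session. The function name `lychee_with_token` and the token file path in the usage comment are hypothetical, chosen for illustration.

```bash
# Hypothetical wrapper: run lychee with a GitHub token taken from the
# first argument. All remaining arguments are passed to lychee unchanged.
lychee_with_token() {
  local token=$1
  shift
  # The assignment prefix sets GITHUB_TOKEN for this one command only.
  GITHUB_TOKEN="$token" lychee "$@"
}

# Usage (the token file location is an assumption, not a lychee convention):
# lychee_with_token "$(cat ~/.config/lychee-github-token)" README.md
```

Passing `--github-token` on the command line should have the same effect, but the token then ends up in your shell history.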
## Too Many Open Files
The number of concurrent network requests (`MAX_CONCURRENCY`) is set to 128 by default.
Every network request maps to an open socket, which is represented as a file on UNIX systems.
If you see error messages like "error trying to connect: tcp open error: Too
many open files (os error 24)" then you have run out of file handles.
You have two options:
1. Lower the concurrency by setting `--max-concurrency` to something more
conservative like 32. This works, but it also comes with a performance
penalty.
2. Increase the maximum number of file handles. See instructions
[here](https://wilsonmar.github.io/maximum-limits/) or
[here](https://synthomat.de/blog/2020/01/increasing-the-file-descriptor-limit-on-macos/).
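
Before choosing between the two options, it can help to see what the current limits are. The following is a rough sketch for a POSIX-style shell; the helper name `show_fd_limits` is made up for this example, and the value `4096` in the comment is an arbitrary illustration, not a recommendation from lychee.

```bash
# Print the soft and hard limits on open file descriptors for this shell.
# The soft limit is the one a process actually runs into.
show_fd_limits() {
  echo "soft limit: $(ulimit -Sn)"
  echo "hard limit: $(ulimit -Hn)"
}

show_fd_limits

# Option 2, for the current shell session only. An unprivileged user
# cannot raise the soft limit above the hard limit.
# ulimit -Sn 4096
```

A limit raised this way is lost when the shell exits. The linked instructions cover making the change permanent.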
## Unexpected Status Codes
Some websites don't respond with a `200` (OK) status code. \
Instead they might send `204` (No Content), `206` (Partial Content), or
[something else entirely](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/418).
If you run into such issues you can work around that by providing a custom \
list of accepted status codes, such as `--accept 200,204,206`.
## Website Expects Custom Headers
Some sites expect one or more custom headers to return a valid response. \
For example, crates.io expects an `Accept: text/html` header or else it \
will [return a 404](https://github.com/rust-lang/crates.io/issues/788).
To fix that you can pass additional headers like so: `--headers "accept=text/html"`. \
You can use that argument multiple times to add more headers. \
Or, you can accept all content/MIME types: `--headers "accept=*/*"`.
See more info about the Accept header
[over at MDN](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept).
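
To find out whether a missing header is really the cause, you can compare the status code a site returns for different `Accept` values using curl. This is only a sketch: the helper name `status_with_accept` is hypothetical, and the URL in the usage comment is just an example.

```bash
# Print the HTTP status code a URL returns when sent the given Accept header.
status_with_accept() {
  local url=$1 accept=$2
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code
  curl -s -o /dev/null -w '%{http_code}\n' -H "Accept: $accept" "$url"
}

# Usage (example URL; compare the two results):
# status_with_accept https://crates.io/crates/lychee 'text/html'
# status_with_accept https://crates.io/crates/lychee 'application/json'
```

If the status code changes with the header value, passing the working value to lychee via `--headers` is likely to resolve the error.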
## Unreachable Mail Address
We use https://github.com/reacherhq/check-if-email-exists for email checking.
You can test your mail address with curl:
```bash
curl -X POST \
'https://api.reacher.email/v0/check_email' \
-H 'content-type: application/json' \
-H 'authorization: test_api_token' \
-d '{"to_email": "box@domain.test"}'
```
Some settings on your mail server (such as an `SPF` policy or a `DNSBL`) may prevent
your email address from being verified. If a working email address fails the
check, you can disable email checking with the [command-line
parameter](https://github.com/lycheeverse/lychee#commandline-parameters)
`--exclude-mail`.