Commit graph

43 commits

Author SHA1 Message Date
Matthias
35ccfb87c3
Add support for dumping links to file (#810) 2022-11-08 00:33:16 +01:00
Matthias
d61105edbb
Fix parsing error of email addresses with query params (#809)
Email addresses with query parameters often get used in
contact forms on websites. They can also be found in
other documents like Markdown.

A common use-case is to add a subject line to the email
as a parameter e.g. `mailto:mail@example.com?subject="Hello"`.

Previously we handled such cases incorrectly by recognizing
them as files. The reason was that our email parsing was too strict
to allow for that use-case.
With `email_address` we switched to a more permissive parser.

Note that this does not affect the actual address email checking,
as this is still done `check-if-email-exists`, which has more strict
check functionality.
2022-11-05 23:40:33 +01:00
Matthias
a42ad4c673
Twitter quirk fixed; adjust test (#741) 2022-08-17 16:52:20 +02:00
Walter Beller-Morales
6d40a2ab7b
Update to gracefully handle nonexistent relative paths (#691)
* Update Input::new to gracefully handle nonexistent relative paths
* Add test checking Input::new can handle real relative paths
* Add better pre-conditions to Input::new tests
* Add integration tests for handling relative paths in lychee-bin
* Update lychee-lib/src/types/input.rs
2022-07-22 17:15:55 +02:00
Matthias
6fae93f2da
Skip caching unsupported and excluded URLs (#692)
As discussed in https://github.com/lycheeverse/lychee/issues/647#issuecomment-1170773449, it does not make much sense to cache unsupported
and excluded URLs.
Unsupported URLs might be supported in the future and caching them
would mean they won't get checked then. Excluded URLs were
excluded for a reason and should not appear in the cache.
Furthermore they might not be excluded
in a consecutive run, leading to a false-positive.
2022-07-17 18:40:45 +02:00
Walter Beller-Morales
9ad53f97a2
Fix deserialize of lycheecache status codes (#685)
* Add custom deserializer for `CacheStatus` to properly classify status codes
* Add CLI integration tests to check .lycheecache behavior
* Add comment to explain conflict between cache and accept flags
2022-07-15 22:45:24 +02:00
vpereira01
d48a3279a8
Improve configuration example (#631)
* Add missing parameters
* Remove deprecated `--exclude-file` parameter
* Improve TOML comments
* Add config smoketest
2022-05-31 19:05:27 +02:00
Matthias
22fecfc056
Add support for URI remapping (#620)
Remaps allow mapping from a URI pattern to a different URI.

The syntax is

```
lychee --remap 'https://example.com http://127.0.0.1'
```

Some use-cases are
- Testing URIs prior to production deployment
- Testing URIs behind a proxy

Be careful when using this feature because checking every link against a
large set of regular expressions has a performance impact. Also there are no
constraints on the URI mapping, so the rules might contradict with each
other.
Remap rules get applied in order of definition to every input URI.
2022-05-29 21:41:22 +02:00
Matthias
363b95fe5f
Add support for excluding paths from link checking (#623)
This change deprecates `--exclude-file` as it was ambiguous.
Instead, `--exclude-path` was introduced to support excluding paths
to files and directories that should not be checked.
Furthermore, `.lycheeignore` is now the only way
to exclude URL patterns.
2022-05-29 17:27:09 +02:00
Matthias
b40c785b64
Also dump excluded links (#615)
This is a minimally invasive version, which allows to grep for `[excluded]`.
The reason for exclusion would require more work and it's debatable if
it adds any value, because it might make grepping harder and the source
of exclusion is easily deducatable from the commandline parameters
or the `.lycheeignore` file.

Fixes #587.
2022-05-13 18:53:16 +02:00
Matthias
b0136683a9
Add support for comments in .lycheeignore (#616)
Lines starting with the comment character (`#`) inside the
.lycheeignore file will be ignored.
Whitespace at the beginning of each line will be ignored, so
even an indented comment character will work.
2022-05-13 18:51:58 +02:00
Matthias
8c0a32d81d
Refactor response formatting (#599)
* Add support for raw formatter (no color)
* Introduce ResponseFormatter trait
* Pass the same params to every cli command
* Update dependencies
* Remove pretty_assertions dependency (latest version doesn't build)
2022-04-25 19:19:36 +02:00
Matthias
743d386252
Allow input URLs without scheme (fixes #567)
This requires `Input::new` to return a `Result`, because the URL
parsing could fail when prepending `http://`.

We use http instead of https, because curl does as well:
70ac27604a/lib/urlapi.c (L1104-L1124)
Missing files will be interpreted as URLs from the command line
and these can be invalid, but that's not seen as an error anymore.
2022-03-27 01:27:27 +01:00
Matthias
d616177a99
Implement excluding code blocks (#523)
This is done in the extractor to avoid unnecessary
allocations.
2022-03-26 10:42:56 +01:00
Matthias
8097bfa408
Print Github token error once at the end (#537)
Print original reqwest error for every Github link.
It contains more information about the underlying error.

Only print a message about the Github token at the
end if it's not set and there were Github errors.
2022-03-03 10:04:55 +01:00
Matthias
0fc5fc9ffe
Print errors with a different format for easier clickability (fixes #532) 2022-03-01 16:58:04 +01:00
Matthias
ba276cd51b
Error cleanup (#510)
* Add more fine-grained error types; remove generic IO error
* Update error message for missing file
* Remove missing `Error` suffix
* Rename ErrorKind::Github to ErrorKind::GithubRequest for consistency with NetworkRequest
2022-02-19 01:44:00 +01:00
Matthias
812663d832
Prevent flaky tests (#514)
Move from example.org to example.com, which seems to be more permissive for testing
2022-02-18 10:29:49 +01:00
Matthias
47df7780fe
Use captured identifiers in format strings (#507)
Makes for arguably cleaner-looking code.
The downside is that the MSRV is 1.58
https://blog.rust-lang.org/2022/01/13/Rust-1.58.0.html

Given that nobody uses lychee as a library yet
and we have precompiled binaries, it's an acceptable
tradeoff.
My little research revealed that this is a much-liked
feature: https://twitter.com/matthiasendler/status/1483895557621960715
2022-02-12 10:51:52 +01:00
Matthias
9d738fb3f5
Fix default config (#491)
The default configuration was broken since the
introduction of caching and specifically `max_cache_age`.
This fixes deserialization and config merging for
the case where this key is missing from the config.
2022-02-07 23:17:50 +01:00
Matthias
ac490f9c53
Add caching functionality (v2) (#443)
A while ago, caching was removed due to some issues (see #349).
This is a new implementation with the following improvements:

 * Architecture: The new implementation is decoupled from the collector, which was a major issue in the last version.    Now the collector has a single responsibility: collecting links. This also avoids race-conditions when running multiple collect_links instances, which probably was an issue before.
* Performance: Uses DashMap under the hood, which was noticeably faster than Mutex<HashMap> in my tests.
* Simplicity: The cache format is a CSV file with two columns: URI and status. I decided to create a new struct called CacheStatus for serialization, because trying to serialize the error kinds in Status turned out to be a bit of a nightmare and at this point I don't think it's worth the pain (and probably isn't idiomatic either).

This is an optional feature. Caching only gets used if the `--cache` flag is set.
2022-01-14 15:25:51 +01:00
Matthias
21f3160b71
Make retries configurable; align constants (#446)
Using the same default values for the library and the
binary now but tweaked the values a bit for slightly faster performance.
2022-01-07 01:03:10 +01:00
Matthias
8df50cf501 Stop testing Twitter quirk (see #448) 2022-01-07 00:15:55 +01:00
Matthias
591cbdbebb
Add support for .lycheeignore file #308 (#402)
This is similar to files like .gitignore and .dockerignore
and gets merged into exclude_files
2021-11-23 01:39:53 +01:00
Matthias
d96c1269ff
Use thiserror for error handling (#399)
This removes some boilerplate and is arguably better
than handwriting the error handling code for
maintainability and avoid inconsitent functionality
for the error variants.
thiserror is also the de-facto standard for library
error types as of today.
2021-11-20 01:42:50 +01:00
Matthias
b97fda34d0
Add support for different output formats (compact, detailed, markdown) (#375) 2021-11-18 00:44:48 +01:00
MichaIng
961f12e58e
Remove cache from collector and remove custom reqwest client pool
* Reqwest comes with its own request pool, so there's no need in adding
another layer of indirection. This also gets rid of a lot of allocs.
* Remove cache from collector
* Improve error handling and documentation
* Add back test for request caching in single file

Signed-off-by: MichaIng <micha@dietpi.com>
Co-authored-by: Matthias <matthias-endler@gmx.net>
2021-10-07 18:07:18 +02:00
Matthias
3b41c4c375
Silently ignore absolute paths without base (fixes #320) (#338) 2021-09-20 11:13:30 +02:00
Matthias
a1acf7b0d0 Reintegrate master 2021-09-09 01:49:25 +02:00
Matthias
93948d7367 Avoid double-encoding already encoded destination paths
E.g. `web%20site` becomes `web site`.
That's because Url::from_file_path will encode the full URL in the end.
This behavior cannot be configured.
See https://github.com/lycheeverse/lychee/pull/262#issuecomment-915245411
2021-09-09 01:44:10 +02:00
Matthias
a75cae54b1 Add failing test 2021-09-09 01:17:56 +02:00
Matthias
24ea2482d3 Update docs 2021-09-08 01:08:59 +02:00
Matthias
a28f932fb2 Fix wildcard test 2021-09-07 00:41:07 +02:00
Matthias
82652a69d5 Add test 2021-09-06 15:20:18 +02:00
Matthias
daa5be4c3a Add/change file link tests 2021-09-06 15:19:09 +02:00
Lucius Hu
80b8a856ac
Add new flag --require-https (#195) 2021-09-04 03:21:54 +02:00
Matthias
d959b54b56 run cargo fmt 2021-09-03 16:29:57 +02:00
dblock
dcee4a1058 Added support for --exclude-file. 2021-09-03 16:29:57 +02:00
dblock
739a3d6e41 Fix: remove URL that is currently returning a 503. 2021-09-03 16:29:57 +02:00
Matthias
fe399c0a8c
Simple URI cache (#243) 2021-05-04 13:28:39 +02:00
Matthias
164e1aea7e
Add support for multiple schemes (#237) 2021-04-26 18:24:54 +02:00
Matthias
f8426bafbf
Skip unsupported schemes (#236) 2021-04-26 17:16:58 +02:00
Lucius Hu
228e5df6a3
Major refactor of codebase (#208)
- The binary component and library component are separated as two
  packages in the same workspace.
  - `lychee` is the binary component, in `lychee-bin/*`.
  - `lychee-lib` is the library component, in `lychee-lib/*`.
  - Users can now install only the `lychee-lib`, instead of both
    components, that would require fewer dependencies and faster
    compilation.
  - Dependencies for each component are adjusted and updated. E.g.,
    no CLI dependencies for `lychee-lib`.
  - CLI tests are only moved to `lychee`, as it has nothing to do
    with the library component.
- `Status::Error` is refactored to contain dedicated error enum,
  `ErrorKind`.
  - The motivation is to delay the formatting of errors to strings.
    Note that `e.to_string()` is not necessarily cheap (though
    trivial in many cases). The formatting is no delayed until the
    error is needed to be displayed to users. So in some cases, if
    the error is never used, it means that it won't be formatted at
    all.
- Replaced `regex` based matching with one of the following:
  - Simple string equality test in the case of 'false positivie'.
  - URL parsing based test, in the case of extracting repository and
    user name for GitHub links.
  - Either cases would be much more efficient than `regex` based
    matching. First, there's no need to construct a state machine for
    regex. Second, URL is already verified and parsed on its creation,
    and extracting its components is fairly cheap. Also, this removes
    the dependency on `lazy-static` in `lychee-lib`.
- `types` module now has a sub-directory, and its components are now
  separated into their own modules (in that sub-directory).
- `lychee-lib::test_utils` module is only compiled for tests.
- `wiremock` is moved to `dev-dependency` as it's only needed for
  `test` modules.
- Dependencies are listed in alphabetical order.
- Imports are organized in the following fashion:
  - Imports from `std`
  - Imports from 3rd-party crates, and `lychee-lib`.
  - Imports from `crate::*` or `super::*`.
- No glob import.
- I followed suggestion from `cargo clippy`, with `clippy::all` and
  `clippy:pedantic`.

Co-authored-by: Lucius Hu <lebensterben@users.noreply.github.com>
2021-04-15 01:24:11 +02:00
Renamed from tests/cli.rs (Browse further)