lychee

mirror of https://github.com/Hopiu/lychee.git synced 2026-04-10 16:30:58 +00:00

Author	SHA1	Message	Date
Matthias	35ccfb87c3	Add support for dumping links to file (#810)	2022-11-08 00:33:16 +01:00
Matthias	d61105edbb	Fix parsing error of email addresses with query params (#809) Email addresses with query parameters are often used in contact forms on websites. They can also be found in other documents like Markdown. A common use-case is to add a subject line to the email as a parameter, e.g. `mailto:mail@example.com?subject="Hello"`. Previously we handled such cases incorrectly by recognizing them as files. The reason was that our email parsing was too strict to allow for that use-case. With `email_address` we switched to a more permissive parser. Note that this does not affect the actual email address checking, as this is still done by `check-if-email-exists`, which performs stricter checks.	2022-11-05 23:40:33 +01:00
Matthias	a42ad4c673	Twitter quirk fixed; adjust test (#741)	2022-08-17 16:52:20 +02:00
Walter Beller-Morales	6d40a2ab7b	Update to gracefully handle nonexistent relative paths (#691) * Update Input::new to gracefully handle nonexistent relative paths * Add test checking Input::new can handle real relative paths * Add better pre-conditions to Input::new tests * Add integration tests for handling relative paths in lychee-bin * Update lychee-lib/src/types/input.rs	2022-07-22 17:15:55 +02:00
Matthias	6fae93f2da	Skip caching unsupported and excluded URLs (#692) As discussed in https://github.com/lycheeverse/lychee/issues/647#issuecomment-1170773449, it does not make much sense to cache unsupported and excluded URLs. Unsupported URLs might be supported in the future and caching them would mean they won't get checked then. Excluded URLs were excluded for a reason and should not appear in the cache. Furthermore they might not be excluded in a consecutive run, leading to a false-positive.	2022-07-17 18:40:45 +02:00
Walter Beller-Morales	9ad53f97a2	Fix deserialize of lycheecache status codes (#685) * Add custom deserializer for `CacheStatus` to properly classify status codes * Add CLI integration tests to check .lycheecache behavior * Add comment to explain conflict between cache and accept flags	2022-07-15 22:45:24 +02:00
vpereira01	d48a3279a8	Improve configuration example (#631) * Add missing parameters * Remove deprecated `--exclude-file` parameter * Improve TOML comments * Add config smoketest	2022-05-31 19:05:27 +02:00
Matthias	22fecfc056	Add support for URI remapping (#620) Remaps allow mapping from a URI pattern to a different URI. The syntax is ``` lychee --remap 'https://example.com http://127.0.0.1' ``` Some use-cases are - Testing URIs prior to production deployment - Testing URIs behind a proxy Be careful when using this feature because checking every link against a large set of regular expressions has a performance impact. Also, there are no constraints on the URI mapping, so the rules might contradict each other. Remap rules get applied in order of definition to every input URI.	2022-05-29 21:41:22 +02:00
Matthias	363b95fe5f	Add support for excluding paths from link checking (#623) This change deprecates `--exclude-file` as it was ambiguous. Instead, `--exclude-path` was introduced to support excluding paths to files and directories that should not be checked. Furthermore, `.lycheeignore` is now the only way to exclude URL patterns.	2022-05-29 17:27:09 +02:00
Matthias	b40c785b64	Also dump excluded links (#615) This is a minimally invasive version, which allows grepping for `[excluded]`. Reporting the reason for exclusion would require more work, and it's debatable whether it adds any value, because it might make grepping harder and the source of exclusion is easily deducible from the command-line parameters or the `.lycheeignore` file. Fixes #587.	2022-05-13 18:53:16 +02:00
Matthias	b0136683a9	Add support for comments in `.lycheeignore` (#616) Lines starting with the comment character (`#`) inside the .lycheeignore file will be ignored. Whitespace at the beginning of each line will be ignored, so even an indented comment character will work.	2022-05-13 18:51:58 +02:00
Matthias	8c0a32d81d	Refactor response formatting (#599) * Add support for raw formatter (no color) * Introduce ResponseFormatter trait * Pass the same params to every cli command * Update dependencies * Remove pretty_assertions dependency (latest version doesn't build)	2022-04-25 19:19:36 +02:00
Matthias	743d386252	Allow input URLs without scheme (fixes #567) This requires `Input::new` to return a `Result`, because the URL parsing could fail when prepending `http://`. We use http instead of https, because curl does as well: `70ac27604a/lib/urlapi.c (L1104-L1124)` Missing files will be interpreted as URLs from the command line and these can be invalid, but that's not seen as an error anymore.	2022-03-27 01:27:27 +01:00
Matthias	d616177a99	Implement excluding code blocks (#523) This is done in the extractor to avoid unnecessary allocations.	2022-03-26 10:42:56 +01:00
Matthias	8097bfa408	Print Github token error once at the end (#537) Print original reqwest error for every Github link. It contains more information about the underlying error. Only print a message about the Github token at the end if it's not set and there were Github errors.	2022-03-03 10:04:55 +01:00
Matthias	0fc5fc9ffe	Print errors with a different format for easier clickability (fixes #532)	2022-03-01 16:58:04 +01:00
Matthias	ba276cd51b	Error cleanup (#510) * Add more fine-grained error types; remove generic IO error * Update error message for missing file * Remove missing `Error` suffix * Rename ErrorKind::Github to ErrorKind::GithubRequest for consistency with NetworkRequest	2022-02-19 01:44:00 +01:00
Matthias	812663d832	Prevent flaky tests (#514) Move from example.org to example.com, which seems to be more permissive for testing	2022-02-18 10:29:49 +01:00
Matthias	47df7780fe	Use captured identifiers in format strings (#507) Makes for arguably cleaner-looking code. The downside is that the MSRV is 1.58 https://blog.rust-lang.org/2022/01/13/Rust-1.58.0.html Given that nobody uses lychee as a library yet and we have precompiled binaries, it's an acceptable tradeoff. My little research revealed that this is a much-liked feature: https://twitter.com/matthiasendler/status/1483895557621960715	2022-02-12 10:51:52 +01:00
Matthias	9d738fb3f5	Fix default config (#491) The default configuration was broken since the introduction of caching and specifically `max_cache_age`. This fixes deserialization and config merging for the case where this key is missing from the config.	2022-02-07 23:17:50 +01:00
Matthias	ac490f9c53	Add caching functionality (v2) (#443) A while ago, caching was removed due to some issues (see #349). This is a new implementation with the following improvements: * Architecture: The new implementation is decoupled from the collector, which was a major issue in the last version. Now the collector has a single responsibility: collecting links. This also avoids race-conditions when running multiple collect_links instances, which probably was an issue before. * Performance: Uses DashMap under the hood, which was noticeably faster than Mutex<HashMap> in my tests. * Simplicity: The cache format is a CSV file with two columns: URI and status. I decided to create a new struct called CacheStatus for serialization, because trying to serialize the error kinds in Status turned out to be a bit of a nightmare and at this point I don't think it's worth the pain (and probably isn't idiomatic either). This is an optional feature. Caching only gets used if the `--cache` flag is set.	2022-01-14 15:25:51 +01:00
Matthias	21f3160b71	Make retries configurable; align constants (#446) Using the same default values for the library and the binary now but tweaked the values a bit for slightly faster performance.	2022-01-07 01:03:10 +01:00
Matthias	8df50cf501	Stop testing Twitter quirk (see #448)	2022-01-07 00:15:55 +01:00
Matthias	591cbdbebb	Add support for .lycheeignore file #308 (#402) This is similar to files like .gitignore and .dockerignore and gets merged into exclude_files	2021-11-23 01:39:53 +01:00
Matthias	d96c1269ff	Use thiserror for error handling (#399) This removes some boilerplate and is arguably more maintainable than handwritten error handling code; it also avoids inconsistent behavior across the error variants. thiserror is also the de facto standard for library error types as of today.	2021-11-20 01:42:50 +01:00
Matthias	b97fda34d0	Add support for different output formats (compact, detailed, markdown) (#375)	2021-11-18 00:44:48 +01:00
MichaIng	961f12e58e	Remove cache from collector and remove custom reqwest client pool * Reqwest comes with its own request pool, so there's no need to add another layer of indirection. This also gets rid of a lot of allocs. * Remove cache from collector * Improve error handling and documentation * Add back test for request caching in single file Signed-off-by: MichaIng <micha@dietpi.com> Co-authored-by: Matthias <matthias-endler@gmx.net>	2021-10-07 18:07:18 +02:00
Matthias	3b41c4c375	Silently ignore absolute paths without base (fixes #320) (#338)	2021-09-20 11:13:30 +02:00
Matthias	a1acf7b0d0	Reintegrate master	2021-09-09 01:49:25 +02:00
Matthias	93948d7367	Avoid double-encoding already encoded destination paths E.g. `web%20site` becomes `web site`. That's because Url::from_file_path will encode the full URL in the end. This behavior cannot be configured. See https://github.com/lycheeverse/lychee/pull/262#issuecomment-915245411	2021-09-09 01:44:10 +02:00
Matthias	a75cae54b1	Add failing test	2021-09-09 01:17:56 +02:00
Matthias	24ea2482d3	Update docs	2021-09-08 01:08:59 +02:00
Matthias	a28f932fb2	Fix wildcard test	2021-09-07 00:41:07 +02:00
Matthias	82652a69d5	Add test	2021-09-06 15:20:18 +02:00
Matthias	daa5be4c3a	Add/change file link tests	2021-09-06 15:19:09 +02:00
Lucius Hu	80b8a856ac	Add new flag `--require-https` (#195)	2021-09-04 03:21:54 +02:00
Matthias	d959b54b56	run cargo fmt	2021-09-03 16:29:57 +02:00
dblock	dcee4a1058	Added support for --exclude-file.	2021-09-03 16:29:57 +02:00
dblock	739a3d6e41	Fix: remove URL that is currently returning a 503.	2021-09-03 16:29:57 +02:00
Matthias	fe399c0a8c	Simple URI cache (#243)	2021-05-04 13:28:39 +02:00
Matthias	164e1aea7e	Add support for multiple schemes (#237)	2021-04-26 18:24:54 +02:00
Matthias	f8426bafbf	Skip unsupported schemes (#236)	2021-04-26 17:16:58 +02:00
Lucius Hu	228e5df6a3	Major refactor of codebase (#208) - The binary component and library component are separated as two packages in the same workspace. - `lychee` is the binary component, in `lychee-bin/`. - `lychee-lib` is the library component, in `lychee-lib/`. - Users can now install only `lychee-lib` instead of both components, which requires fewer dependencies and compiles faster. - Dependencies for each component are adjusted and updated. E.g., no CLI dependencies for `lychee-lib`. - CLI tests are moved to `lychee` only, as they have nothing to do with the library component. - `Status::Error` is refactored to contain a dedicated error enum, `ErrorKind`. - The motivation is to delay the formatting of errors to strings. Note that `e.to_string()` is not necessarily cheap (though trivial in many cases). The formatting is now delayed until the error needs to be displayed to users. So in some cases, if the error is never used, it won't be formatted at all. - Replaced `regex` based matching with one of the following: - Simple string equality test in the case of 'false positive'. - URL parsing based test, in the case of extracting repository and user name for GitHub links. - Either case would be much more efficient than `regex` based matching. First, there's no need to construct a state machine for regex. Second, the URL is already verified and parsed on its creation, and extracting its components is fairly cheap. Also, this removes the dependency on `lazy-static` in `lychee-lib`. - `types` module now has a sub-directory, and its components are now separated into their own modules (in that sub-directory). - `lychee-lib::test_utils` module is only compiled for tests. - `wiremock` is moved to `dev-dependency` as it's only needed for `test` modules. - Dependencies are listed in alphabetical order. - Imports are organized in the following fashion: - Imports from `std` - Imports from 3rd-party crates, and `lychee-lib`. - Imports from `crate::` or `super::`. - No glob import. - I followed suggestions from `cargo clippy`, with `clippy::all` and `clippy::pedantic`. Co-authored-by: Lucius Hu <lebensterben@users.noreply.github.com>	2021-04-15 01:24:11 +02:00

43 commits