lychee

mirror of https://github.com/Hopiu/lychee.git synced 2026-05-13 16:23:12 +00:00

Author	SHA1	Message	Date
Matthias Endler	97573123ef	Extend remap feature (#1133 ) * wip * Extend support for remapping This adds support for partial remaps and capture groups to the remap feature. Fixes #1129	2023-07-05 15:05:19 +02:00
Techassi	67af7ef6d3	feat: add support for basic auth per URI (#1110 ) * Add support for basic auth per domain * Move URI matching to link collection phase * Allow AsRef for BasicAuthExtractor::new to avoid clone * Add tests --------- Co-authored-by: Matthias Endler <matthias@endler.dev>	2023-06-26 12:06:24 +02:00
Thomas Zahner	130fa21a6a	Concurrent archives (#1027 )	2023-05-11 20:20:27 +02:00
Matthias Endler	55797071b0	Fix nested URL extraction in verbatim elements (#988 ) Skipping URLs in verbatim elements didn't take nested elements into consideration, which were not verbatim. For instance, the following HTML snippet would yield `https://example.com` in non-verbatim mode, even if it is nested inside a verbatim `<pre>` element: ```html <pre><a href="https://example.com">link</a></pre> ``` This commit fixes the behavior for both `html5gum` and `html5ever`. Note that nested verbatim elements of the same kind still are not handled correctly. For instance, the following HTML snippet would still yield `https://example.com`: ```html <pre> <pre></pre> <a href="https://example.com">link</a> </pre> ``` The reason is that we currently only keep track of a single verbatim element and not a stack of elements, which we would need to unwind and resolve the situation. Fixes https://github.com/lycheeverse/lychee/issues/986.	2023-03-11 15:18:25 +01:00
Matthias	c9edb7f809	Split up quirks and skip twitter check It's flaky on Github	2023-03-03 12:13:09 +01:00
Matthias	08466ad59b	Ignore config smoketest output report file	2023-03-03 12:13:09 +01:00
Matthias	86f13609e6	Put lycheecache tests into separate subfolders to avoid race	2023-03-03 12:13:09 +01:00
Matthias	388bd20673	Fix tests after `address` is no longer a verbatim element	2023-03-03 12:13:09 +01:00
Matthias Endler	7874195bbb	Customize verbosity (#956 )	2023-02-24 23:53:09 +01:00
Matthias Endler	5654b7c317	Harden URL detection and extend verbatim elements (#899 ) Previously remote URLs were incorrectly detected because the string representation of a path is different than the path itself, causing the `http` prefix match to be insufficient. This resulted in unexpected side-effects, such as the incorrect detection of verbatim mode for remote URLs. The check now got improved and unit tests were added to avoid future breakage. On top of that, missing verbatim elements were added	2023-01-04 00:38:19 +01:00
Matthias	982d978e47	Add different verbosity levels (#824 ) More granular verbosity levels have been asked for repeatedly. To enable that we're moving to [env_logger] and [clap-verbosity-flag] to provide more flexible verbosity settings. Also tackles #661, #709 Lays the groundwork for tackling #268 https://github.com/rust-cli/env_logger https://github.com/clap-rs/clap-verbosity-flag	2022-11-28 23:25:33 +01:00
Matthias	765f7adb12	Don't check example mail addresses by default (#815 ) This was an oversight so far that became apparent after our recent fix for email addresses with query params (e.g. `test@example.com?subject=test`). The parsing of email addresses has improved and so we detect more mail addresses, but we didn't check if they belonged to an example domain, causing false-positive checks.	2022-11-08 23:46:32 +01:00
Matthias	d61105edbb	Fix parsing error of email addresses with query params (#809 ) Email addresses with query parameters often get used in contact forms on websites. They can also be found in other documents like Markdown. A common use-case is to add a subject line to the email as a parameter e.g. `mailto:mail@example.com?subject="Hello"`. Previously we handled such cases incorrectly by recognizing them as files. The reason was that our email parsing was too strict to allow for that use-case. With `email_address` we switched to a more permissive parser. Note that this does not affect the actual email address checking, as this is still done by `check-if-email-exists`, which has more strict check functionality.	2022-11-05 23:40:33 +01:00
Andy Grunwald	a67b513238	Extend description of "--exclude" to also exclude email addresses, not only URLs (#801 )	2022-10-23 12:17:20 +02:00
Matthias	601adcefd3	Add new SVG-based screencast (#693 ) This is taken from https://github.com/sharkdp/fd, so all credits go to the original authors. The demo was a bit dated. We've since added more features and changed the output. On top of that, the gif was a bit blurry. The new version is in SVG and the commands can be scripted, so we can change them with a PR and render them through CI. Co-authored-by: Brennan Kinney <5098581+polarathene@users.noreply.github.com>	2022-08-10 17:35:50 +02:00
Walter Beller-Morales	9ad53f97a2	Fix deserialize of lycheecache status codes (#685 ) * Add custom deserializer for `CacheStatus` to properly classify status codes * Add CLI integration tests to check .lycheecache behavior * Add comment to explain conflict between cache and accept flags	2022-07-15 22:45:24 +02:00
Matthias	a557cba0b4	Add support for parsing list of status codes from config file (#636 )	2022-06-02 18:53:04 +02:00
Matthias	9b4dfadffd	Fix parsing errors with config options (#632 )	2022-05-31 19:43:46 +02:00
vpereira01	d48a3279a8	Improve configuration example (#631 ) * Add missing parameters * Remove deprecated `--exclude-file` parameter * Improve TOML comments * Add config smoketest	2022-05-31 19:05:27 +02:00
Matthias	f33b897d5d	Exclude example domains as per RFC 2606 from checking (#627 ) Unfortunately it's not possible to automatically enable features for `cargo test`. See https://github.com/rust-lang/cargo/issues/2911. As a workaround to allow for using example domains for unit- and integration tests, we introduce a new feature, `check_example_domains`, which is disabled by default for normal users. The feature gets activated for the integration test which checks that the example domain exclusion works as expected.	2022-05-29 21:42:00 +02:00
Matthias	363b95fe5f	Add support for excluding paths from link checking (#623 ) This change deprecates `--exclude-file` as it was ambiguous. Instead, `--exclude-path` was introduced to support excluding paths to files and directories that should not be checked. Furthermore, `.lycheeignore` is now the only way to exclude URL patterns.	2022-05-29 17:27:09 +02:00
Matthias	b40c785b64	Also dump excluded links (#615 ) This is a minimally invasive version, which allows grepping for `[excluded]`. The reason for exclusion would require more work and it's debatable if it adds any value, because it might make grepping harder and the source of exclusion is easily deducible from the commandline parameters or the `.lycheeignore` file. Fixes #587.	2022-05-13 18:53:16 +02:00
Matthias	b0136683a9	Add support for comments in `.lycheeignore` (#616 ) Lines starting with the comment character (`#`) inside the .lycheeignore file will be ignored. Whitespace at the beginning of each line will be ignored, so even an indented comment character will work.	2022-05-13 18:51:58 +02:00
Matthias	03d28820bb	Extract more status information from reqwest (#577 ) Recently we cleaned up the commandline output to trim away redundant information like the URL, which occurred twice. Unfortunately we also removed helpful information from reqwest, which could support the user in troubleshooting unexpected errors. This commit reverts that. We now extract the meaningful information from reqwest, without being too verbose. For that we have to depend on the string output for the reqwest error, but it's better than hiding that information from the user. It is fragile as it depends on the reqwest internals, but in the worst case we simply return the full error text in case our parsing won't work.	2022-04-02 14:37:03 +02:00
Matthias	d616177a99	Implement excluding code blocks (#523 ) This is done in the extractor to avoid unnecessary allocations.	2022-03-26 10:42:56 +01:00
Matthias	812663d832	Prevent flaky tests (#514 ) Move from example.org to example.com, which seems to be more permissive for testing	2022-02-18 10:29:49 +01:00
Matthias	9d738fb3f5	Fix default config (#491 ) The default configuration was broken since the introduction of caching and specifically `max_cache_age`. This fixes deserialization and config merging for the case where this key is missing from the config.	2022-02-07 23:17:50 +01:00
Matthias	6635863746	Add Alpine page for benchmark; refactor code (#481 )	2022-01-27 23:42:06 +01:00
Matthias	166c86c30e	Use tokenizer for extraction; add benchmark (#424 ) This avoids creating a DOM tree for link extraction and instead uses a `TokenSink` for on-the-fly extraction. In hyperfine benchmarks it was about 10-25% faster than the master. Old: 4.557 s ± 0.404 s New: 3.832 s ± 0.131 s The performance fluctuates a little less as well. Some missing element/attribute pairs were also added, which contain links according to the HTML spec. These occur very rarely, but it's good to parse them for completeness' sake. Furthermore tried to clean up a lot of papercuts around our types. We now differentiate between a `RawUri` (stringy-types) and a Uri, which is a properly parsed `URI` type. The extractor now only deals with extracting `RawUri`s while the collector creates the request objects.	2021-12-16 18:45:52 +01:00
Matthias	591cbdbebb	Add support for .lycheeignore file #308 (#402 ) This is similar to files like .gitignore and .dockerignore and gets merged into exclude_files	2021-11-23 01:39:53 +01:00
MichaIng	961f12e58e	Remove cache from collector and remove custom reqwest client pool * Reqwest comes with its own request pool, so there's no need in adding another layer of indirection. This also gets rid of a lot of allocs. * Remove cache from collector * Improve error handling and documentation * Add back test for request caching in single file Signed-off-by: MichaIng <micha@dietpi.com> Co-authored-by: Matthias <matthias-endler@gmx.net>	2021-10-07 18:07:18 +02:00
Matthias	a75cae54b1	Add failing test	2021-09-09 01:17:56 +02:00
Matthias	5d0b95271d	Remove anchor from file links	2021-09-07 00:20:09 +02:00
Matthias	03f5df91cd	Add fixtures for offline testing	2021-09-06 15:20:18 +02:00
Matthias	495f856c61	cleanup	2021-09-06 15:19:24 +02:00
Matthias	ee70e13bf7	Check real link to file	2021-09-06 15:19:09 +02:00
Matthias	f5ee472d93	explicit naming	2021-09-06 15:19:09 +02:00
Matthias Endler	701fbc9ada	Add support for local files	2021-09-06 15:14:33 +02:00
Lucius Hu	80b8a856ac	Add new flag `--require-https` (#195 )	2021-09-04 03:21:54 +02:00
dblock	dcee4a1058	Added support for --exclude-file.	2021-09-03 16:29:57 +02:00
dblock	739a3d6e41	Fix: remove URL that is currently returning a 503.	2021-09-03 16:29:57 +02:00
Matthias	fe399c0a8c	Simple URI cache (#243 )	2021-05-04 13:28:39 +02:00
Matthias	164e1aea7e	Add support for multiple schemes (#237 )	2021-04-26 18:24:54 +02:00
Matthias	f8426bafbf	Skip unsupported schemes (#236 )	2021-04-26 17:16:58 +02:00
Matthias Endler	2b044a6f5b	Fix exclude mail, add tests	2021-03-29 23:28:17 +02:00
Matthias Endler	5baaba3948	Add integration test	2021-02-28 19:09:11 +01:00
Matthias Endler	e00cdbf1ae	example.com -> example.org	2021-02-21 16:33:33 +01:00
Matthias	702909c4ab	Mailto support (#138 ) * Add mailto support and use try_from for parsing URLs * Cleanup and document code	2021-02-12 10:25:33 +01:00
Paweł Romanowski	aeab85da16	Use html5ever for HTML link extraction (#98 )	2021-01-08 16:41:13 +01:00
Paweł Romanowski	cd00fa643e	Fix HTML parsing for non-closed elements like <link> (#92 ) * Fix HTML parsing for non-closed elements like <link> The XML parser we use requires all tags to be closed by default, and if they aren't (like HTML5 <link> elements), it simply gives up on further parsing. This change makes it ignore such issues. Also uncover a bug with the current parser (it simply won't parse elements like `<script defer src="..."></script>`) -- e.g. elements with no attribute values. The XML parser is an XML parser and will have to be replaced with HTML aware parser in the future. * Add check for empty elements * Update extract.rs Co-authored-by: Matthias <matthias-endler@gmx.net>	2021-01-03 17:32:13 +01:00


60 commits