lychee

Mirror of https://github.com/Hopiu/lychee.git, synced 2026-04-01 12:20:22 +00:00

Author	SHA1	Message	Date
Matthias	166c86c30e	Use tokenizer for extraction; add benchmark (#424) This avoids creating a DOM tree for link extraction and instead uses a `TokenSink` for on-the-fly extraction. In hyperfine benchmarks it was about 10-25% faster than master (old: 4.557 s ± 0.404 s; new: 3.832 s ± 0.131 s), and run-to-run variance is a little lower as well. Some missing element/attribute pairs that contain links according to the HTML spec were also added. These occur very rarely, but it's good to parse them for completeness' sake. Furthermore, this cleans up many papercuts around our types: we now differentiate between a `RawUri` (stringly typed) and a `Uri`, which is a properly parsed URI type. The extractor now only extracts `RawUri`s, while the collector creates the request objects.	2021-12-16 18:45:52 +01:00
Matthias	591cbdbebb	Add support for .lycheeignore file #308 (#402) This works like .gitignore and .dockerignore; its contents get merged into exclude_files.	2021-11-23 01:39:53 +01:00
MichaIng	961f12e58e	Remove cache from collector and remove custom reqwest client pool * Reqwest comes with its own request pool, so there's no need to add another layer of indirection. This also gets rid of a lot of allocations. * Remove cache from collector * Improve error handling and documentation * Add back test for request caching in single file Signed-off-by: MichaIng <micha@dietpi.com> Co-authored-by: Matthias <matthias-endler@gmx.net>	2021-10-07 18:07:18 +02:00
Matthias	a75cae54b1	Add failing test	2021-09-09 01:17:56 +02:00
Matthias	5d0b95271d	Remove anchor from file links	2021-09-07 00:20:09 +02:00
Matthias	03f5df91cd	Add fixtures for offline testing	2021-09-06 15:20:18 +02:00
Matthias	495f856c61	cleanup	2021-09-06 15:19:24 +02:00
Matthias	ee70e13bf7	Check real link to file	2021-09-06 15:19:09 +02:00
Matthias	f5ee472d93	explicit naming	2021-09-06 15:19:09 +02:00
Matthias Endler	701fbc9ada	Add support for local files	2021-09-06 15:14:33 +02:00
Lucius Hu	80b8a856ac	Add new flag `--require-https` (#195)	2021-09-04 03:21:54 +02:00
dblock	dcee4a1058	Added support for --exclude-file.	2021-09-03 16:29:57 +02:00
dblock	739a3d6e41	Fix: remove URL that is currently returning a 503.	2021-09-03 16:29:57 +02:00
Matthias	fe399c0a8c	Simple URI cache (#243)	2021-05-04 13:28:39 +02:00
Matthias	164e1aea7e	Add support for multiple schemes (#237)	2021-04-26 18:24:54 +02:00
Matthias	f8426bafbf	Skip unsupported schemes (#236)	2021-04-26 17:16:58 +02:00
Matthias Endler	2b044a6f5b	Fix exclude mail, add tests	2021-03-29 23:28:17 +02:00
Matthias Endler	5baaba3948	Add integration test	2021-02-28 19:09:11 +01:00
Matthias Endler	e00cdbf1ae	example.com -> example.org	2021-02-21 16:33:33 +01:00
Matthias	702909c4ab	Mailto support (#138) * Add mailto support and use try_from for parsing URLs * Clean up and document code	2021-02-12 10:25:33 +01:00
Paweł Romanowski	aeab85da16	Use html5ever for HTML link extraction (#98)	2021-01-08 16:41:13 +01:00
Paweł Romanowski	cd00fa643e	Fix HTML parsing for non-closed elements like <link> (#92) * Fix HTML parsing for non-closed elements like <link> The XML parser we use requires all tags to be closed by default, and if they aren't (like HTML5 <link> elements), it simply gives up on further parsing. This change makes it ignore such issues. This also uncovers a bug in the current parser: it won't parse elements like `<script defer src="..."></script>`, i.e. elements with attributes that have no value. The XML parser is an XML parser and will have to be replaced with an HTML-aware parser in the future. * Add check for empty elements * Update extract.rs Co-authored-by: Matthias <matthias-endler@gmx.net>	2021-01-03 17:32:13 +01:00
Matthias	a78e8318cd	Add (machine-readable) output file support (fixes #53) For now we only support JSON. I honestly don't know if it makes sense to include other formats. For example, MD and HTML are not really machine-readable, and YAML is not a great standard format for this use case. Open for discussion, though.	2020-12-14 01:15:14 +01:00
Paweł Romanowski	1f787613d4	Add support for reading from stdin and make input handling more robust (closes #26) * Adds a `skip_missing` flag * Adds an `Input` enum to handle different types of inputs	2020-12-02 23:28:37 +01:00
Paweł Romanowski	326683f4eb	Make GITHUB_TOKEN optional (#22) * Make GITHUB_TOKEN optional This also makes it possible to pass the token in via CLI args. * Add missing test fixture file * Normalize exit codes and GitHub checking behavior The exit code is now defined as 1 for unexpected or config errors, and 2 for link check failures. GitHub checking behavior has been tweaked to generate errors if a GitHub-specific check cannot be performed because of a missing token. * Remove short flag for GitHub token	2020-10-26 23:31:31 +01:00
WhizSid	6bd7bbf51f	feat: Support relative URLs (#15)	2020-10-21 01:31:06 +02:00
Paweł Romanowski	e175558376	Add --exclude-all-private flag and cli integration test	2020-10-17 10:01:06 +02:00
Matthias Endler	14d098f7cf	Add mail	2020-08-23 23:19:21 +02:00
Matthias Endler	608499fdb4	Add more test links	2020-08-14 11:38:29 +02:00
Matthias Endler	391144b2ff	Add globbing support	2020-08-14 02:33:04 +02:00
Matthias Endler	4aa2883371	Add more links	2020-08-09 22:43:11 +02:00
Matthias Endler	a58b3e1232	Add logging and proper URL parsing	2020-08-07 19:00:21 +02:00

32 commits