Commit graph

137 commits

Author SHA1 Message Date
Matthias Endler
751deeb9b9 Show failure reason in status output
Use HashSet<Response> instead of Vec<Uri> for status output
2021-02-17 12:18:56 +01:00
Matthias Endler
8173e9927f Be more explicit about supported output formats 2021-02-17 12:17:45 +01:00
Matthias Endler
d16e4fa1bc Implement Serialize and Display for Status 2021-02-17 12:13:49 +01:00
Matthias Endler
b7bdfa6890 Add helper methods for Status 2021-02-17 12:13:32 +01:00
Matthias Endler
8e15bfb0be Implement Display and Serialize for Response 2021-02-17 12:12:37 +01:00
Matthias Endler
7859cb17c6 Cleanup unused deserialize 2021-02-17 12:11:51 +01:00
Matthias Endler
d9adfbf80f Add support for serializing input 2021-02-17 12:11:28 +01:00
Matthias Endler
d6b960368a Adjust tests 2021-02-17 12:10:57 +01:00
Matthias Endler
4faf40cfba Move check functions closer together 2021-02-17 01:01:43 +01:00
Matthias Endler
e859f1290e Correct matches for fail_map and add test 2021-02-16 01:21:45 +01:00
Matthias Endler
428df23c1c Fix lints and test on Linux 2021-02-16 00:53:01 +01:00
Matthias Endler
54e1d3e078 Simplify tests 2021-02-16 00:35:59 +01:00
Matthias Endler
4bec47904e Show input source in status output
If an error occurs during link checking,
it is important to know where the error occurred.
Therefore the request and response objects now contain the input
source as a field. This makes error tracking easier.
2021-02-16 00:15:14 +01:00
Matthias
702909c4ab
Mailto support (#138)
* Add mailto support and use try_from for parsing URLs
* Cleanup and document code
2021-02-12 10:25:33 +01:00
Matthias
0b148bf5e6
Exclude e-mails from being checked (#137)
This can be useful in CI environments where SMTP is not allowed.
2021-02-10 11:58:04 +01:00
Paweł Romanowski
836f557829
Ensure destructors are run before std::process::exit (#134)
See comments in code for more details.
2021-02-08 11:04:01 +01:00
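The problem this commit addresses can be sketched as follows: `std::process::exit` terminates the process without running destructors for values still on the stack. One common way around it, shown here as a minimal sketch and not necessarily what the commit does, is to do all work in a helper that returns an exit code, so every value is dropped before `exit` is called. `Guard`, `DROPS`, and `run` are hypothetical names used only for illustration.

```rust
use std::process;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts dropped guards so the effect of `Drop` can be observed.
static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical resource whose destructor must not be skipped.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Does the real work and returns an exit code instead of exiting.
fn run(fail: bool) -> i32 {
    let _guard = Guard;
    if fail { 1 } else { 0 }
    // `_guard` is dropped when `run` returns, before `main` continues.
}

fn main() {
    let code = run(false);
    // Every value owned by `run` has been dropped by now.
    assert_eq!(DROPS.load(Ordering::SeqCst), 1);
    // `process::exit` does not unwind, so nothing with a destructor
    // should still be alive in this scope.
    process::exit(code);
}
```

Keeping `main` free of owned resources means the call to `process::exit` has nothing left to skip.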
Matthias
a50c04fffe
Add hint about separating inputs from options with -- (fixes #113) (#119) 2021-01-17 17:01:06 +01:00
Paweł Romanowski
aeab85da16
Use html5ever for HTML link extraction (#98) 2021-01-08 16:41:13 +01:00
dependabot-preview[bot]
a3ad492c0b
Update dependencies (reqwest 0.11 and tokio 1.0) (#51)
Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
Co-authored-by: Matthias Endler <matthias-endler@gmx.net>
2021-01-07 00:10:58 +01:00
Paweł Romanowski
cd00fa643e
Fix HTML parsing for non-closed elements like <link> (#92)
* Fix HTML parsing for non-closed elements like <link>

The XML parser we use requires all tags to be closed by default,
and if they aren't (like HTML5 <link> elements), it simply gives up
on further parsing.  This change makes it ignore such issues.

Also uncovers a bug with the current parser: it simply won't parse
elements like `<script defer src="..."></script>`, i.e. elements
that have an attribute without a value.

The current parser is an XML parser and will have to be replaced with
an HTML-aware parser in the future.

* Add check for empty elements

* Update extract.rs

Co-authored-by: Matthias <matthias-endler@gmx.net>
2021-01-03 17:32:13 +01:00
Paweł Romanowski
fa9c5ea2cf
Run clippy for all targets, including tests (#93)
The test code should also be linted.
2021-01-03 16:41:19 +01:00
Matthias
a78e8318cd
Add (machine-readable) output file support (fixes #53)
For now we only support JSON.
I honestly don't know if it makes sense to include other formats.
For example, MD and HTML are not really machine-readable.
YAML is not a great standard format for this use-case. Open for discussions, though.
2020-12-14 01:15:14 +01:00
Matthias
b7ab4abb0d
Make lychee usable as a library #13 (#46)
This splits up the code into a `lib` and a `bin`
to make the runtime usable from other crates.

Co-authored-by: Paweł Romanowski <pawroman@pawroman.dev>
2020-12-04 10:44:31 +01:00
Paweł Romanowski
1f787613d4
Add support for reading from stdin and make input handling more robust (closes #26)
* Adds a `skip_missing` flag
* Adds an `Input` enum to handle different types of inputs
2020-12-02 23:28:37 +01:00
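A rough sketch of what an `Input` enum and a `skip_missing` flag could look like. The variant names, the use of `-` to select stdin, and the exact meaning of `skip_missing` (ignore files that do not exist) are all assumptions made for illustration; the commit message does not spell them out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical variants; the real enum may differ.
#[derive(Debug, PartialEq)]
enum Input {
    RemoteUrl(String),
    FsPath(PathBuf),
    Stdin,
}

impl Input {
    /// Classifies a command-line argument (assumed convention: "-" means stdin).
    fn parse(arg: &str) -> Input {
        if arg == "-" {
            Input::Stdin
        } else if arg.starts_with("http://") || arg.starts_with("https://") {
            Input::RemoteUrl(arg.to_string())
        } else {
            Input::FsPath(PathBuf::from(arg))
        }
    }
}

/// Reads a file; with `skip_missing`, a missing file yields `Ok(None)`
/// instead of an error. Other I/O errors are always reported.
fn read_file(path: &Path, skip_missing: bool) -> io::Result<Option<String>> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(Some(contents)),
        Err(e) if skip_missing && e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}

fn main() {
    for arg in ["-", "https://example.com", "README.md"] {
        println!("{:?}", Input::parse(arg));
    }
    let missing = Path::new("this-file-should-not-exist.md");
    println!("{:?}", read_file(missing, true).map(|c| c.is_some()));
}
```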
Matthias Endler
68b574d85d Make README.md the default when no inputs are given
That used to be the case according to the docs,
but somehow we broke it while introducing some
changes.
2020-11-25 10:40:36 +01:00
Matthias
8294e47307
Properly announce lychee user agent (#38) 2020-11-25 10:11:02 +01:00
Matthias
b0f7a805ef
Use builder pattern and channels (fixes #12) (#33)
This implements a basic builder for the Checker struct as discussed in #12.
It is using derive_builder and uses a custom build method to instantiate the more elaborate fields like reqwest::Client.
It also adds deadpool and tokio::mpsc as dependencies to handle a pool of clients to query websites.
2020-11-24 21:30:06 +01:00
Matthias Endler
d0b7a64d0a Refactor and add documentation 2020-11-10 00:03:50 +01:00
Paweł Romanowski
326683f4eb
Make GITHUB_TOKEN optional (#22)
* Make GITHUB_TOKEN optional

This also makes the token possible to pass in from CLI args.

* Add missing test fixture file

* Normalize exit codes and GitHub checking behavior

The exit code is now defined as 1 for unexpected or config errors,
and 2 for link check failures.

GitHub checking behavior has been tweaked to generate errors if
a GitHub-specific check cannot be performed because of a missing
token.

* Remove short flag for github token
2020-10-26 23:31:31 +01:00
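The exit code convention described in the commit above (1 for unexpected or config errors, 2 for link check failures) could be modelled like this. It is a sketch only: the enum and function names are hypothetical, and 0 for success is assumed by convention and not stated in the commit message.

```rust
/// Hypothetical names; only the values 1 and 2 come from the commit message.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExitCode {
    Success = 0,           // assumed: conventional success code
    UnexpectedFailure = 1, // unexpected or config errors
    LinkCheckFailure = 2,  // at least one link failed its check
}

/// Maps the outcome of a run to an exit code.
/// `Err` models an unexpected or config error,
/// `Ok(n)` a completed run with `n` failed links.
fn exit_code(outcome: &Result<usize, String>) -> ExitCode {
    match outcome {
        Err(_) => ExitCode::UnexpectedFailure,
        Ok(0) => ExitCode::Success,
        Ok(_) => ExitCode::LinkCheckFailure,
    }
}

fn main() {
    let outcomes = [Ok(0), Ok(3), Err("invalid config".to_string())];
    for outcome in &outcomes {
        println!("{:?} -> {}", outcome, exit_code(outcome) as i32);
    }
}
```

Separating the two failure codes lets a CI script distinguish "the tool itself broke" from "the tool ran and found broken links".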
Milton Fabian Bastidas Guerra
2bf62e7709
Add support for basic auth #18 (#20)
[Issue #18](https://github.com/hello-rust/lychee/issues/18)
* Add headers crate to type headers and create auth header
* Add cmd param basic-auth to set property to the main
* Add a simple test to check that requests with auth headers are not broken

Signed-off-by: FabianBG <f4b4g3@gmail.com>
2020-10-26 09:23:45 +01:00
Matthias
f0e4c3adc1
Add support for include patterns (#23)
If one or more `include` arguments are specified, only check the URLs that match the patterns.
If `exclude` arguments are also specified, make an exception for excluded URLs
that also match the `include` patterns.
2020-10-25 13:41:06 +01:00
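One way to read the include/exclude rule from commit f0e4c3adc1 is sketched below. Two assumptions are baked in: plain substring matching stands in for whatever pattern syntax the tool actually uses, and when `include` patterns are present they fully decide the outcome. The function names are hypothetical.

```rust
/// Returns true if `url` matches any pattern.
/// Substring matching is a stand-in for the real pattern syntax.
fn matches_any(url: &str, patterns: &[&str]) -> bool {
    patterns.iter().any(|p| url.contains(*p))
}

/// Decides whether a URL should be checked.
fn should_check(url: &str, includes: &[&str], excludes: &[&str]) -> bool {
    if !includes.is_empty() {
        // With includes present, only matching URLs are checked,
        // even if they also match an exclude pattern.
        return matches_any(url, includes);
    }
    // Without includes, everything is checked unless it is excluded.
    !matches_any(url, excludes)
}

fn main() {
    let includes = ["example.com/docs"];
    let excludes = ["example.com"];
    for url in [
        "https://example.com/docs/intro",
        "https://example.com/blog",
        "https://other.org",
    ] {
        println!("{} -> {}", url, should_check(url, &includes, &excludes));
    }
}
```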
Alexander Krantz
3a12b3e220
Configuration file (lychee.toml) (#16) 2020-10-21 02:10:25 +02:00
WhizSid
6bd7bbf51f
feat: Support relative URLs (#15) 2020-10-21 01:31:06 +02:00
Paweł Romanowski
0790afdbf2 Un-ignore test_timeout, upgrade wiremock to 0.3 2020-10-18 11:44:00 +02:00
Matthias
f9fe11b078
Merge pull request #2 from u5surf/issue-1
Support exponential backoff in check_real #1
2020-10-18 00:46:50 +02:00
Matthias
9ac2176b32
Merge pull request #10 from pawroman/fix_clippy_warnings
Actions improvements: add rustfmt and clippy checks, run tests
2020-10-18 00:31:28 +02:00
Paweł Romanowski
e175558376 Add --exclude-all-private flag and cli integration test 2020-10-17 10:01:06 +02:00
Paweł Romanowski
b2ada4746c Introduce cargo fmt and clippy checks, fix all clippy warnings 2020-10-16 14:35:38 +02:00
Paweł Romanowski
c043776b77 Use website_url function in checker tests
This makes the test code shorter and more readable.
2020-10-16 13:59:44 +02:00
Paweł Romanowski
69e18785f9 Implement exclude private URLs feature
The exclusion is currently based on IP addresses, as specified by inputs.
We support IPv4 and IPv6, where possible using the current stable stdlib
(as of Rust 1.47.0).

Note that we could go one step further and resolve all URIs using DNS
and then exclude-filter the private IPs.
2020-10-16 13:59:42 +02:00
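The IP-based exclusion described in commit 69e18785f9 can be approximated with stable standard-library predicates. This is a sketch under assumptions: which address ranges the commit actually treats as private is not stated, so the selection below (private, loopback, and link-local for IPv4; loopback for IPv6) is illustrative, and `is_private_host` is a hypothetical name. Host names are not resolved, in line with the note that DNS lookups are a possible later step.

```rust
use std::net::IpAddr;

/// Returns true if `host` is a literal IP address in a range that is
/// assumed here to count as private. Non-IP hosts are never excluded,
/// because no DNS resolution is performed.
fn is_private_host(host: &str) -> bool {
    match host.parse::<IpAddr>() {
        Ok(IpAddr::V4(ip)) => ip.is_private() || ip.is_loopback() || ip.is_link_local(),
        Ok(IpAddr::V6(ip)) => ip.is_loopback(),
        Err(_) => false,
    }
}

fn main() {
    for host in ["192.168.0.1", "127.0.0.1", "8.8.8.8", "::1", "example.com"] {
        println!("{} -> {}", host, is_private_host(host));
    }
}
```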
Paweł Romanowski
cd6cf4add4 Add Uri::host_ip method and tests
This lets callers extract the host IP address, if defined for a website.
Mail addresses are not supported.
2020-10-16 13:55:43 +02:00
Paweł Romanowski
ba278743ed Simplify println invocation 2020-10-16 13:55:43 +02:00
Matthias
92b9ed11ca
Remove workaround for type recursion 2020-10-15 01:58:35 +02:00
Xiaochuan Yu
e80cb70a98 default true flags don't work? 2020-10-10 11:24:56 -04:00
Xiaochuan Yu
df54ce1eef Add progress bar 2020-10-10 00:39:06 -04:00
u5surf
fb860c52d2 Support exponential backoff in check_real #1 2020-10-03 06:34:30 +09:00
Matthias Endler
d1683bba32 Add e-mail checking support 2020-08-23 23:22:48 +02:00
Matthias Endler
16649a1d22 Use timeout instead of connect_timeout 2020-08-22 00:41:24 +02:00
Matthias Endler
c953528fb7 Add connection timeout 2020-08-22 00:36:03 +02:00
Matthias Endler
3650c673df Move to hubcaps. Allow defining accepted status codes 2020-08-18 01:17:26 +02:00
Matthias Endler
17139cfd3b Add escaping test 2020-08-17 20:19:36 +02:00
Matthias Endler
e224dcb5c7 Support head requests 2020-08-14 17:36:43 +02:00
Matthias Endler
25a31eacd6 Add support for custom request headers 2020-08-14 15:24:41 +02:00
Matthias Endler
b6a96c0c0f Clean up error code handling 2020-08-14 11:48:55 +02:00
Matthias Endler
bdd83128eb Refactor options 2020-08-14 11:43:45 +02:00
Matthias Endler
fbf6f09482 Refactor collector 2020-08-14 11:38:41 +02:00
Matthias Endler
391144b2ff Add globbing support 2020-08-14 02:33:04 +02:00
Matthias Endler
e758056f60 Add support for scheme (e.g. HTTPS) 2020-08-14 01:54:05 +02:00
Matthias Endler
8356ef1d03 Add support for website input 2020-08-14 01:14:47 +02:00
Matthias Endler
bbfe6a8531 rename summary function 2020-08-13 23:16:00 +02:00
Matthias Endler
b9757c505c Refactor link collection 2020-08-13 23:15:24 +02:00
Matthias Endler
47f1e306ab Support multiple file inputs 2020-08-13 23:01:30 +02:00
Matthias Endler
cca984017a Add support for changing number of threads 2020-08-13 19:58:41 +02:00
Matthias Endler
bb5268fbf1 Add statistics 2020-08-13 19:58:25 +02:00
Matthias Endler
184e263a44 Remove Github token from env var 2020-08-13 19:57:48 +02:00
Matthias Endler
96ea6c7a5c Add support for ignoring certificate errors 2020-08-12 23:38:21 +02:00
Matthias Endler
cd79f72d2d Make user-agent configurable 2020-08-12 13:10:15 +02:00
Matthias Endler
156f2b03c2 Make redirects configurable 2020-08-12 12:59:15 +02:00
Matthias Endler
1566a99647 Use CheckStatus enum for more fine-grained control over check results 2020-08-12 12:36:05 +02:00
Matthias Endler
d4a3b09790 Add support for excluding URLs 2020-08-11 22:48:50 +02:00
Matthias Endler
1d235b578b pico-args -> gumdrop
Needed multi-value args (with values accumulated into a vec)
for excluded urls
2020-08-11 22:18:37 +02:00
Matthias Endler
ad8033a90f Info output 2020-08-11 22:06:18 +02:00
Matthias Endler
3fc369b3fa Make things blazing fast thanks to async 2020-08-11 16:13:10 +02:00
Matthias Endler
262b18e6b8 Rewrite link checking into iterator
This is a preparation for easier async execution
2020-08-09 23:36:14 +02:00
Matthias Endler
8588d2eade Update tests 2020-08-09 23:16:23 +02:00
Matthias Endler
6e0f559b25 Switch to linkify to cover non MD links 2020-08-09 23:12:25 +02:00
Matthias Endler
fb517dab03 Add tests for extract 2020-08-09 23:09:27 +02:00
Matthias Endler
883b68ce66 Clippy 2020-08-09 22:50:17 +02:00
Matthias Endler
46e30f081b Move tests 2020-08-09 22:49:14 +02:00
Matthias Endler
23ba1f2e11 Formatting 2020-08-09 22:48:02 +02:00
Matthias Endler
bc615c9bfb Split up code into modules 2020-08-09 22:47:39 +02:00
Matthias Endler
7c51a24c44 Print instead of info 2020-08-09 22:43:25 +02:00
Matthias Endler
5876f494f3 Only request unique URLs 2020-08-08 00:09:10 +02:00
Matthias Endler
dc7af2d74e Clean up error handling and configure reqwests 2020-08-08 00:06:17 +02:00
Matthias Endler
a58b3e1232 Add logging and proper URL parsing 2020-08-07 19:00:21 +02:00
Matthias Endler
8885a63f82 Check normal link first and only if it fails use GitHub API 2020-08-05 01:44:16 +02:00
Matthias Endler
2a93ce8093 Extract from hello-rust/show repository 2020-08-05 00:32:37 +02:00