If an error occurs during link checking,
it is important to know where the error occured.
Therefore the request and response objects now contain a the input
source as a field. This makes error tracking easier.
* Fix HTML parsing for non-closed elements like <link>
The XML parser we use requires all tags to be closed by default,
and if they aren't (like HTML5 <link> elements), it simply gives up
on further parsing. This change makes it ignore such issues.
Also uncover a bug with the current parser (it simply won't parse
elements like `<script defer src="..."></script>`) -- e.g. elements
with no attribute values.
The XML parser is an XML parser and will have to be replaced with
HTML aware parser in the future.
* Add check for empty elements
* Update extract.rs
Co-authored-by: Matthias <matthias-endler@gmx.net>
For now we only support JSON.
I honestly don't know if it makes sense to include other formats.
For example, MD and HTML are not really
machine-readable. YAML is not
a great standard format for this use-case. Open for discussions, though.
This splits up the code into a `lib` and a `bin`
to make the runtime usable from other crates.
Co-authored-by: Paweł Romanowski <pawroman@pawroman.dev>
This implements a basic builder for the Checker struct as discussed in #12.
It is using derive_builder and uses a custom build method to instantiate the more elaborate fields like reqwest::Client.
It also adds deadpool and tokio::mpsc as dependencies to handle a pool of clients to query websites.
* Make GITHUB_TOKEN optional
This also makes the token possible to pass in from CLI args.
* Add missing test fixture file
* Normalize exit codes and GitHub checking behavior
The exit code is now defined as 1 for unexpected or config errors,
and 2 for link check failures.
GitHub checking behavior has been tweaked to generate errors if
a GitHub-specific check cannot be performed because of a missing
token.
* Remove short flag for github token
[Issue #18](https://github.com/hello-rust/lychee/issues/18)
* Add headers crate to type headers and create auth header
* Add cmd param basic-auth to set property to the main
* Add simple test to test if with auth headres is no broken
Signed-off-by: FabianBG <f4b4g3@gmail.com>
In one or more `include` arguments are specified, only check the URLs that match the patterns.
In case `exclude` arguments are also
specified, make an exception from the
excluded URLs if they also match the
`include` patterns.
The exclusion is currently based on IP addresses, as specified by inputs.
We support IPv4 and IPv6, where possible using the current stable stdlib
(as of Rust 1.47.0).
Note that we could go one step further and resolve all URIs using DNS
and then exclude-filter the private IPs.