Commit graph

25 commits

Author SHA1 Message Date
Lucius Hu
228e5df6a3
Major refactor of codebase (#208)
- The binary component and library component are separated as two
  packages in the same workspace.
  - `lychee` is the binary component, in `lychee-bin/*`.
  - `lychee-lib` is the library component, in `lychee-lib/*`.
  - Users can now install only `lychee-lib` instead of both
    components, which requires fewer dependencies and compiles
    faster.
  - Dependencies for each component are adjusted and updated. E.g.,
    no CLI dependencies for `lychee-lib`.
  - CLI tests are moved to `lychee` only, as they have nothing to do
    with the library component.
- `Status::Error` is refactored to contain dedicated error enum,
  `ErrorKind`.
  - The motivation is to delay the formatting of errors to strings.
    Note that `e.to_string()` is not necessarily cheap (though
    trivial in many cases). The formatting is now delayed until the
    error needs to be displayed to users. So if the error is never
    displayed, it is never formatted at all.
- Replaced `regex` based matching with one of the following:
  - Simple string equality test in the case of 'false positive'.
  - URL parsing based test, in the case of extracting repository and
    user name for GitHub links.
  - Either case is much more efficient than `regex` based
    matching. First, there's no need to construct a state machine for
    the regex. Second, the URL is already verified and parsed on its
    creation, and extracting its components is fairly cheap. Also, this
    removes the dependency on `lazy-static` in `lychee-lib`.
- `types` module now has a sub-directory, and its components are now
  separated into their own modules (in that sub-directory).
- `lychee-lib::test_utils` module is only compiled for tests.
- `wiremock` is moved to `dev-dependency` as it's only needed for
  `test` modules.
- Dependencies are listed in alphabetical order.
- Imports are organized in the following fashion:
  - Imports from `std`
  - Imports from 3rd-party crates, and `lychee-lib`.
  - Imports from `crate::*` or `super::*`.
- No glob import.
- I followed the suggestions from `cargo clippy`, with `clippy::all` and
  `clippy::pedantic`.
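
The deferred error formatting described above might look roughly like the following sketch. Only `Status::Error` and `ErrorKind` are named in the commit message; the variants `InvalidUrl` and `Timeout`, `Status::Ok`, and the helper `is_failure` are illustrative assumptions, not the actual definitions in `lychee-lib`.

```rust
use std::fmt;

// Hypothetical error variants; the real `ErrorKind` in `lychee-lib` may differ.
#[derive(Debug, PartialEq)]
enum ErrorKind {
    InvalidUrl(String),
    Timeout { seconds: u64 },
}

// The string is built only when the error is actually displayed.
impl fmt::Display for ErrorKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ErrorKind::InvalidUrl(input) => write!(f, "invalid URL: {}", input),
            ErrorKind::Timeout { seconds } => write!(f, "timed out after {}s", seconds),
        }
    }
}

#[derive(Debug, PartialEq)]
enum Status {
    Ok,
    // Stores the structured error instead of an eagerly formatted `String`.
    Error(ErrorKind),
}

// Hypothetical helper: inspects the status without formatting anything.
fn is_failure(status: &Status) -> bool {
    matches!(status, Status::Error(_))
}

fn main() {
    let statuses = [
        Status::Ok,
        Status::Error(ErrorKind::InvalidUrl("htp:/broken".to_string())),
        Status::Error(ErrorKind::Timeout { seconds: 20 }),
    ];
    // Counting failures needs no string formatting at all.
    let failures = statuses.iter().filter(|s| is_failure(s)).count();
    println!("{} failure(s)", failures);
    // Formatting happens only here, for the one error that is shown.
    if let Some(Status::Error(kind)) = statuses.last() {
        println!("last error: {}", kind);
    }
}
```

Errors that are only counted or filtered never pay for a `to_string()` call in this arrangement.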

Co-authored-by: Lucius Hu <lebensterben@users.noreply.github.com>
2021-04-15 01:24:11 +02:00
Matthias
2d2009ffe0
Assume HTML in case there is no extension (e.g. for URLs) (#217)
This is not entirely correct, but covers more use-cases
than previously. Eventually we have to revisit this
and implement a proper solution
2021-04-12 16:46:37 +02:00
Matthias
f66aaecf0f
Assume HTML in case there is no extension (e.g. for URLs) (#197) 2021-04-12 14:40:39 +02:00
Paweł Romanowski
a45e781d47
Fix URLs with '@' parsing as emails (#177)
* Fix URLs with '@' parsing as emails

Only consider a link an email if it fails to parse as URL.

Also use a proper email validation instead of a simple '@' check.

This uses the fast_chemail crate which parses email links according
to the HTML specification (which is much more practical than checking
for RFC 5322 formatted emails).  It's also worth noting that
fast_chemail is used internally (albeit indirectly) by the
check_if_email_exists crate.  This means that email addresses
not considered valid by fast_chemail wouldn't pass link checks
anyway.

* Fix comment in test
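
The "URL first, email second" ordering from this commit can be sketched as follows. The actual change relies on real URL parsing and the `fast_chemail` crate; the helpers `parses_as_url` and `looks_like_email` below are deliberately simplified, std-only stand-ins, and all names (including `LinkKind` and `classify`) are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
enum LinkKind {
    Url,
    Mail,
    Unknown,
}

// Simplified stand-in for real URL parsing: accepts `http://` or `https://`
// followed by a non-empty remainder.
fn parses_as_url(input: &str) -> bool {
    ["http://", "https://"]
        .iter()
        .any(|scheme| input.strip_prefix(scheme).map_or(false, |rest| !rest.is_empty()))
}

// Simplified stand-in for proper email validation: exactly one `@`,
// a non-empty local part, and a dotted domain without whitespace.
fn looks_like_email(input: &str) -> bool {
    let mut parts = input.split('@');
    match (parts.next(), parts.next(), parts.next()) {
        (Some(local), Some(domain), None) => {
            !local.is_empty()
                && !local.contains(char::is_whitespace)
                && domain.contains('.')
                && !domain.starts_with('.')
                && !domain.ends_with('.')
                && !domain.contains(char::is_whitespace)
        }
        _ => false,
    }
}

// A link is only considered an email if it fails to parse as a URL.
fn classify(input: &str) -> LinkKind {
    if parses_as_url(input) {
        LinkKind::Url
    } else if looks_like_email(input) {
        LinkKind::Mail
    } else {
        LinkKind::Unknown
    }
}

fn main() {
    for input in ["https://example.org/@user", "user@example.org", "not a link"] {
        println!("{} -> {:?}", input, classify(input));
    }
}
```

With this ordering, a URL that merely contains `@` in its path is no longer mistaken for an email address.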
2021-03-14 20:10:36 +01:00
Joesan
cefe38ee25
Add support for relative links in Markdown files (#150) 2021-02-22 01:11:15 +01:00
Matthias Endler
e00cdbf1ae example.com -> example.org 2021-02-21 16:33:33 +01:00
Matthias Endler
8d165a3cda Add support and tests for .markdown files 2021-02-21 09:37:49 +01:00
Matthias Endler
16cd67331a Add simple, standalone client
Adds a new function `lychee::check()`, which removes
a lot of boilerplate for simple cases. Adjusted the code,
tests, and documentation.
The downside is that `check` now returns a Result, so
we have to use `?` to get to the response. That's because
we have to account for the case where the given string is
not a valid URI.
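
The trade-off described above (a `Result` return type that callers unwrap with `?`) can be illustrated with a synchronous, std-only sketch. The function `check` below is a hypothetical stand-in that performs no network request; the real signature of `lychee::check()` is not shown in this commit message and may differ. `CheckError`, `Response`, and `run` are likewise assumptions.

```rust
#[derive(Debug, PartialEq)]
enum CheckError {
    InvalidUri(String),
}

#[derive(Debug, PartialEq)]
struct Response {
    uri: String,
    ok: bool,
}

// Hypothetical stand-in: rejects strings that are not HTTP(S) URIs and
// otherwise pretends the link is reachable.
fn check(input: &str) -> Result<Response, CheckError> {
    if !(input.starts_with("http://") || input.starts_with("https://")) {
        return Err(CheckError::InvalidUri(input.to_string()));
    }
    Ok(Response { uri: input.to_string(), ok: true })
}

fn run() -> Result<(), CheckError> {
    // `?` yields the response or returns early with the parse error.
    let response = check("https://example.org")?;
    println!("{} -> ok: {}", response.uri, response.ok);
    Ok(())
}

fn main() {
    if let Err(error) = run() {
        eprintln!("error: {:?}", error);
    }
}
```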
2021-02-18 01:32:48 +01:00
Matthias Endler
54e1d3e078 Simplify tests 2021-02-16 00:35:59 +01:00
Matthias Endler
4bec47904e Show input source in status output
If an error occurs during link checking,
it is important to know where the error occurred.
Therefore the request and response objects now contain the input
source as a field. This makes error tracking easier.
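
A minimal sketch of carrying the input source through to the status output is shown below. The type and field names (`InputSource`, `Request`, `Response`, `status_line`) are assumptions chosen for illustration, and the check itself is a placeholder without network access.

```rust
use std::fmt;
use std::path::PathBuf;

// Hypothetical description of where a link was found.
#[derive(Debug, Clone, PartialEq)]
enum InputSource {
    File(PathBuf),
    Stdin,
}

impl fmt::Display for InputSource {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            InputSource::File(path) => write!(f, "{}", path.display()),
            InputSource::Stdin => write!(f, "stdin"),
        }
    }
}

#[derive(Debug)]
struct Request {
    uri: String,
    source: InputSource,
}

#[derive(Debug)]
struct Response {
    uri: String,
    success: bool,
    source: InputSource,
}

// Placeholder check: the source travels from the request to the response.
fn check(request: Request) -> Response {
    let success = request.uri.starts_with("https://");
    Response { uri: request.uri, success, source: request.source }
}

// Formats one status line that names the input source first.
fn status_line(response: &Response) -> String {
    let mark = if response.success { "ok" } else { "failed" };
    format!("[{}] {} ({})", response.source, response.uri, mark)
}

fn main() {
    let request = Request {
        uri: "ftp://example.org".to_string(),
        source: InputSource::File(PathBuf::from("README.md")),
    };
    println!("{}", status_line(&check(request)));
}
```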
2021-02-16 00:15:14 +01:00
Matthias
702909c4ab
Mailto support (#138)
* Add mailto support and use try_from for parsing URLs
* Cleanup and document code
2021-02-12 10:25:33 +01:00
Paweł Romanowski
aeab85da16
Use html5ever for HTML link extraction (#98) 2021-01-08 16:41:13 +01:00
Paweł Romanowski
cd00fa643e
Fix HTML parsing for non-closed elements like <link> (#92)
* Fix HTML parsing for non-closed elements like <link>

The XML parser we use requires all tags to be closed by default,
and if they aren't (like HTML5 <link> elements), it simply gives up
on further parsing.  This change makes it ignore such issues.

Also uncover a bug with the current parser: it simply won't parse
elements like `<script defer src="..."></script>`, that is, elements
with attributes that have no value.

The XML parser is an XML parser and will have to be replaced with
an HTML-aware parser in the future.

* Add check for empty elements

* Update extract.rs

Co-authored-by: Matthias <matthias-endler@gmx.net>
2021-01-03 17:32:13 +01:00
Paweł Romanowski
fa9c5ea2cf
Run clippy for all targets, including tests (#93)
The test code should also be linted.
2021-01-03 16:41:19 +01:00
Matthias
b7ab4abb0d
Make lychee usable as a library #13 (#46)
This splits up the code into a `lib` and a `bin`
to make the runtime usable from other crates.

Co-authored-by: Paweł Romanowski <pawroman@pawroman.dev>
2020-12-04 10:44:31 +01:00
Paweł Romanowski
1f787613d4
Add support for reading from stdin and make input handling more robust (closes #26)
* Adds a `skip_missing` flag
* Adds an `Input` enum to handle different types of inputs
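
One way an `Input` enum and a `skip_missing` flag could fit together is sketched below. The commit message only names `Input` and `skip_missing`; the variants (`Stdin`, `FsPath`, `Text`) and the function `read_input` are assumptions, not the project's actual definitions.

```rust
use std::fs;
use std::io::{self, Read};
use std::path::PathBuf;

// Hypothetical variants for the different kinds of input.
enum Input {
    Stdin,
    FsPath(PathBuf),
    Text(String),
}

// Returns `Ok(None)` when a file is missing and `skip_missing` is set;
// any other I/O error is passed on to the caller.
fn read_input(input: Input, skip_missing: bool) -> io::Result<Option<String>> {
    match input {
        Input::Stdin => {
            let mut buffer = String::new();
            io::stdin().read_to_string(&mut buffer)?;
            Ok(Some(buffer))
        }
        Input::FsPath(path) => match fs::read_to_string(&path) {
            Ok(content) => Ok(Some(content)),
            Err(error) if skip_missing && error.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(error) => Err(error),
        },
        Input::Text(text) => Ok(Some(text)),
    }
}

fn main() -> io::Result<()> {
    let inputs = vec![
        Input::Text("See https://example.org".to_string()),
        Input::FsPath(PathBuf::from("surely-missing-input-file.md")),
    ];
    for input in inputs {
        match read_input(input, true)? {
            Some(content) => println!("read {} bytes", content.len()),
            None => println!("skipped missing input"),
        }
    }
    Ok(())
}
```

Passing `Input::Stdin` reads until end of input, so the example above avoids it to stay non-interactive.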
2020-12-02 23:28:37 +01:00
Matthias
b0f7a805ef
Use builder pattern and channels (fixes #12) (#33)
This implements a basic builder for the Checker struct as discussed in #12.
It is using derive_builder and uses a custom build method to instantiate the more elaborate fields like reqwest::Client.
It also adds deadpool and tokio::mpsc as dependencies to handle a pool of clients to query websites.
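
The combination of a builder and a channel can be sketched with the standard library alone. Note the substitutions: this uses a hand-written builder instead of `derive_builder`, `std::sync::mpsc` with one worker thread instead of `tokio::mpsc`, and no client pool. The fields, defaults, and the `check_all` helper are assumptions made for illustration.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
struct Checker {
    user_agent: String,
    timeout: Duration,
    max_redirects: usize,
}

// Hand-written builder; unset fields fall back to assumed defaults.
#[derive(Default)]
struct CheckerBuilder {
    user_agent: Option<String>,
    timeout: Option<Duration>,
    max_redirects: Option<usize>,
}

impl CheckerBuilder {
    fn user_agent(mut self, value: &str) -> Self {
        self.user_agent = Some(value.to_string());
        self
    }
    fn timeout(mut self, value: Duration) -> Self {
        self.timeout = Some(value);
        self
    }
    fn max_redirects(mut self, value: usize) -> Self {
        self.max_redirects = Some(value);
        self
    }
    fn build(self) -> Checker {
        Checker {
            user_agent: self.user_agent.unwrap_or_else(|| "sketch/0.1".to_string()),
            timeout: self.timeout.unwrap_or(Duration::from_secs(20)),
            max_redirects: self.max_redirects.unwrap_or(10),
        }
    }
}

impl Checker {
    // Placeholder check without network access.
    fn check(&self, link: &str) -> bool {
        link.starts_with("https://")
    }
}

// A worker thread checks the links and sends each result over a channel.
fn check_all(checker: Checker, links: Vec<String>) -> Vec<(String, bool)> {
    let (sender, receiver) = mpsc::channel();
    let worker = thread::spawn(move || {
        for link in links {
            let ok = checker.check(&link);
            if sender.send((link, ok)).is_err() {
                break; // the receiver is gone, stop early
            }
        }
        // `sender` is dropped here, which closes the channel.
    });
    let results: Vec<(String, bool)> = receiver.iter().collect();
    worker.join().expect("worker thread panicked");
    results
}

fn main() {
    let checker = CheckerBuilder::default()
        .user_agent("example-agent")
        .timeout(Duration::from_secs(5))
        .build();
    println!("{:?}", checker);
    let links = vec!["https://example.org".to_string(), "gopher://example.org".to_string()];
    for (link, ok) in check_all(checker, links) {
        println!("{} -> {}", link, ok);
    }
}
```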
2020-11-24 21:30:06 +01:00
WhizSid
6bd7bbf51f
feat: Support relative URLs (#15) 2020-10-21 01:31:06 +02:00
Paweł Romanowski
cd6cf4add4 Add Uri::host_ip method and tests
This lets callers extract the host IP address, if defined for a website.
Mail addresses are not supported.
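
A simplified sketch of such a method follows. Only the name `Uri::host_ip` comes from the commit message; the `Uri` variants and the way the host is stored here are assumptions, and the real type presumably wraps a parsed URL rather than a plain host string.

```rust
use std::net::IpAddr;

// Hypothetical, simplified `Uri`; the host is kept as a plain string.
#[derive(Debug)]
enum Uri {
    Website { host: String },
    Mail(String),
}

impl Uri {
    // Returns the host as an IP address if it is one, `None` for domain
    // names and for mail addresses.
    fn host_ip(&self) -> Option<IpAddr> {
        match self {
            Uri::Website { host } => host
                // IPv6 hosts are written in brackets inside URLs.
                .trim_start_matches('[')
                .trim_end_matches(']')
                .parse()
                .ok(),
            Uri::Mail(_) => None,
        }
    }
}

fn main() {
    let uris = [
        Uri::Website { host: "127.0.0.1".to_string() },
        Uri::Website { host: "example.org".to_string() },
        Uri::Mail("user@example.org".to_string()),
    ];
    for uri in &uris {
        println!("{:?} -> {:?}", uri, uri.host_ip());
    }
}
```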
2020-10-16 13:55:43 +02:00
Matthias Endler
d1683bba32 Add e-mail checking support 2020-08-23 23:22:48 +02:00
Matthias Endler
17139cfd3b Add escaping test 2020-08-17 20:19:36 +02:00
Matthias Endler
8588d2eade Update tests 2020-08-09 23:16:23 +02:00
Matthias Endler
6e0f559b25 Switch to linkify to cover non MD links 2020-08-09 23:12:25 +02:00
Matthias Endler
fb517dab03 Add tests for extract 2020-08-09 23:09:27 +02:00
Matthias Endler
bc615c9bfb Split up code into modules 2020-08-09 22:47:39 +02:00