16 commits

Author SHA1 Message Date
Matthias Endler
2b044a6f5b Fix exclude mail, add tests 2021-03-29 23:28:17 +02:00
Matthias Endler
5baaba3948 Add integration test 2021-02-28 19:09:11 +01:00
Matthias Endler
e00cdbf1ae example.com -> example.org 2021-02-21 16:33:33 +01:00
Matthias
702909c4ab
Mailto support (#138)
* Add mailto support and use try_from for parsing URLs
* Clean up and document code
2021-02-12 10:25:33 +01:00
Paweł Romanowski
aeab85da16
Use html5ever for HTML link extraction (#98) 2021-01-08 16:41:13 +01:00
Paweł Romanowski
cd00fa643e
Fix HTML parsing for non-closed elements like <link> (#92)
* Fix HTML parsing for non-closed elements like <link>

The XML parser we use requires all tags to be closed by default,
and if they aren't (like HTML5 <link> elements), it simply gives up
on further parsing.  This change makes it ignore such issues.

Also uncover a bug with the current parser: it simply won't parse
elements like `<script defer src="..."></script>`, i.e. elements
with an attribute that has no value.

The current parser is an XML parser and will have to be replaced with
an HTML-aware parser in the future.

* Add check for empty elements

* Update extract.rs

Co-authored-by: Matthias <matthias-endler@gmx.net>
2021-01-03 17:32:13 +01:00
Matthias
a78e8318cd
Add (machine-readable) output file support (fixes #53)
For now we only support JSON.
I honestly don't know if it makes sense to include other formats.
For example, MD and HTML are not really machine-readable. YAML is not
a great standard format for this use-case. Open for discussions, though.
2020-12-14 01:15:14 +01:00
Paweł Romanowski
1f787613d4
Add support for reading from stdin and make input handling more robust (closes #26)
* Adds a `skip_missing` flag
* Adds an `Input` enum to handle different types of inputs
2020-12-02 23:28:37 +01:00
Paweł Romanowski
326683f4eb
Make GITHUB_TOKEN optional (#22)
* Make GITHUB_TOKEN optional

This also makes it possible to pass the token in via CLI args.

* Add missing test fixture file

* Normalize exit codes and GitHub checking behavior

The exit code is now defined as 1 for unexpected or config errors,
and 2 for link check failures.

GitHub checking behavior has been tweaked to generate errors if
a GitHub-specific check cannot be performed because of a missing
token.

* Remove short flag for github token
2020-10-26 23:31:31 +01:00
WhizSid
6bd7bbf51f
feat: Support relative URLs (#15) 2020-10-21 01:31:06 +02:00
Paweł Romanowski
e175558376 Add --exclude-all-private flag and cli integration test 2020-10-17 10:01:06 +02:00
Matthias Endler
14d098f7cf Add mail 2020-08-23 23:19:21 +02:00
Matthias Endler
608499fdb4 Add more test links 2020-08-14 11:38:29 +02:00
Matthias Endler
391144b2ff Add globbing support 2020-08-14 02:33:04 +02:00
Matthias Endler
4aa2883371 Add more links 2020-08-09 22:43:11 +02:00
Matthias Endler
a58b3e1232 Add logging and proper URL parsing 2020-08-07 19:00:21 +02:00
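
Commit 326683f4eb above defines the exit code as 1 for unexpected or config errors and 2 for link check failures. The following is a minimal sketch of that convention, not the project's actual code: the `Outcome` enum and the `exit_code_for` function are hypothetical names introduced here for illustration, and the value 0 for success is an assumption based on common Unix practice rather than something the commit message states.

```rust
use std::process::ExitCode;

/// Hypothetical outcome of a run; this type is an assumption for
/// illustration and is not taken from the repository.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Everything was checked and no link failed (assumed to map to 0).
    Success,
    /// An unexpected error or a configuration error occurred.
    UnexpectedOrConfigError,
    /// The run itself worked, but at least one link check failed.
    LinkCheckFailure,
}

/// Map an outcome to the numeric exit code described in the commit message:
/// 1 for unexpected or config errors, 2 for link check failures.
fn exit_code_for(outcome: &Outcome) -> u8 {
    match outcome {
        Outcome::Success => 0,
        Outcome::UnexpectedOrConfigError => 1,
        Outcome::LinkCheckFailure => 2,
    }
}

fn main() -> ExitCode {
    // Print the full mapping so the convention is visible at a glance.
    for outcome in [
        Outcome::Success,
        Outcome::UnexpectedOrConfigError,
        Outcome::LinkCheckFailure,
    ] {
        println!("{:?} -> {}", outcome, exit_code_for(&outcome));
    }

    // A real program would compute the outcome from its run; here the
    // success case is used so the example itself exits cleanly.
    ExitCode::from(exit_code_for(&Outcome::Success))
}
```

Keeping the two failure codes distinct lets a calling script or CI job tell "the tool could not run properly" apart from "the tool ran and found broken links" by inspecting the exit status alone.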