* Fix HTML parsing for non-closed elements like <link>
The XML parser we use requires all tags to be closed by default,
and if they aren't (like HTML5 <link> elements), it simply gives up
on further parsing. This change makes it ignore such issues.
Also uncover a bug with the current parser (it simply won't parse
elements like `<script defer src="..."></script>`) -- e.g. elements
with no attribute values.
The XML parser is an XML parser and will have to be replaced with
HTML aware parser in the future.
* Add check for empty elements
* Update extract.rs
Co-authored-by: Matthias <matthias-endler@gmx.net>
For now we only support JSON.
I honestly don't know if it makes sense to include other formats.
For example, MD and HTML are not really
machine-readable. YAML is not
a great standard format for this use-case. Open for discussions, though.
* Make GITHUB_TOKEN optional
This also makes the token possible to pass in from CLI args.
* Add missing test fixture file
* Normalize exit codes and GitHub checking behavior
The exit code is now defined as 1 for unexpected or config errors,
and 2 for link check failures.
GitHub checking behavior has been tweaked to generate errors if
a GitHub-specific check cannot be performed because of a missing
token.
* Remove short flag for github token