lychee/lychee-lib/src
Matthias Endler cead4ce826
Improve srcset parsing (#1160)
Our current `srcset` parsing is pretty basic.

We split on commas, then on whitespace, and take the first part, which is the image source URL.
However, we don't handle URLs containing unencoded commas like
</cdn-cgi/image/format=webp,width=640/https://img.youtube.com/vi/hVBl8_pgQf0/maxresdefault.jpg>, which leads to false positives.

According to the spec, commas in strings should be encoded, but in practice some websites don't do that. To handle these cases as well, I propose extending the `srcset` parsing with a small state machine that tracks, while parsing, whether a comma is inside the image source or outside of it.
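
To illustrate the idea, here is a minimal sketch of such a state machine. It is not the code from this change: `parse_srcset_urls` and the `State` enum are hypothetical names, and the sketch assumes that an image source ends at the first whitespace character, so commas inside the source are kept while a comma after the descriptor starts the next candidate.

```rust
/// Extract the image source URLs from a `srcset` value.
/// Hypothetical helper for illustration, not lychee's actual API.
fn parse_srcset_urls(input: &str) -> Vec<&str> {
    #[derive(Clone, Copy)]
    enum State {
        /// Skipping whitespace and commas before the next image source.
        BeforeUrl,
        /// Inside an image source that began at byte offset `start`.
        InUrl { start: usize },
        /// Inside the descriptor (e.g. `640w` or `2x`) following a source.
        InDescriptor,
    }

    let mut urls = Vec::new();
    let mut state = State::BeforeUrl;

    for (i, c) in input.char_indices() {
        state = match state {
            State::BeforeUrl if c.is_ascii_whitespace() || c == ',' => State::BeforeUrl,
            State::BeforeUrl => State::InUrl { start: i },
            // Whitespace ends the source; commas seen so far belong to it.
            State::InUrl { start } if c.is_ascii_whitespace() => {
                let raw = &input[start..i];
                // A trailing comma means "no descriptor, next candidate follows".
                let url = raw.trim_end_matches(',');
                urls.push(url);
                if url.len() < raw.len() {
                    State::BeforeUrl
                } else {
                    State::InDescriptor
                }
            }
            State::InUrl { start } => State::InUrl { start },
            // Outside of a source, a comma separates candidates.
            State::InDescriptor if c == ',' => State::BeforeUrl,
            State::InDescriptor => State::InDescriptor,
        };
    }

    // Flush a source that runs to the end of the input.
    if let State::InUrl { start } = state {
        let url = input[start..].trim_end_matches(',');
        if !url.is_empty() {
            urls.push(url);
        }
    }

    urls
}
```

Calling `parse_srcset_urls("a.png 1x, b.png 2x")` yields `a.png` and `b.png`, and a source such as `/cdn-cgi/image/format=webp,width=640/...` followed by ` 640w` is returned whole instead of being cut at the comma. The sketch deliberately ignores parentheses in descriptors, and it treats comma-separated sources without any whitespace between them as a single source.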

This is part of an effort to reduce false positives during link checking.

---------
Co-authored-by: Hugo McNally <45573837+HU90m@users.noreply.github.com>
2023-07-29 17:06:44 +02:00
| Name | Last commit | Date |
|---|---|---|
| basic_auth | feat: add support for basic auth per URI (#1110) | 2023-06-26 12:06:24 +02:00 |
| extract | Improve srcset parsing (#1160) | 2023-07-29 17:06:44 +02:00 |
| filter | Make checking email addresses optional (#1171) | 2023-07-19 19:58:38 +02:00 |
| quirks | Don't check Twitter URLs (#1147) | 2023-07-13 17:31:59 +02:00 |
| types | feat: Add support for --dump-inputs (#1159) | 2023-07-16 18:08:14 +02:00 |
| utils | feat: add support for basic auth per URI (#1110) | 2023-06-26 12:06:24 +02:00 |
| client.rs | Make checking email addresses optional (#1171) | 2023-07-19 19:58:38 +02:00 |
| collector.rs | feat: Add support for --dump-inputs (#1159) | 2023-07-16 18:08:14 +02:00 |
| lib.rs | Cookie Support (#1146) | 2023-07-13 17:32:41 +02:00 |
| remap.rs | Extend remap feature (#1133) | 2023-07-05 15:05:19 +02:00 |
| retry.rs | Bump octocrab from 0.19.0 to 0.20.0 (#1045) | 2023-04-17 23:14:24 +02:00 |
| test_utils.rs | Harden URL detection and extend verbatim elements (#899) | 2023-01-04 00:38:19 +01:00 |