mirror of
https://github.com/Hopiu/lychee.git
synced 2026-04-08 23:40:58 +00:00
Our current `srcset` parsing is pretty basic: we split on commas, then on whitespace, and take the first part as the image source URL. However, this does not handle URLs containing unencoded commas, such as </cdn-cgi/image/format=webp,width=640/https://img.youtube.com/vi/hVBl8_pgQf0/maxresdefault.jpg>, which leads to false positives. According to the spec, commas in strings should be encoded, but in practice some websites don't do that. To handle these cases as well, I propose extending the `srcset` parsing with a small state machine that detects, while parsing, whether a comma is inside the image source or outside of it. This is part of an effort to reduce false positives during link checking. --------- Co-authored-by: Hugo McNally <45573837+HU90m@users.noreply.github.com> |
||
|---|---|---|
| .. | ||
| basic_auth | ||
| extract | ||
| filter | ||
| quirks | ||
| types | ||
| utils | ||
| client.rs | ||
| collector.rs | ||
| lib.rs | ||
| remap.rs | ||
| retry.rs | ||
| test_utils.rs | ||
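
The state-machine idea from the commit message can be illustrated with a minimal sketch. This is not the code from this repository: the names `State`, `parse_srcset`, and `push_url` are hypothetical, and the sketch assumes that a URL runs until the next whitespace, so a comma only separates candidates when it trails a URL or follows a descriptor such as `640w`.

```rust
/// Parser states for this sketch (hypothetical names, not the crate's own).
#[derive(Clone, Copy)]
enum State {
    /// Skipping whitespace and commas between image candidates.
    BeforeUrl,
    /// Reading a URL; commas belong to the URL until whitespace is seen.
    InUrl,
    /// Reading a descriptor (e.g. `640w`, `2x`); a comma ends the candidate.
    InDescriptor,
}

/// Store the collected URL (minus trailing separator commas) and reset the buffer.
fn push_url(urls: &mut Vec<String>, current: &mut String) {
    let url = current.trim_end_matches(',');
    if !url.is_empty() {
        urls.push(url.to_string());
    }
    current.clear();
}

/// Extract the image source URLs from a `srcset` attribute value.
fn parse_srcset(input: &str) -> Vec<String> {
    let mut urls = Vec::new();
    let mut current = String::new();
    let mut state = State::BeforeUrl;

    for c in input.chars() {
        match state {
            State::BeforeUrl => {
                if !(c.is_ascii_whitespace() || c == ',') {
                    current.push(c);
                    state = State::InUrl;
                }
            }
            State::InUrl => {
                if c.is_ascii_whitespace() {
                    // A trailing comma means the candidate has no descriptor.
                    state = if current.ends_with(',') {
                        State::BeforeUrl
                    } else {
                        State::InDescriptor
                    };
                    push_url(&mut urls, &mut current);
                } else {
                    // Commas inside the URL are kept as part of it.
                    current.push(c);
                }
            }
            State::InDescriptor => {
                if c == ',' {
                    state = State::BeforeUrl;
                }
            }
        }
    }

    // Flush a URL that ends the input without a descriptor.
    if matches!(state, State::InUrl) {
        push_url(&mut urls, &mut current);
    }
    urls
}
```

With this approach, the URL from the commit message followed by `640w` is returned whole instead of being split at `format=webp,width=640`. One trade-off of the assumption above is that candidates separated by a comma with no whitespace (for example `a.jpg,b.jpg`) are read as a single URL.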