lychee/lychee-lib
Matthias Endler cead4ce826
Improve srcset parsing (#1160)
Our current `srcset` parsing is pretty basic.

We split on commas, then on whitespace, and take the first part, which is the image source URL.
However, this does not handle URLs that contain unencoded commas, such as
</cdn-cgi/image/format=webp,width=640/https://img.youtube.com/vi/hVBl8_pgQf0/maxresdefault.jpg>, which leads to false positives.
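
To illustrate the problem, here is a minimal sketch of the split-based approach described above. The function name `naive_srcset_urls` is hypothetical and is not taken from the lychee code base; it only mirrors the "split on comma, then on whitespace" description.

```rust
/// Naive `srcset` parsing: split on commas, then take the first
/// whitespace-separated token of each candidate as the image URL.
/// (Hypothetical helper for illustration, not lychee's actual code.)
fn naive_srcset_urls(srcset: &str) -> Vec<&str> {
    srcset
        .split(',')
        // `split_whitespace().next()` yields the first token, or `None`
        // for an empty candidate, which `filter_map` then drops.
        .filter_map(|candidate| candidate.split_whitespace().next())
        .collect()
}
```

For a well-formed value such as `a.png 1x, b.png 2x` this returns `a.png` and `b.png`. For the CDN URL above, followed by a `640w` descriptor, it returns two bogus fragments, `/cdn-cgi/image/format=webp` and `width=640/https://img.youtube.com/vi/hVBl8_pgQf0/maxresdefault.jpg`, and each of them would be checked as if it were a link.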

According to the spec, commas in strings should be encoded, but in practice some websites don't do that. To handle these cases as well, I propose extending the `srcset` parsing with a small "state machine" that tracks, while parsing, whether a comma is inside the image source or outside of it.
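
The idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the implementation from this change: the names `State` and `parse_srcset` are hypothetical, a comma is treated as part of the URL unless it is trailing, and descriptors containing parentheses are not handled.

```rust
/// Parser states (hypothetical names, for illustration only).
#[derive(Clone, Copy)]
enum State {
    /// Between candidates: skip whitespace and separating commas.
    BeforeUrl,
    /// Inside an image URL: commas are part of the URL.
    InUrl,
    /// After the URL: skip descriptors until the separating comma.
    InDescriptor,
}

/// Extract the image URLs from a `srcset` attribute value.
fn parse_srcset(srcset: &str) -> Vec<&str> {
    let mut urls = Vec::new();
    let mut state = State::BeforeUrl;
    let mut start = 0;

    for (i, c) in srcset.char_indices() {
        match state {
            State::BeforeUrl => {
                if !c.is_ascii_whitespace() && c != ',' {
                    start = i;
                    state = State::InUrl;
                }
            }
            State::InUrl => {
                if c.is_ascii_whitespace() {
                    let raw = &srcset[start..i];
                    // Trailing commas separate candidates; they are not
                    // part of the URL itself.
                    urls.push(raw.trim_end_matches(','));
                    state = if raw.ends_with(',') {
                        State::BeforeUrl
                    } else {
                        State::InDescriptor
                    };
                }
            }
            State::InDescriptor => {
                if c == ',' {
                    state = State::BeforeUrl;
                }
            }
        }
    }

    // The input may end while still inside a URL (no descriptor given).
    if let State::InUrl = state {
        urls.push(srcset[start..].trim_end_matches(','));
    }
    urls
}
```

With this sketch, the CDN URL from above followed by ` 640w` is returned as a single URL, while `a.png 1x, b.png 2x` still yields `a.png` and `b.png`. One trade-off of treating commas as part of the URL is that a value without any whitespace, such as `a.png,b.png`, is read as one URL.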

This is part of an effort to reduce false positives during link checking.

---------
Co-authored-by: Hugo McNally <45573837+HU90m@users.noreply.github.com>
2023-07-29 17:06:44 +02:00
src Improve srcset parsing (#1160) 2023-07-29 17:06:44 +02:00
Cargo.toml Bump serde from 1.0.176 to 1.0.177 2023-07-28 12:47:18 +00:00
LICENSE-APACHE Major refactor of codebase (#208) 2021-04-15 01:24:11 +02:00
LICENSE-MIT Update license files (#497) 2022-02-08 10:59:54 +01:00