lychee/fixtures/TEST_VERBATIM.html
Matthias Endler 55797071b0
Fix nested URL extraction in verbatim elements (#988)
Skipping URLs in verbatim elements didn't take into account
nested elements that are not themselves verbatim.

For instance, the following HTML snippet would yield
`https://example.com` in non-verbatim mode, even though
the link is nested inside a verbatim `<pre>` element:

```html
<pre><a href="https://example.com">link</a></pre>
```

This commit fixes the behavior for both `html5gum` and
`html5ever`.

Note that nested verbatim elements of the same kind
are still not handled correctly.

For instance, the following HTML snippet would still yield
`https://example.com`:

```html
<pre>
  <pre></pre>
  <a href="https://example.com">link</a>
</pre>
```

The reason is that we currently only keep track of a single
verbatim element and not a stack of elements, which we
would need in order to unwind the nesting and resolve this situation.
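
As a rough illustration of the stack-based tracking described above,
here is a minimal sketch in Rust. It is not the extractor code from
this commit: the `VerbatimTracker` type, its method names, and the
`VERBATIM_ELEMENTS` list (taken from the elements in the test fixture)
are hypothetical and only meant to show the idea.

```rust
/// Hypothetical list of verbatim element names (assumption: mirrors the
/// elements used in the test fixture, not necessarily the real list).
const VERBATIM_ELEMENTS: [&str; 6] = ["pre", "code", "samp", "kbd", "var", "script"];

fn is_verbatim(name: &str) -> bool {
    VERBATIM_ELEMENTS.contains(&name)
}

/// Hypothetical tracker that keeps a stack of open verbatim elements
/// instead of a single slot.
#[derive(Default)]
struct VerbatimTracker {
    open: Vec<String>,
}

impl VerbatimTracker {
    /// Call on every start tag; only verbatim elements are pushed.
    fn start_tag(&mut self, name: &str) {
        if is_verbatim(name) {
            self.open.push(name.to_string());
        }
    }

    /// Call on every end tag; pops only if the tag closes the
    /// innermost open verbatim element.
    fn end_tag(&mut self, name: &str) {
        if self.open.last().map(String::as_str) == Some(name) {
            self.open.pop();
        }
    }

    /// True while at least one verbatim element is still open,
    /// i.e. URLs found now should be skipped in non-verbatim mode.
    fn inside_verbatim(&self) -> bool {
        !self.open.is_empty()
    }
}
```

With a stack, the inner `</pre>` in the snippet above only pops the
inner element, so the following `<a>` is still seen as nested inside
the outer `<pre>`.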

Fixes https://github.com/lycheeverse/lychee/issues/986.
2023-03-11 15:18:25 +01:00


<!-- Test URLs in verbatim HTML elements -->
<html>
<head>
<title>Verbatim HTML</title>
</head>
<body>
<h1>Verbatim HTML</h1>
<p>Some verbatim HTML elements:</p>
<pre>http://www.example.com/pre</pre>
<pre>
<a href="http://www.example.com/pre/a" target="_blank" rel="noopener">example</a>
</pre>
<code>http://www.example.com/code</code>
<samp> http://www.example.com/samp </samp>
<kbd>http://www.example.com/kbd</kbd>
<var>http://www.example.com/var</var>
<script>
// http://www.example.com/script
"http://www.example.com/script";
</script>
</body>
</html>