From 55797071b0f1a79fe6afa58e8d91ef2c30c9192d Mon Sep 17 00:00:00 2001
From: Matthias Endler
Date: Sat, 11 Mar 2023 15:18:25 +0100
Subject: [PATCH] Fix nested URL extraction in verbatim elements (#988)

Skipping URLs in verbatim elements didn't take nested elements into
consideration, which were not verbatim. For instance, the following HTML
snippet would yield `https://example.com` in non-verbatim mode, even if it
is nested inside a verbatim `<pre>` element:

```html
<pre>
  <a href="https://example.com">link</a>
</pre>
```

This commit fixes the behavior for both `html5gum` and `html5ever`.

Note that nested verbatim elements of the same kind are still not handled
correctly. For instance, the following HTML snippet would still yield
`https://example.com`:

```html
<pre>
  <pre></pre>
  <a href="https://example.com">link</a>
</pre>
```

The reason is that we currently only keep track of a single verbatim
element and not a stack of elements, which we would need to unwind and
resolve the situation.

Fixes https://github.com/lycheeverse/lychee/issues/986.
---
 .github/workflows/ci.yml            |  8 ++-----
 Makefile                            |  4 +++-
 fixtures/TEST_VERBATIM.html         | 10 +++-----
 lychee-bin/tests/cli.rs             | 13 ++++++++++
 lychee-lib/src/extract/html5ever.rs | 37 +++++++++++++++++++++++++----
 lychee-lib/src/extract/html5gum.rs  | 29 +++++++++++++++++++++-
 6 files changed, 81 insertions(+), 20 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index f80c4a7..c676264 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -25,12 +25,8 @@ jobs:
           toolchain: stable
       - uses: taiki-e/install-action@nextest
       - uses: Swatinem/rust-cache@v2
-      - name: Run cargo test
-        run: cargo nextest run --all-targets --all-features --filter-expr '!test(test_exclude_example_domains)'
-      - name: Run cargo test (include example domains)
-        run: cargo nextest run --filter-expr 'test(test_exclude_example_domains)'
-      - name: Run doctests
-        run: cargo test --doc
+      - name: Run tests
+        run: make test
 
   lint:
     runs-on: ubuntu-latest
diff --git a/Makefile b/Makefile
index c084233..8287378 100644
--- a/Makefile
+++ b/Makefile
@@ -40,7 +40,9 @@ lint: ## Run linter
 
 .PHONY: test
 test: ## Run tests
-	cargo nextest run --all-targets
+	cargo nextest run --all-targets --all-features --filter-expr '!test(test_exclude_example_domains)'
+	cargo nextest run --filter-expr 'test(test_exclude_example_domains)'
+	cargo test --doc
 
 .PHONY: doc
 doc: ## Open documentation
diff --git a/fixtures/TEST_VERBATIM.html b/fixtures/TEST_VERBATIM.html
index 918a54f..eb550ea 100644
--- a/fixtures/TEST_VERBATIM.html
+++ b/fixtures/TEST_VERBATIM.html
@@ -1,5 +1,4 @@
-
 Verbatim HTML
@@ -7,17 +6,14 @@

Verbatim HTML

Some verbatim HTML elements:

-
http://www.example.com/pre
-
+
+      example
+    
 http://www.example.com/code
-http://www.example.com/samp
-http://www.example.com/kbd
-http://www.example.com/var
-