Mention brute-force approach, link to vector indexing issue

Refs #216. Closes #214
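
For readers unfamiliar with the term, here is a minimal sketch of what a brute-force similarity search looks like. It assumes cosine similarity as the distance score and non-zero vectors. The names `cosine_similarity`, `brute_force_similar`, and the toy `items` dictionary are hypothetical and are not taken from the library's code.

```python
import math


def cosine_similarity(a, b):
    # Assumes both vectors have the same length and are non-zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_similar(query, items, number=3):
    # Score the query against every stored vector: O(N) comparisons per search.
    scored = [
        (cosine_similarity(query, vector), item_id)
        for item_id, vector in items.items()
    ]
    # Highest similarity first, then keep the top `number` results.
    scored.sort(reverse=True)
    return [(item_id, score) for score, item_id in scored[:number]]


# Hypothetical toy collection of 2-dimensional embeddings.
items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = brute_force_similar([1.0, 0.1], items, number=2)
```

Every search touches every stored vector, so cost grows linearly with the size of the collection. A vector index avoids this full scan.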
2026-05-03 19:34:44 +00:00 · 2023-09-03 19:10:42 -07:00
commit f842fbea49
parent 94f0a1a337
2 changed files with 5 additions and 1 deletions
--- a/docs/embeddings/cli.md
+++ b/docs/embeddings/cli.md
@@ -285,6 +285,8 @@ llm-docs/plugins/index.md

 The `llm similar` command searches a collection of embeddings for the items that are most similar to a given string or item ID.

+This currently uses a slow brute-force approach which does not scale well to large collections. See [issue 216](https://github.com/simonw/llm/issues/216) for plans to add a more scalable approach via vector indexes provided by plugins.
+
 To search the `quotations` collection for items that are semantically similar to `'computer science'`:

 ```bash
--- a/docs/embeddings/python-api.md
+++ b/docs/embeddings/python-api.md
@@ -116,7 +116,9 @@ if Collection.exists(db, "entries"):
 (embeddings-python-similar)=
 ## Retrieving similar items

-Once you have populated a collection of embeddings you can retrieve the entries that are most similar to a given string using the `similar()` method:
+Once you have populated a collection of embeddings you can retrieve the entries that are most similar to a given string using the `similar()` method.
+
+This method uses a brute force approach, calculating distance scores against every document. This is fine for small collections, but will not scale to large collections. See [issue 216](https://github.com/simonw/llm/issues/216) for plans to add a more scalable approach via vector indexes provided by plugins.

 ```python
 for entry in collection.similar("hound"):