Add note about similarity function in "similar" command's doc (#774)

* note about similarity function in similar command doc
* Link to Wikipedia definition

---------

Co-authored-by: Simon Willison <swillison@gmail.com>
Tomoko Uchida 2025-02-27 03:07:10 +09:00 committed by GitHub
parent 849c65fe9d
commit eda1f4f588
3 changed files with 5 additions and 5 deletions

@@ -325,7 +325,7 @@ llm embed-multi photos \
(embeddings-cli-similar)=
## llm similar
-The `llm similar` command searches a collection of embeddings for the items that are most similar to a given or item ID.
+The `llm similar` command searches a collection of embeddings for the items that are most similar to a given or item ID, based on [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity).
This currently uses a slow brute-force approach which does not scale well to large collections. See [issue 216](https://github.com/simonw/llm/issues/216) for plans to add a more scalable approach via vector indexes provided by plugins.
@@ -419,4 +419,4 @@ llm collections delete collection-name
Pass `-d` to specify a different database file:
```bash
llm collections delete collection-name -d my-embeddings.db
```
```

@@ -77,7 +77,7 @@ Commands:
models Manage available models
openai Commands for working directly with the OpenAI API
plugins List installed plugins
-similar Return top N similar IDs from a collection
+similar Return top N similar IDs from a collection using cosine...
templates Manage stored prompt templates
uninstall Uninstall Python packages from the LLM environment
```
@@ -591,7 +591,7 @@ Options:
```
Usage: llm similar [OPTIONS] COLLECTION [ID]
-Return top N similar IDs from a collection
+Return top N similar IDs from a collection using cosine similarity.
Example usage:

@@ -1820,7 +1820,7 @@ def embed_multi(
)
def similar(collection, id, input, content, binary, number, database):
"""
-Return top N similar IDs from a collection
+Return top N similar IDs from a collection using cosine similarity.
Example usage:
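
For readers unfamiliar with the metric this commit documents, here is a minimal sketch of cosine similarity combined with a brute-force top-N search of the kind the docs describe as slow. It is an illustration only and not necessarily how `llm` implements it: the function names `cosine_similarity` and `top_n_similar`, and the plain `dict` of ID to vector, are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors.

    Hypothetical helper for illustration. Raises ZeroDivisionError
    if either vector has zero length.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_n_similar(query, items, n=10):
    """Brute force: score every stored vector against the query.

    `items` is assumed to map an ID to its embedding vector.
    Returns up to n (id, score) pairs, highest score first.
    """
    scored = [
        (item_id, cosine_similarity(query, vector))
        for item_id, vector in items.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```

Because every stored vector is compared with the query, the cost grows linearly with the size of the collection, which is why a vector index is the usual route to scaling this kind of search.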