(embeddings-cli)=
# Embedding with the CLI

LLM provides command-line utilities for calculating and storing embeddings for pieces of content.

(embeddings-llm-embed)=
## llm embed

The `llm embed` command calculates an embedding vector for a string of content. The result can be returned directly to the terminal, stored in a SQLite database, or both.

### Returning embeddings to the terminal

The simplest way to use this command is to pass content to it using the `-c/--content` option, like this:

```bash
llm embed -c 'This is some content'
```
The command will return a JSON array of floating point numbers directly to the terminal:

```json
[0.123, 0.456, 0.789...]
```
By default it uses the {ref}`default embedding model <embeddings-cli-embed-models-default>`.

Use the `-m/--model` option to specify a different model:

```bash
llm embed -m sentence-transformers/all-MiniLM-L6-v2 \
  -c 'This is some content'
```
See {ref}`embeddings-binary` for options to get back embeddings in formats other than JSON.

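A script that consumes the JSON output of `llm embed -c` can parse it into a list of floats. The following is a minimal Python sketch and not part of LLM itself: `parse_embedding` is a hypothetical helper name, and `sample_output` is a short stand-in string, since a real embedding has many more dimensions.

```python
import json
from typing import List


def parse_embedding(output: str) -> List[float]:
    """Parse a JSON array of numbers, as printed by `llm embed -c ...`."""
    values = json.loads(output)
    if not isinstance(values, list):
        raise ValueError("expected a JSON array of numbers")
    # Convert every element to float so integers in the array are handled too.
    return [float(value) for value in values]


# Stand-in for real command output (assumption: shortened for illustration).
sample_output = "[0.123, 0.456, 0.789]"
vector = parse_embedding(sample_output)
print(len(vector), vector[0])
```

In practice the string could come from `subprocess.run(["llm", "embed", "-c", "This is some content"], capture_output=True, text=True).stdout`, assuming `llm` is installed and an embedding model is configured.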
### Storing embeddings in SQLite

Embeddings are much more useful if you store them somewhere, so you can calculate similarity scores between different embeddings later on.

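To make "similarity scores" concrete, here is a minimal Python sketch of cosine similarity, one common way to compare two embedding vectors. The text above does not say which score LLM itself uses, so treat the choice of cosine similarity as an assumption; `cosine_similarity` is a hypothetical helper and the vectors are tiny made-up values.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up three-dimensional vectors; real embeddings are much longer.
first = [0.1, 0.3, 0.5]
second = [0.2, 0.1, 0.4]
print(cosine_similarity(first, second))
```

Scores closer to 1 indicate vectors pointing in nearly the same direction. Both vectors need to come from the same embedding model for the comparison to be meaningful.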

LLM includes a concept of a "collection" of embeddings. This is a named object where multiple pieces of content can be stored, each with a unique ID.

The `llm embed` command can store results directly in a named collection like this:

```bash
cat one.txt | llm embed my-files one
```
This will store the embedding for the contents of `one.txt` in the `my-files` collection under the key `one`.

A collection will be created the first time you mention it.

Collections have a fixed embedding model, which is the model that was used for the first embedding stored in that collection.

In the above example this would have been the default embedding model at the time that the command was run.

This example stores the embedding of the string "my happy hound" in a collection called `phrases` under the key `hound` and using the model `ada-002`:

```bash
llm embed -m ada-002 -c 'my happy hound' phrases hound
```
By default, the SQLite database used to store embeddings is a file called `embeddings.db` in the user content directory managed by LLM.

You can see the path to this directory by running `llm embed-db path`.

You can store embeddings in a different SQLite database by passing a path to it using the `-d/--database` option to `llm embed`. If this file does not exist yet, the command will create it:

```bash
llm embed -d my-embeddings.db -c 'my happy hound' phrases hound
```
This creates a database file called `my-embeddings.db` in the current directory.

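When `llm embed` is driven from a script, it can help to assemble the arguments in one place. The following Python sketch uses only the options described in this section (`-m`, `-d`, `-c`) plus the collection and key arguments; `build_embed_command` is a hypothetical helper, not part of LLM.

```python
import shlex
from typing import List, Optional


def build_embed_command(
    collection: str,
    key: str,
    content: str,
    model: Optional[str] = None,
    database: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `llm embed` that stores content in a collection."""
    command = ["llm", "embed"]
    if model is not None:
        command += ["-m", model]  # otherwise the default embedding model applies
    if database is not None:
        command += ["-d", database]  # otherwise the default embeddings database applies
    command += ["-c", content, collection, key]
    return command


command = build_embed_command(
    "phrases", "hound", "my happy hound", database="my-embeddings.db"
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to run it, assuming `llm` is installed and an embedding model is available.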
(embeddings-cli-embed-models-default)=
## llm embed-models default

This command can be used to get and set the default embedding model.

This will return the name of the current default model:
```bash
llm embed-models default
```
You can set a different default like this:
```bash
llm embed-models default name-of-other-model
```
Any of the supported aliases for a model can be passed to this command.

## llm embed-db collections

To list all of the collections in the embeddings database, run this command:

```bash
llm embed-db collections
```
Add `--json` for JSON output:
```bash
llm embed-db collections --json
```
Add `-d/--database` to specify a different database file:
```bash
llm embed-db collections -d my-embeddings.db
```
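The `--json` output is convenient for scripts. The sketch below assumes the command prints a JSON array with one object per collection; the text above does not specify the shape, so both that structure and the `name` field in the sample are hypothetical. `parse_collections` is likewise an illustrative helper, not part of LLM.

```python
import json
from typing import Any, Dict, List


def parse_collections(output: str) -> List[Dict[str, Any]]:
    """Parse JSON collection listings, keeping only object entries."""
    data = json.loads(output)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of collections")
    return [item for item in data if isinstance(item, dict)]


# Hypothetical sample output; real field names may differ.
sample_output = '[{"name": "phrases"}, {"name": "my-files"}]'
for collection in parse_collections(sample_output):
    print(collection)
```

Inspect the real output of `llm embed-db collections --json` before relying on any particular field.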