(embeddings-cli)=

# Embedding with the CLI

LLM provides command-line utilities for calculating and storing embeddings for pieces of content.

(embeddings-llm-embed)=

## llm embed

The `llm embed` command can be used to calculate embedding vectors for a string of content. These can be returned directly to the terminal, stored in a SQLite database, or both.

### Returning embeddings to the terminal

The simplest way to use this command is to pass content to it using the `-c/--content` option, like this:

```bash
llm embed -c 'This is some content'
```

The command will return a JSON array of floating point numbers directly to the terminal:

```json
[0.123, 0.456, 0.789...]
```

By default it uses the {ref}`default embedding model <embeddings-cli-embed-models-default>`.

Use the `-m/--model` option to specify a different model:

```bash
llm embed -m sentence-transformers/all-MiniLM-L6-v2 \
  -c 'This is some content'
```

See {ref}`embeddings-binary` for options to get back embeddings in formats other than JSON.
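The JSON output is easy to consume from other tools. As a sketch (the cosine similarity function here is standard vector math, not something LLM provides), two embeddings returned this way could be compared in Python like this:

```python
import json
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the two vectors divided by the
    # product of their magnitudes. Ranges from -1 to 1; higher means
    # the embedded content is more semantically similar.
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(x * x for x in b))
    return dot / (mag_a * mag_b)

# Suppose these strings were captured from two `llm embed -c ...` runs:
one = json.loads("[0.1, 0.2, 0.3]")
two = json.loads("[0.1, 0.2, 0.25]")
print(cosine_similarity(one, two))
```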

### Storing embeddings in SQLite

Embeddings are much more useful if you store them somewhere, so you can calculate similarity scores between different embeddings later on.

LLM includes a concept of a "collection" of embeddings. This is a named object where multiple pieces of content can be stored, each with a unique ID.

The `llm embed` command can store results directly in a named collection like this:

```bash
cat one.txt | llm embed my-files one
```

This will store the embedding for the contents of `one.txt` in the `my-files` collection under the key `one`.

A collection will be created the first time you mention it.

Collections have a fixed embedding model, which is the model that was used for the first embedding stored in that collection.

In the above example this would have been the default embedding model at the time that the command was run.

This example stores the embedding of the string "my happy hound" in a collection called `phrases` under the key `hound`, using the `ada-002` model:

```bash
llm embed -m ada-002 -c 'my happy hound' phrases hound
```
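Storing one key per invocation extends naturally to batches. As a hedged sketch (the `embed_directory` helper and its filename-as-key convention are illustrative, not part of LLM), you could shell out to `llm embed` once per text file in a directory:

```python
import pathlib
import subprocess

def embed_directory(directory, collection, runner=subprocess.run):
    # Store one embedding per *.txt file in the collection, keyed by
    # filename -- equivalent to running, for each file:
    #     cat file.txt | llm embed collection file.txt
    # `runner` is injectable so the command construction can be exercised
    # without the `llm` CLI installed.
    for path in sorted(pathlib.Path(directory).glob("*.txt")):
        with path.open("rb") as stdin:
            runner(["llm", "embed", collection, path.name],
                   stdin=stdin, check=True)
```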

By default, the SQLite database used to store embeddings is `embeddings.db` in the user content directory managed by LLM.

You can see the path to this directory by running `llm embed-db path`.

You can store embeddings in a different SQLite database by passing a path to it using the `-d/--database` option to `llm embed`. If this file does not exist yet the command will create it:

```bash
llm embed -d my-embeddings.db -c 'my happy hound' phrases hound
```

This creates a database file called `my-embeddings.db` in the current directory.
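Because the store is an ordinary SQLite file, it can also be inspected directly. As a sketch, assuming a `collections` table with `name` and `model` columns (the table layout is an internal detail that may change, so check your own database with `.schema` in the `sqlite3` shell first):

```python
import sqlite3

def list_collections(db_path):
    # Return (name, model) pairs from the assumed `collections` table.
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "select name, model from collections order by name"
        ).fetchall()
    finally:
        conn.close()
```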

(embeddings-cli-embed-models-default)=

## llm embed-models default

This command can be used to get and set the default embedding model.

This will return the name of the current default model:

```bash
llm embed-models default
```

You can set a different default like this:

```bash
llm embed-models default name-of-other-model
```

Any of the supported aliases for a model can be passed to this command.

## llm embed-db collections

To list all of the collections in the embeddings database, run this command:

```bash
llm embed-db collections
```

Add `--json` for JSON output:

```bash
llm embed-db collections --json
```

Add `-d/--database` to specify a different database file:

```bash
llm embed-db collections -d my-embeddings.db
```