Renamed binary.md to storage.md and documented --binary embeddings, refs #264

Simon Willison 2023-09-12 11:15:17 -07:00
parent eea7b4e0fb
commit 506de80f69
3 changed files with 29 additions and 3 deletions

@ -31,7 +31,26 @@ The `llm embed` command returns a JSON array of floating point numbers directly
```
You can omit the `-m/--model` option if you set a {ref}`default embedding model <embeddings-cli-embed-models-default>`.
See {ref}`embeddings-binary` for options to get back embeddings in formats other than JSON.
LLM also offers a binary storage format for embeddings, described in {ref}`embeddings storage format <embeddings-storage>`.
You can output embeddings in that format as raw bytes with `--format blob`, as hexadecimal with `--format hex`, or as Base64 with `--format base64`:
```bash
llm embed -c 'This is some content' -m ada-002 --format base64
```
This outputs:
```
8NGzPFtdgTqHcZw7aUT6u+++WrwwpZo8XbSxv...
```
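To turn that Base64 text back into numbers, something like the following Python sketch may help. It assumes the binary format is a packed sequence of little-endian 32-bit floats (check the storage format page to confirm this for your version); the helper name `decode_base64_embedding` and the sample values are made up for illustration and are not part of LLM itself.

```python
import base64
import struct


def decode_base64_embedding(text: str) -> list[float]:
    """Decode Base64 text into a list of floats.

    Assumption: the payload is a sequence of little-endian
    32-bit floats, 4 bytes per value.
    """
    raw = base64.b64decode(text)
    if len(raw) % 4 != 0:
        raise ValueError("expected a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round trip with made-up values that float32 represents exactly.
sample = base64.b64encode(struct.pack("<3f", 0.5, -1.25, 2.0)).decode("ascii")
print(decode_base64_embedding(sample))
```

The same approach should work for `--format hex` output by swapping `base64.b64decode` for `bytes.fromhex`, under the same assumption about the byte layout.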
Some models, such as those provided by the [llm-clip](https://github.com/simonw/llm-clip) plugin, can embed binary data. Pass binary data using the `-i` and `--binary` options:
```bash
llm embed --binary -m clip -i image.jpg
```
Or from standard input like this:
```bash
cat image.jpg | llm embed --binary -m clip -i -
```
(embeddings-collections)=
### Storing embeddings in SQLite
@ -292,6 +311,13 @@ llm embed-multi documentation \
```
If a file cannot be read, it will be logged to standard error, but the script will keep running.
If you are embedding binary content such as images for use with CLIP, add the `--binary` option:
```bash
llm embed-multi photos \
  -m clip \
  --files photos/ '*.jpeg' --binary
```
(embeddings-cli-similar)=
## llm similar

@ -22,5 +22,5 @@ maxdepth: 3
cli
python-api
writing-plugins
binary
storage
```

@ -1,4 +1,4 @@
(embeddings-binary)=
(embeddings-storage)=
# Embedding storage format
The default output format of the `llm embed` command is a JSON array of floating point numbers.