diff --git a/docs/embeddings/cli.md b/docs/embeddings/cli.md
index 6a91ce9..2f4fc83 100644
--- a/docs/embeddings/cli.md
+++ b/docs/embeddings/cli.md
@@ -31,7 +31,26 @@ The `llm embed` command returns a JSON array of floating point numbers directly
 ```
 You can omit the `-m/--model` option if you set a {ref}`default embedding model `.
 
-See {ref}`embeddings-binary` for options to get back embeddings in formats other than JSON.
+LLM also offers a binary storage format for embeddings, described in {ref}`embeddings storage format `.
+
+You can output embeddings in that format as raw bytes with `--format blob`, as hexadecimal with `--format hex`, or as Base64 with `--format base64`:
+
+```bash
+llm embed -c 'This is some content' -m ada-002 --format base64
+```
+This outputs:
+```
+8NGzPFtdgTqHcZw7aUT6u+++WrwwpZo8XbSxv...
+```
+Some models, such as the `clip` model from [llm-clip](https://github.com/simonw/llm-clip), can embed binary data. You can pass in binary data using the `-i` and `--binary` options:
+
+```bash
+llm embed --binary -m clip -i image.jpg
+```
+Or from standard input like this:
+```bash
+cat image.jpg | llm embed --binary -m clip -i -
+```
 
 (embeddings-collections)=
 ### Storing embeddings in SQLite
@@ -292,6 +311,13 @@ llm embed-multi documentation \
 ```
 If a file cannot be read it will be logged to standard error but the script will keep on running.
 
+If you are embedding binary content such as images for use with CLIP, add the `--binary` option:
+```
+llm embed-multi photos \
+  -m clip \
+  --files photos/ '*.jpeg' --binary
+```
+
 (embeddings-cli-similar)=
 ## llm similar
 
diff --git a/docs/embeddings/index.md b/docs/embeddings/index.md
index 1dde08c..b150cab 100644
--- a/docs/embeddings/index.md
+++ b/docs/embeddings/index.md
@@ -22,5 +22,5 @@ maxdepth: 3
 cli
 python-api
 writing-plugins
-binary
+storage
 ```
diff --git a/docs/embeddings/binary.md b/docs/embeddings/storage.md
similarity index 96%
rename from docs/embeddings/binary.md
rename to docs/embeddings/storage.md
index c9cf290..d9a63be 100644
--- a/docs/embeddings/binary.md
+++ b/docs/embeddings/storage.md
@@ -1,4 +1,4 @@
-(embeddings-binary)=
+(embeddings-storage)=
 # Embedding storage format
 
 The default output format of the `llm embed` command is a JSON array of floating point numbers.
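The `--format blob`, `--format hex` and `--format base64` options documented in this diff are three encodings of the same bytes. The following is a minimal Python sketch of turning that output back into floats. It assumes the storage format described in `storage.md` (whose body is not shown here) packs each value as a little-endian 32-bit float, back to back. The helpers `encode_embedding`, `decode_embedding`, `from_base64` and `from_hex` are hypothetical illustrations, not part of LLM's API.

```python
import base64
import struct


def encode_embedding(values: list[float]) -> bytes:
    # Assumption: little-endian ("<") 32-bit floats ("f"), no header or padding.
    return struct.pack("<" + "f" * len(values), *values)


def decode_embedding(blob: bytes) -> list[float]:
    # Each float32 occupies exactly 4 bytes, so the length must divide evenly.
    if len(blob) % 4:
        raise ValueError("blob length must be a multiple of 4 bytes")
    count = len(blob) // 4
    return list(struct.unpack("<" + "f" * count, blob))


def from_base64(text: str) -> list[float]:
    # Decode the text produced by --format base64, then unpack the floats.
    return decode_embedding(base64.b64decode(text.strip()))


def from_hex(text: str) -> list[float]:
    # Decode the text produced by --format hex, then unpack the floats.
    return decode_embedding(bytes.fromhex(text.strip()))


if __name__ == "__main__":
    # Values chosen to be exactly representable in float32, so they round-trip.
    vector = [0.5, -1.25, 3.0]
    blob = encode_embedding(vector)
    print(from_base64(base64.b64encode(blob).decode("ascii")))  # [0.5, -1.25, 3.0]
    print(from_hex(blob.hex()))  # [0.5, -1.25, 3.0]
```

Arbitrary model outputs will generally not round-trip bit-for-bit through JSON text and back, since float32 values printed as decimals can lose precision; working from the binary forms avoids that.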