Documentation for building binary embedding plugins, refs #264

2026-05-17 18:21:06 +00:00 · 2023-09-12 11:32:12 -07:00 · 2023-09-12 11:32:12 -07:00 · e6dac1a1bd
commit e6dac1a1bd
parent 4952a8d119
3 changed files with 22 additions and 0 deletions
--- a/docs/embeddings/python-api.md
+++ b/docs/embeddings/python-api.md
@@ -16,6 +16,8 @@ If the embedding model can handle binary input, you can call `.embed()` with a b
 if embedding_model.supports_binary:
    vector = embedding_model.embed(open("my-image.jpg", "rb").read())
 ```
+The `embedding_model.supports_text` property indicates whether the model supports text input.
+
 Many embedding models are more efficient when you embed multiple strings or binary strings at once. To embed multiple strings at once, use the `.embed_multi()` method:
 ```python
 vectors = list(embedding_model.embed_multi(["my happy hound", "my dissatisfied cat"]))
--- a/docs/embeddings/storage.md
+++ b/docs/embeddings/storage.md
@@ -18,3 +18,5 @@ def encode(values):
 def decode(binary):
    return struct.unpack("<" + "f" * (len(binary) // 4), binary)
 ```
+
+These functions are available as `llm.encode()` and `llm.decode()`.
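
As a rough illustration of the round trip these helpers perform, the sketch below defines local `encode()` and `decode()` functions instead of importing `llm`. The `decode()` body matches the context lines above. The `encode()` body is not shown in this hunk and is assumed here to be its mirror image (little-endian 32-bit floats).

```python
import struct


def encode(values):
    # Assumed counterpart to decode(): pack each value as a little-endian float32
    return struct.pack("<" + "f" * len(values), *values)


def decode(binary):
    # Same body as the context lines above: 4 bytes per little-endian float32
    return struct.unpack("<" + "f" * (len(binary) // 4), binary)


# Values chosen to be exactly representable as float32
blob = encode([1.0, 0.5, -2.0])
restored = decode(blob)
```

Note that `decode()` returns a tuple, and values that are not exactly representable as 32-bit floats will come back slightly rounded.
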
--- a/docs/embeddings/writing-plugins.md
+++ b/docs/embeddings/writing-plugins.md
@@ -46,3 +46,21 @@ Or via its registered alias like this:
 ```bash
 cat file.txt | llm embed -m all-MiniLM-L6-v2
 ```
+[llm-sentence-transformers](https://github.com/simonw/llm-sentence-transformers) is a complete example of a plugin that provides an embedding model.
+
+## Embedding binary content
+
+If your model can embed binary content, use the `supports_binary` property to indicate that:
+
+```python
+class ClipEmbeddingModel(llm.EmbeddingModel):
+    model_id = "clip"
+    supports_binary = True
+    supports_text = True
+```
+
+`supports_text` defaults to `True` and so is not necessary here. You can set it to `False` if your model only supports binary data.
+
+If your model accepts binary, your `.embed_batch()` method may be called with a list of Python bytestrings. These may be mixed with regular strings if the model accepts both types of input.
+
+[llm-clip](https://github.com/simonw/llm-clip) is an example of a model that can embed both binary and text content.
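
One way an `.embed_batch()` implementation might handle mixed input is to branch on the type of each item. The sketch below is only an illustration: `ToyEmbeddingModel` is a hypothetical stand-in that does not subclass `llm.EmbeddingModel` (so it runs without the library installed), and its two-number "embedding" is a placeholder for a real model call.

```python
from typing import Iterable, Iterator, List, Union

Item = Union[str, bytes]


class ToyEmbeddingModel:
    """Hypothetical stand-in for an llm.EmbeddingModel subclass."""

    model_id = "toy"
    supports_binary = True
    supports_text = True

    def embed_batch(self, items: Iterable[Item]) -> Iterator[List[float]]:
        # Items may be bytes, str, or a mix of both when both flags are True
        for item in items:
            if isinstance(item, bytes):
                yield self._embed_bytes(item)
            elif isinstance(item, str):
                # Placeholder choice: treat text as its UTF-8 bytes
                yield self._embed_bytes(item.encode("utf-8"))
            else:
                raise TypeError(f"Unsupported input type: {type(item).__name__}")

    @staticmethod
    def _embed_bytes(data: bytes) -> List[float]:
        # Placeholder "embedding": [length, mean byte value]
        n = len(data)
        mean = sum(data) / n if n else 0.0
        return [float(n), mean]


# A batch mixing a regular string with bytestrings
vectors = list(ToyEmbeddingModel().embed_batch(["abc", b"abc", b""]))
```

A real plugin would replace `_embed_bytes()` with calls into the underlying model, and would likely route text and binary items to different encoders instead of converting one into the other.
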