mirror of https://github.com/Hopiu/llm.git (synced 2026-05-17 18:21:06 +00:00)
Documentation for building binary embedding plugins, refs #264
parent 4952a8d119
commit e6dac1a1bd
3 changed files with 22 additions and 0 deletions
@@ -16,6 +16,8 @@ If the embedding model can handle binary input, you can call `.embed()` with a b
if embedding_model.supports_binary:
    vector = embedding_model.embed(open("my-image.jpg", "rb").read())
```
The `embedding_model.supports_text` property indicates whether the model supports text input.

Many embedding models are more efficient when you embed multiple strings or binary strings at once. To embed multiple strings at once, use the `.embed_multi()` method:
```python
vectors = list(embedding_model.embed_multi(["my happy hound", "my dissatisfied cat"]))
@@ -18,3 +18,5 @@ def encode(values):
def decode(binary):
    return struct.unpack("<" + "f" * (len(binary) // 4), binary)
```

These functions are available as `llm.encode()` and `llm.decode()`.
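
A round trip through the two helpers can be sketched as follows. Only `decode()` appears in full in this hunk; the body of `encode()` below is an assumption, written as its mirror image (little-endian 32-bit floats) rather than copied from the source. The sample vector is illustrative.

```python
import struct


def encode(values):
    # Assumed counterpart to decode(): pack floats as little-endian float32.
    return struct.pack("<" + "f" * len(values), *values)


def decode(binary):
    # Each float32 occupies 4 bytes, so the byte length fixes the count.
    return struct.unpack("<" + "f" * (len(binary) // 4), binary)


# Values chosen to be exactly representable as 32-bit floats.
vector = [1.0, 2.0, 0.5]
blob = encode(vector)      # compact binary form, 4 bytes per value
restored = decode(blob)    # a tuple of floats
```

Note that `decode()` returns a tuple rather than a list, and values that are not exactly representable as 32-bit floats come back rounded.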
@@ -46,3 +46,21 @@ Or via its registered alias like this:
```bash
cat file.txt | llm embed -m all-MiniLM-L6-v2
```
[llm-sentence-transformers](https://github.com/simonw/llm-sentence-transformers) is a complete example of a plugin that provides an embedding model.

## Embedding binary content

If your model can embed binary content, set the `supports_binary` property to indicate that:

```python
class ClipEmbeddingModel(llm.EmbeddingModel):
    model_id = "clip"
    supports_binary = True
    supports_text = True
```

`supports_text` defaults to `True`, so it is not necessary here. You can set it to `False` if your model only supports binary data.

If your model accepts binary, your `.embed_batch()` method may be called with a list of Python bytestrings. These may be mixed with regular strings if the model accepts both types of input.

[llm-clip](https://github.com/simonw/llm-clip) is an example of a model that can embed both binary and text content.
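
As a rough sketch of the mixed-input case described above, an `.embed_batch()` implementation might branch on the type of each item. Everything here is hypothetical and for illustration only: `ToyBinaryEmbeddingModel` is a plain class rather than a real `llm.EmbeddingModel` subclass (so the example runs without `llm` installed), and `_toy_vector()` is placeholder arithmetic standing in for a real model call.

```python
from typing import Iterable, Iterator, List, Union


class ToyBinaryEmbeddingModel:
    """Hypothetical stand-in for an llm.EmbeddingModel subclass."""

    model_id = "toy-binary"
    supports_binary = True
    supports_text = True

    def embed_batch(
        self, items: Iterable[Union[str, bytes]]
    ) -> Iterator[List[float]]:
        for item in items:
            if isinstance(item, bytes):
                # Binary input, e.g. the raw bytes of an image file.
                data = item
            else:
                # Regular string input; a real model would likely use a
                # separate text encoder here instead of re-encoding.
                data = item.encode("utf-8")
            yield self._toy_vector(data)

    @staticmethod
    def _toy_vector(data: bytes) -> List[float]:
        # Placeholder only: NOT a meaningful embedding.
        return [float(len(data)), float(sum(data) % 256)]


model = ToyBinaryEmbeddingModel()
# Strings and bytestrings mixed in one batch.
vectors = list(model.embed_batch(["hello", b"\x89PNG"]))
```

The point of the sketch is the `isinstance(item, bytes)` check: a model that sets both `supports_binary` and `supports_text` should be prepared to see either type in the same batch.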