(embeddings-storage)=
Embedding storage format
The default output format of the llm embed command is a JSON array of floating point numbers.
LLM stores embeddings in a space-efficient format: a little-endian binary sequence of 32-bit floating point numbers, each represented using 4 bytes.
These are stored in a BLOB column in a SQLite database.
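As a rough illustration of the layout described above, the following sketch packs a small, made-up vector and inspects the resulting bytes. The `values` list is a hypothetical example, not real model output, and the JSON size shown depends on how many digits each number prints with.

```python
import json
import struct

# Hypothetical 3-dimensional embedding, for illustration only
values = [0.1, 0.2, 0.3]

# The same vector as JSON text and as packed little-endian 32-bit floats
as_json = json.dumps(values).encode("utf-8")
as_binary = struct.pack("<" + "f" * len(values), *values)

print(len(as_json))    # size of the JSON text in bytes; grows with the digits printed
print(len(as_binary))  # 12, i.e. 4 bytes per value

# Little-endian means the least significant byte comes first:
# 1.0 is 0x3F800000 as a 32-bit float, so it is stored as 00 00 80 3f
one = struct.pack("<f", 1.0).hex()
print(one)  # 0000803f
```

The binary size is always four times the number of dimensions, regardless of the values themselves.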
The following Python functions can be used to convert between this format and an array of floating point numbers:
import struct

def encode(values):
    return struct.pack("<" + "f" * len(values), *values)

def decode(binary):
    return struct.unpack("<" + "f" * (len(binary) // 4), binary)
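The functions above can be exercised with a round trip through a SQLite BLOB column. This is a minimal sketch only: the `demo_embeddings` table, its columns, and the `doc-1` ID are hypothetical names chosen for the example and are not LLM's actual schema.

```python
import sqlite3
import struct

def encode(values):
    return struct.pack("<" + "f" * len(values), *values)

def decode(binary):
    return struct.unpack("<" + "f" * (len(binary) // 4), binary)

# Hypothetical table and column names, not LLM's actual schema
db = sqlite3.connect(":memory:")
db.execute("create table demo_embeddings (id text primary key, embedding blob)")

# These values are exactly representable as 32-bit floats
vector = [0.5, -1.25, 3.0]
db.execute("insert into demo_embeddings values (?, ?)", ("doc-1", encode(vector)))

# Read the BLOB back and convert it to floating point numbers
row = db.execute(
    "select embedding from demo_embeddings where id = ?", ("doc-1",)
).fetchone()
blob = row[0]
restored = decode(blob)

print(len(blob))  # 12
print(restored)   # (0.5, -1.25, 3.0)

# Values that are not exactly representable in 32 bits come back slightly changed
approx = decode(encode([0.1]))[0]
print(approx == 0.1)  # False
```

Note that `decode()` returns a tuple rather than a list, and that values are rounded to 32-bit precision when encoded, so a decoded value may differ slightly from the Python float that was stored.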