The {ref}`model plugin tutorial <tutorial-model-plugin>` covers the basics of developing a plugin that adds support for a new model. This document covers more advanced topics.
Features to consider for your model plugin include:
- {ref}`Accepting API keys <advanced-model-plugins-api-keys>` using the standard mechanism that incorporates `llm keys set`, environment variables and support for passing an explicit key to the model.
- Including support for {ref}`Async models <advanced-model-plugins-async>` that can be used with Python's `asyncio` library.
- Support for {ref}`structured output <advanced-model-plugins-schemas>` using JSON schemas.
- Handling {ref}`attachments <advanced-model-plugins-attachments>` (images, audio and more) for multi-modal models.
- Tracking {ref}`token usage <advanced-model-plugins-usage>` for models that charge by the token.
Models that call out to API providers such as OpenAI, Anthropic or Google Gemini usually require an API key.
LLM's API key management mechanism {ref}`is described here <api-keys>`.
If your plugin requires an API key you should subclass the `llm.KeyModel` class instead of the `llm.Model` class. Start your model definition like this:
```python
import llm
class HostedModel(llm.KeyModel):
    needs_key = "hosted"  # Required
    key_env_var = "HOSTED_API_KEY"  # Optional
```
This tells LLM that your model requires an API key, which may be saved in the key registry under the key name `hosted` or might also be provided as the `HOSTED_API_KEY` environment variable.
Then when you define your `execute()` method it should take an extra `key=` parameter like this:
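```python
def execute(self, prompt, stream, response, conversation, key=None):
    # The other parameters are the standard execute() parameters;
    # key will be the resolved API key (body elided in this sketch)
    ...
```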
LLM will pass in the key from the environment variable or key registry, or the key that was provided using the `--key` command-line option or the `model.prompt(..., key=)` parameter.
Plugins can optionally provide an asynchronous version of their model, suitable for use with Python [asyncio](https://docs.python.org/3/library/asyncio.html). This is particularly useful for remote models accessible by an HTTP API.
The async version of a model subclasses `llm.AsyncModel` instead of `llm.Model`. It must implement an `async def execute()` async generator method instead of `def execute()`.
The following simplified sketch, loosely modeled on the OpenAI default plugin, illustrates how this method might work:
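The client setup and message construction below are illustrative assumptions, not the actual plugin code - this sketch reads the API key from the environment and always streams, where a full implementation should honor the `stream` flag:

```python
from typing import AsyncGenerator

import llm
from openai import AsyncOpenAI  # assumed dependency for this sketch


class MyAsyncModel(llm.AsyncModel):
    # This can share the model_id of the equivalent sync model
    model_id = "my-model-id"

    async def execute(
        self, prompt, stream, response, conversation=None
    ) -> AsyncGenerator[str, None]:
        # For simplicity this sketch reads OPENAI_API_KEY from the environment
        client = AsyncOpenAI()
        completion = await client.chat.completions.create(
            model=self.model_id,
            messages=[{"role": "user", "content": prompt.prompt}],
            stream=True,
        )
        async for chunk in completion:
            content = chunk.choices[0].delta.content
            if content is not None:
                yield content
```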
If your model supports {ref}`structured output <schemas>` against a defined JSON schema you can implement support by first adding `supports_schema = True` to the class:
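```python
class NewModel(llm.Model):
    model_id = "new-model"
    supports_schema = True
```

Inside your `.execute()` method the user-provided schema should then be available on the prompt object (as `prompt.schema`), ready to be passed on to the underlying API in whatever format that API expects.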
Check the [llm-gemini](https://github.com/simonw/llm-gemini) and [llm-anthropic](https://github.com/simonw/llm-anthropic) plugins for examples of this pattern in action.
Models such as GPT-4o, Claude 3.5 Sonnet and Google's Gemini 1.5 are multi-modal: they accept input in the form of images and sometimes audio, video and other formats.
LLM calls these **attachments**. Models can specify the types of attachments they accept and then implement special code in the `.execute()` method to handle them.
A `Model` subclass can list the types of attachments it accepts by defining an `attachment_types` class attribute:
```python
class NewModel(llm.Model):
    model_id = "new-model"
    attachment_types = {
        "image/png",
        "image/jpeg",
        "image/webp",
        "image/gif",
    }
```
These content types are detected when an attachment is passed to LLM using `llm -a filename`, or can be specified by the user using the `--attachment-type filename image/png` option.
LLM will use the `attachment_types` attribute to validate that provided attachments should be accepted before passing them to the model.
### Handling attachments
The `prompt` object passed to the `execute()` method will have an `attachments` attribute containing a list of `Attachment` objects provided by the user.
An `Attachment` instance has the following properties:
- `url (str)`: The URL of the attachment, if it was provided as a URL
- `path (str)`: The resolved file path of the attachment, if it was provided as a file
- `type (str)`: The content type of the attachment, if it was provided
- `content (bytes)`: The binary content of the attachment, if it was provided
Generally only one of `url`, `path` or `content` will be set.
You should usually access the type and the content through one of these methods:
- `attachment.resolve_type() -> str`: Returns the `type` if it is available, otherwise attempts to guess the type by looking at the first few bytes of content
- `attachment.content_bytes() -> bytes`: Returns the binary content, which it may need to read from a file or fetch from a URL
- `attachment.base64_content() -> str`: Returns that content as a base64-encoded string
An `id()` method returns a database ID for this content, which is either a SHA256 hash of the binary content or, in the case of attachments hosted at an external URL, a hash of `{"url": url}` instead. This is an implementation detail which you should not need to access directly.
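Here is a condensed sketch, loosely based on the approach taken by the OpenAI default plugin, showing how an image attachment might be turned into a content block for a chat API. The helper name and the OpenAI-style `image_url` block format are illustrative assumptions; other providers will expect different shapes:

```python
def _attachment_as_content_block(attachment):
    # Hypothetical helper: converts one attachment into an OpenAI-style
    # image_url content block for the messages list
    attachment_type = attachment.resolve_type()
    if not attachment_type.startswith("image/"):
        raise ValueError("Only image attachments are supported by this sketch")
    if attachment.url:
        url = attachment.url
    else:
        # Embed the image directly as a base64-encoded data: URI
        url = f"data:{attachment_type};base64,{attachment.base64_content()}"
    return {"type": "image_url", "image_url": {"url": url}}
```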
As you can see, it uses `attachment.url` if that is available and otherwise falls back to using the `base64_content()` method to embed the image directly in the JSON sent to the API. For the OpenAI API, audio attachments are always included as base64-encoded strings.
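Earlier exchanges in a conversation also need to be replayed when building up the `messages` list. A rough sketch, assuming an OpenAI-style messages format and LLM's `conversation.responses` list of previous responses, might look like this:

```python
messages = []
if conversation:
    for prev_response in conversation.responses:
        # Replay each earlier prompt and its response
        messages.append(
            {"role": "user", "content": prev_response.prompt.prompt}
        )
        messages.append(
            {"role": "assistant", "content": prev_response.text_or_raise()}
        )
messages.append({"role": "user", "content": prompt.prompt})
```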
The `response.text_or_raise()` method used there will return the text from the response or raise a `ValueError` exception if the response is an `AsyncResponse` instance that has not yet been fully resolved.
This is a slightly weird hack to work around the common need to share logic for building up the `messages` list across both sync and async models.
Models that charge by the token should track the number of tokens used by each prompt. The `response.set_usage()` method can be used to record the number of tokens used by a response - these will then be made available through the Python API and logged to the SQLite database for command-line users.
`response` here is the response object that is passed to `.execute()` as an argument.
Call `response.set_usage()` at the end of your `.execute()` method. It accepts keyword arguments `input=`, `output=` and `details=` - all three are optional. `input` and `output` should be integers, and `details` should be a dictionary that provides additional information beyond the input and output token counts.
This example logs 15 input tokens, 340 output tokens and notes that 37 tokens were cached:
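```python
# The "cached" key is illustrative - details can hold any extra
# provider-specific counts beyond input and output tokens
response.set_usage(input=15, output=340, details={"cached": 37})
```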