Models that call out to API providers such as OpenAI, Anthropic or Google Gemini usually require an API key.
LLM's API key management mechanism {ref}`is described here <api-keys>`.
If your plugin requires an API key you should subclass the `llm.KeyModel` class instead of the `llm.Model` class. Start your model definition like this:
```python
import llm

class HostedModel(llm.KeyModel):
    needs_key = "hosted"  # Required
    key_env_var = "HOSTED_API_KEY"  # Optional
```
This tells LLM that your model requires an API key, which can be saved in the key registry under the key name `hosted` or provided in the `HOSTED_API_KEY` environment variable.
Then when you define your `execute()` method it should take an extra `key=` parameter. A minimal sketch of the expected signature:
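```python
def execute(self, prompt, stream, response, conversation, key=None):
    # key receives the resolved API key (sketch - body elided)
    ...
```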
LLM will pass in the key from the environment variable or key registry, or the one provided using the `--key` command-line option or the `model.prompt(..., key=)` parameter.
## Async models

Plugins can optionally provide an asynchronous version of their model, suitable for use with Python [asyncio](https://docs.python.org/3/library/asyncio.html). This is particularly useful for remote models that are accessed over an HTTP API.

The async version of a model subclasses `llm.AsyncModel` instead of `llm.Model`, and must implement `execute()` as an `async def` generator method instead of a regular `def`.
This simplified sketch, loosely modeled on the OpenAI default plugin, illustrates how this method might work (message construction and error handling are elided):
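```python
import llm
from openai import AsyncOpenAI  # assumes the official openai client library


class MyAsyncModel(llm.AsyncModel):
    model_id = "my-model"

    async def execute(self, prompt, stream, response, conversation=None):
        # Sketch only: the default client reads OPENAI_API_KEY
        # from the environment
        client = AsyncOpenAI()
        messages = [{"role": "user", "content": prompt.prompt}]
        if stream:
            completion = await client.chat.completions.create(
                model=self.model_id,
                messages=messages,
                stream=True,
            )
            async for chunk in completion:
                content = chunk.choices[0].delta.content
                if content is not None:
                    yield content
        else:
            completion = await client.chat.completions.create(
                model=self.model_id,
                messages=messages,
                stream=False,
            )
            yield completion.choices[0].message.content
```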
## Supporting schemas

If your model supports {ref}`structured output <schemas>` against a defined JSON schema, you can implement support by first adding `supports_schema = True` to the class:
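```python
class ModelWithSchema(llm.Model):
    model_id = "model-with-schema"  # illustrative name
    supports_schema = True
```

Your `execute()` method can then check for `prompt.schema` - a JSON schema as a Python dictionary - and forward it to the underlying API in whatever format that API expects.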
Check the [llm-gemini](https://github.com/simonw/llm-gemini) and [llm-anthropic](https://github.com/simonw/llm-anthropic) plugins for examples of this pattern in action.
## Attachments for multi-modal models

Models such as GPT-4o, Claude 3.5 Sonnet and Google's Gemini 1.5 are multi-modal: they accept input in the form of images, and sometimes audio, video and other formats.
LLM calls these **attachments**. Models can specify the types of attachments they accept and then implement special code in the `.execute()` method to handle them.
A `Model` subclass can list the types of attachments it accepts by defining an `attachment_types` class attribute:
```python
class NewModel(llm.Model):
    model_id = "new-model"
    attachment_types = {
        "image/png",
        "image/jpeg",
        "image/webp",
        "image/gif",
    }
```
These content types are detected when an attachment is passed to LLM using `llm -a filename`, or can be specified by the user using the `--attachment-type filename image/png` option.
LLM will use the `attachment_types` attribute to validate that provided attachments should be accepted before passing them to the model.
### Handling attachments
The `prompt` object passed to the `execute()` method will have an `attachments` attribute containing a list of `Attachment` objects provided by the user.
An `Attachment` instance has the following properties:
- `url (str)`: The URL of the attachment, if it was provided as a URL
- `path (str)`: The resolved file path of the attachment, if it was provided as a file
- `type (str)`: The content type of the attachment, if it was provided
- `content (bytes)`: The binary content of the attachment, if it was provided
Generally only one of `url`, `path` or `content` will be set.
You should usually access the type and the content through one of these methods:
- `attachment.resolve_type() -> str`: Returns the `type` if it is available, otherwise attempts to guess the type by looking at the first few bytes of content
- `attachment.content_bytes() -> bytes`: Returns the binary content, which it may need to read from a file or fetch from a URL
- `attachment.base64_content() -> str`: Returns that content as a base64-encoded string
An `id()` method returns a database ID for this content, which is either a SHA256 hash of the binary content or, in the case of attachments hosted at an external URL, a hash of `{"url": url}` instead. This is an implementation detail which you should not need to access directly.
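As an illustration, here is a hypothetical helper showing how an OpenAI-style plugin might turn an image attachment into a chat message content block (the helper name and message format are assumptions based on the OpenAI chat API):

```python
def attachment_as_content_block(attachment):
    # Hypothetical helper - builds one OpenAI-style content block
    if attachment.url:
        return {
            "type": "image_url",
            "image_url": {"url": attachment.url},
        }
    # No URL available, so embed the content as a base64 data: URI
    return {
        "type": "image_url",
        "image_url": {
            "url": "data:{};base64,{}".format(
                attachment.resolve_type(), attachment.base64_content()
            )
        },
    }
```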
As you can see, it uses `attachment.url` if that is available and otherwise falls back to the `base64_content()` method to embed the image directly in the JSON sent to the API. For the OpenAI API, audio attachments are always included as base64-encoded strings.
The `response.text_or_raise()` method returns the text from a response, or raises a `ValueError` exception if the response is an `AsyncResponse` instance that has not yet been fully resolved.
This is a slightly weird hack to work around the common need to share logic for building up the `messages` list across both sync and async models.
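For example, a shared helper along these lines (the function name is illustrative) can be called from both the sync and async `execute()` methods:

```python
def build_messages(conversation):
    # Hypothetical shared helper, usable from sync and async models alike
    messages = []
    for prev_response in conversation.responses:
        messages.append(
            {"role": "user", "content": prev_response.prompt.prompt}
        )
        # text_or_raise() returns the text, or raises ValueError for an
        # AsyncResponse that has not been fully resolved yet
        messages.append(
            {"role": "assistant", "content": prev_response.text_or_raise()}
        )
    return messages
```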
## Tracking token usage

Models that charge by the token should track the number of tokens used by each prompt. The `response.set_usage()` method can be used to record the number of tokens used by a response - these will then be made available through the Python API and logged to the SQLite database for command-line users.
`response` here is the response object that is passed to `.execute()` as an argument.
Call `response.set_usage()` at the end of your `.execute()` method. It accepts keyword arguments `input=`, `output=` and `details=` - all three are optional. `input` and `output` should be integers, and `details` should be a dictionary that provides additional information beyond the input and output token counts.
This example logs 15 input tokens, 340 output tokens and notes that 37 tokens were cached:
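```python
# Sketch - the "cached" details key is illustrative
response.set_usage(input=15, output=340, details={"cached": 37})
```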