
(advanced-model-plugins)=
# Advanced model plugins

The {ref}`model plugin tutorial <tutorial-model-plugin>` covers the basics of developing a plugin that adds support for a new model.

This document covers more advanced topics.

(advanced-model-plugins-attachments)=
## Attachments for multi-modal models

Models such as GPT-4o, Claude 3.5 Sonnet and Google's Gemini 1.5 are multi-modal: they accept input in the form of images and, depending on the model, audio, video and other formats.

LLM calls these **attachments**. Models can specify the types of attachments they accept and then implement special code in the `.execute()` method to handle them.

### Specifying attachment types

A `Model` subclass can list the types of attachments it accepts by defining an `attachment_types` class attribute:

```python
class NewModel(llm.Model):
    model_id = "new-model"
    attachment_types = {
        "image/png",
        "image/jpeg",
        "image/webp",
        "image/gif",
    }
```
LLM detects the content type automatically when an attachment is passed using `llm -a filename`. Users can also specify the type explicitly with the `--attachment-type filename image/png` option.

**Note:** MP3 files will have their attachment type detected as `audio/mpeg`, not `audio/mp3`.

LLM uses the `attachment_types` attribute to check that each provided attachment is of a type the model accepts before passing it to the model.
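LLM performs this check itself, so your plugin does not need to repeat it. Here's a minimal sketch of the idea. It assumes an exact string match between the attachment's resolved type and the entries in `attachment_types`. The function `check_attachment_type()` and the exception `UnsupportedAttachmentType` are hypothetical names for illustration, not part of LLM's API.

```python
from typing import AbstractSet


class UnsupportedAttachmentType(ValueError):
    """Hypothetical error raised when a model rejects an attachment type."""


def check_attachment_type(accepted_types: AbstractSet[str], attachment_type: str) -> None:
    # Hypothetical sketch, NOT LLM's implementation: an exact string match,
    # so a model listing only "audio/mpeg" would reject "audio/mp3".
    if attachment_type not in accepted_types:
        raise UnsupportedAttachmentType(
            f"This model does not support attachments of type "
            f"'{attachment_type}', only {', '.join(sorted(accepted_types))}"
        )


accepted = {"image/png", "image/jpeg", "image/webp", "image/gif"}
check_attachment_type(accepted, "image/png")  # accepted: returns None silently
try:
    check_attachment_type(accepted, "audio/mpeg")
except UnsupportedAttachmentType as ex:
    print(ex)
```

The exact-match behaviour is why the MP3 note above matters. A model that wants MP3 input should list `audio/mpeg`.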

### Handling attachments

The `prompt` object passed to the `execute()` method will have an `attachments` attribute containing a list of `Attachment` objects provided by the user.

An `Attachment` instance has the following properties:

- `url (str)`: The URL of the attachment, if it was provided as a URL
- `path (str)`: The resolved file path of the attachment, if it was provided as a file
- `type (str)`: The content type of the attachment, if it was provided
- `content (bytes)`: The binary content of the attachment, if it was provided

Generally only one of `url`, `path` or `content` will be set.

You should usually access the type and the content through one of these methods:

- `attachment.resolve_type() -> str`: Returns the `type` if it is available, otherwise attempts to guess the type by looking at the first few bytes of content
- `attachment.content_bytes() -> bytes`: Returns the binary content, which it may need to read from a file or fetch from a URL
- `attachment.base64_content() -> str`: Returns that content as a base64-encoded string
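To make the behaviour of these three methods concrete, here's a simplified stand-in class built only on the Python standard library. `SketchAttachment` is hypothetical and is not LLM's `Attachment` class. Its type sniffing uses a small hand-written table of "magic bytes" for a few image formats, which is an assumption for illustration; LLM's own detection may work differently and recognise many more formats.

```python
import base64
import urllib.request
from dataclasses import dataclass
from typing import Optional

# Illustrative magic-byte prefixes (an assumption, not LLM's detection logic).
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


@dataclass
class SketchAttachment:
    """Hypothetical stand-in mirroring the documented Attachment attributes."""

    type: Optional[str] = None
    path: Optional[str] = None
    url: Optional[str] = None
    content: Optional[bytes] = None

    def content_bytes(self) -> bytes:
        # Prefer in-memory content, then a local file, then fetch the URL.
        if self.content is not None:
            return self.content
        if self.path is not None:
            with open(self.path, "rb") as fp:
                return fp.read()
        if self.url is not None:
            with urllib.request.urlopen(self.url) as response:
                return response.read()
        raise ValueError("Attachment has no url, path or content")

    def resolve_type(self) -> str:
        # Use the explicit type if one was provided; otherwise sniff the
        # first few bytes (this downloads URL-only attachments).
        if self.type:
            return self.type
        head = self.content_bytes()[:16]
        # WebP files are RIFF containers with "WEBP" at offset 8.
        if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
            return "image/webp"
        for signature, mime_type in _SIGNATURES:
            if head.startswith(signature):
                return mime_type
        raise ValueError("Could not determine attachment type")

    def base64_content(self) -> str:
        return base64.b64encode(self.content_bytes()).decode("utf-8")


png = SketchAttachment(content=b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
print(png.resolve_type())  # image/png
```

The point to take away for plugin code is the order of precedence. An explicit `type` wins over sniffing, and `content_bytes()` hides whether the data came from memory, a file or a URL.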

An `id()` method returns a database ID for this content. The ID is either a SHA256 hash of the binary content or, for attachments hosted at an external URL, a hash of `{"url": url}`. This is an implementation detail that you should not need to access directly.
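The following sketch is only meant to show what "a hash of `{"url": url}`" means in practice; plugins should not depend on it. `sketch_attachment_id()` is a hypothetical helper, and serialising the URL with `json.dumps()` and its default separators is an assumption: LLM's exact serialisation may differ.

```python
import hashlib
import json
from typing import Optional


def sketch_attachment_id(content: Optional[bytes] = None, url: Optional[str] = None) -> str:
    """Hypothetical illustration of the ID scheme described above."""
    if content is not None:
        # Attachments with binary content are identified by those bytes.
        return hashlib.sha256(content).hexdigest()
    if url is not None:
        # URL-only attachments hash a small JSON document instead, so the
        # ID can be computed without downloading the remote file.
        payload = json.dumps({"url": url}).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()
    raise ValueError("Need either content or url")


print(sketch_attachment_id(content=b"hello"))
print(sketch_attachment_id(url="https://example.com/cat.png"))
```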

Note that a prompt with attachments may not include a text prompt at all, in which case `prompt.prompt` will be `None`.

Here's how the OpenAI plugin handles attachments, including the case where no `prompt.prompt` was provided:

```python
if not prompt.attachments:
    messages.append({"role": "user", "content": prompt.prompt})
else:
    attachment_message = []
    if prompt.prompt:
        attachment_message.append({"type": "text", "text": prompt.prompt})
    for attachment in prompt.attachments:
        attachment_message.append(_attachment(attachment))
    messages.append({"role": "user", "content": attachment_message})


# And the code for creating the attachment message
def _attachment(attachment):
    url = attachment.url
    base64_content = ""
    if not url or attachment.resolve_type().startswith("audio/"):
        base64_content = attachment.base64_content()
        url = f"data:{attachment.resolve_type()};base64,{base64_content}"
    if attachment.resolve_type().startswith("image/"):
        return {"type": "image_url", "image_url": {"url": url}}
    else:
        format_ = "wav" if attachment.resolve_type() == "audio/wave" else "mp3"
        return {
            "type": "input_audio",
            "input_audio": {
                "data": base64_content,
                "format": format_,
            },
        }
```
As you can see, it uses `attachment.url` if one is available. Otherwise it falls back to the `base64_content()` method to embed the image directly in the JSON sent to the API as a `data:` URL. For the OpenAI API, audio attachments are always included as base64-encoded strings.

### Attachments from previous conversations

Models that implement the ability to continue a conversation can reconstruct the previous message JSON using the `response.attachments` attribute.

Here's how the OpenAI plugin does that:

```python
for prev_response in conversation.responses:
    if prev_response.attachments:
        attachment_message = []
        if prev_response.prompt.prompt:
            attachment_message.append(
                {"type": "text", "text": prev_response.prompt.prompt}
            )
        for attachment in prev_response.attachments:
            attachment_message.append(_attachment(attachment))
        messages.append({"role": "user", "content": attachment_message})
    else:
        messages.append(
            {"role": "user", "content": prev_response.prompt.prompt}
        )
    messages.append({"role": "assistant", "content": prev_response.text()})
```
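The user-message logic in this loop is the same as the code that handles the current prompt. One possible refactor, which is not taken from the OpenAI plugin, is a single helper used in both places. `build_user_message()` and its `to_part` parameter are hypothetical names. Like the original code, the helper sends a plain string when there are no attachments and omits the text part when the text is `None`.

```python
from typing import Any, Callable, Iterable, Optional


def build_user_message(
    text: Optional[str],
    attachments: Iterable[Any],
    to_part: Callable[[Any], dict],
) -> dict:
    """Hypothetical helper that builds one OpenAI-style user message."""
    attachments = list(attachments)
    if not attachments:
        # No attachments: content is the prompt text as a plain string.
        return {"role": "user", "content": text}
    parts = []
    if text:
        # Only include a text part when there is prompt text.
        parts.append({"type": "text", "text": text})
    parts.extend(to_part(attachment) for attachment in attachments)
    return {"role": "user", "content": parts}


def fake_part(name: str) -> dict:
    # Stand-in for the plugin's _attachment() helper in this demo.
    return {"type": "image_url", "image_url": {"url": name}}


print(build_user_message("Describe this", ["cat.png"], fake_part))
print(build_user_message(None, ["cat.png"], fake_part))
```

Inside a plugin you would pass the `_attachment()` helper as `to_part`. For each previous response you would call `build_user_message(prev_response.prompt.prompt, prev_response.attachments, _attachment)` and then append the assistant message. For the current prompt you would call `build_user_message(prompt.prompt, prompt.attachments, _attachment)`.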