Show attachment types in llm models --options, closes #612

This commit is contained in:
Simon Willison 2024-11-05 22:49:26 -08:00
parent 1a60fa1667
commit 12df1a3b2a
6 changed files with 255 additions and 220 deletions

View file

@ -19,6 +19,8 @@ result = CliRunner().invoke(cli, ["aliases", "list"])
cog.out("```\n{}```".format(result.output))
]]] -->
```
4o : gpt-4o
4o-mini : gpt-4o-mini
3.5 : gpt-3.5-turbo
chatgpt : gpt-3.5-turbo
chatgpt-16k : gpt-3.5-turbo-16k
@ -29,8 +31,6 @@ gpt4 : gpt-4
gpt-4-turbo-preview : gpt-4-turbo
4-turbo : gpt-4-turbo
4t : gpt-4-turbo
4o : gpt-4o
4o-mini : gpt-4o-mini
3.5-instruct : gpt-3.5-turbo-instruct
chatgpt-instruct : gpt-3.5-turbo-instruct
ada : ada-002 (embedding)

View file

@ -31,6 +31,9 @@ models = [line for line in result.output.split("\n") if line.startswith("OpenAI
cog.out("```\n{}\n```".format("\n".join(models)))
]]] -->
```
OpenAI Chat: gpt-4o (aliases: 4o)
OpenAI Chat: gpt-4o-mini (aliases: 4o-mini)
OpenAI Chat: gpt-4o-audio-preview
OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
@ -39,9 +42,6 @@ OpenAI Chat: gpt-4-1106-preview
OpenAI Chat: gpt-4-0125-preview
OpenAI Chat: gpt-4-turbo-2024-04-09
OpenAI Chat: gpt-4-turbo (aliases: gpt-4-turbo-preview, 4-turbo, 4t)
OpenAI Chat: gpt-4o (aliases: 4o)
OpenAI Chat: gpt-4o-mini (aliases: 4o-mini)
OpenAI Chat: gpt-4o-audio-preview
OpenAI Chat: o1-preview
OpenAI Chat: o1-mini
OpenAI Completion: gpt-3.5-turbo-instruct (aliases: 3.5-instruct, chatgpt-instruct)

View file

@ -247,186 +247,206 @@ result = CliRunner().invoke(cli, ["models", "list", "--options"])
cog.out("```\n{}\n```".format(result.output))
]]] -->
```
OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
temperature: float
What sampling temperature to use, between 0 and 2. Higher values like
0.8 will make the output more random, while lower values like 0.2 will
make it more focused and deterministic.
max_tokens: int
Maximum number of tokens to generate.
top_p: float
An alternative to sampling with temperature, called nucleus sampling,
where the model considers the results of the tokens with top_p
probability mass. So 0.1 means only the tokens comprising the top 10%
probability mass are considered. Recommended to use top_p or
temperature but not both.
frequency_penalty: float
Number between -2.0 and 2.0. Positive values penalize new tokens based
on their existing frequency in the text so far, decreasing the model's
likelihood to repeat the same line verbatim.
presence_penalty: float
Number between -2.0 and 2.0. Positive values penalize new tokens based
on whether they appear in the text so far, increasing the model's
likelihood to talk about new topics.
stop: str
A string where the API will stop generating further tokens.
logit_bias: dict, str
Modify the likelihood of specified tokens appearing in the completion.
Pass a JSON string like '{"1712":-100, "892":-100, "1489":-100}'
seed: int
Integer seed to attempt to sample deterministically
json_object: boolean
Output a valid JSON object {...}. Prompt must mention JSON.
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Chat: gpt-4-1106-preview
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Chat: gpt-4-0125-preview
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Chat: gpt-4-turbo-2024-04-09
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Chat: gpt-4-turbo (aliases: gpt-4-turbo-preview, 4-turbo, 4t)
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Chat: gpt-4o (aliases: 4o)
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
Options:
temperature: float
What sampling temperature to use, between 0 and 2. Higher values like
0.8 will make the output more random, while lower values like 0.2 will
make it more focused and deterministic.
max_tokens: int
Maximum number of tokens to generate.
top_p: float
An alternative to sampling with temperature, called nucleus sampling,
where the model considers the results of the tokens with top_p
probability mass. So 0.1 means only the tokens comprising the top 10%
probability mass are considered. Recommended to use top_p or
temperature but not both.
frequency_penalty: float
Number between -2.0 and 2.0. Positive values penalize new tokens based
on their existing frequency in the text so far, decreasing the model's
likelihood to repeat the same line verbatim.
presence_penalty: float
Number between -2.0 and 2.0. Positive values penalize new tokens based
on whether they appear in the text so far, increasing the model's
likelihood to talk about new topics.
stop: str
A string where the API will stop generating further tokens.
logit_bias: dict, str
Modify the likelihood of specified tokens appearing in the completion.
Pass a JSON string like '{"1712":-100, "892":-100, "1489":-100}'
seed: int
Integer seed to attempt to sample deterministically
json_object: boolean
Output a valid JSON object {...}. Prompt must mention JSON.
Attachment types:
image/png, image/gif, image/webp, image/jpeg
OpenAI Chat: gpt-4o-mini (aliases: 4o-mini)
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
Options:
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
Attachment types:
image/png, image/gif, image/webp, image/jpeg
OpenAI Chat: gpt-4o-audio-preview
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
Options:
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
Attachment types:
audio/mpeg, audio/wave
OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
Options:
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
Options:
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
Options:
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
Options:
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Chat: gpt-4-1106-preview
Options:
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Chat: gpt-4-0125-preview
Options:
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Chat: gpt-4-turbo-2024-04-09
Options:
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Chat: gpt-4-turbo (aliases: gpt-4-turbo-preview, 4-turbo, 4t)
Options:
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Chat: o1-preview
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
Options:
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Chat: o1-mini
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
Options:
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
json_object: boolean
OpenAI Completion: gpt-3.5-turbo-instruct (aliases: 3.5-instruct, chatgpt-instruct)
temperature: float
What sampling temperature to use, between 0 and 2. Higher values like
0.8 will make the output more random, while lower values like 0.2 will
make it more focused and deterministic.
max_tokens: int
Maximum number of tokens to generate.
top_p: float
An alternative to sampling with temperature, called nucleus sampling,
where the model considers the results of the tokens with top_p
probability mass. So 0.1 means only the tokens comprising the top 10%
probability mass are considered. Recommended to use top_p or
temperature but not both.
frequency_penalty: float
Number between -2.0 and 2.0. Positive values penalize new tokens based
on their existing frequency in the text so far, decreasing the model's
likelihood to repeat the same line verbatim.
presence_penalty: float
Number between -2.0 and 2.0. Positive values penalize new tokens based
on whether they appear in the text so far, increasing the model's
likelihood to talk about new topics.
stop: str
A string where the API will stop generating further tokens.
logit_bias: dict, str
Modify the likelihood of specified tokens appearing in the completion.
Pass a JSON string like '{"1712":-100, "892":-100, "1489":-100}'
seed: int
Integer seed to attempt to sample deterministically
logprobs: int
Include the log probabilities of most likely N per token
Options:
temperature: float
What sampling temperature to use, between 0 and 2. Higher values like
0.8 will make the output more random, while lower values like 0.2 will
make it more focused and deterministic.
max_tokens: int
Maximum number of tokens to generate.
top_p: float
An alternative to sampling with temperature, called nucleus sampling,
where the model considers the results of the tokens with top_p
probability mass. So 0.1 means only the tokens comprising the top 10%
probability mass are considered. Recommended to use top_p or
temperature but not both.
frequency_penalty: float
Number between -2.0 and 2.0. Positive values penalize new tokens based
on their existing frequency in the text so far, decreasing the model's
likelihood to repeat the same line verbatim.
presence_penalty: float
Number between -2.0 and 2.0. Positive values penalize new tokens based
on whether they appear in the text so far, increasing the model's
likelihood to talk about new topics.
stop: str
A string where the API will stop generating further tokens.
logit_bias: dict, str
Modify the likelihood of specified tokens appearing in the completion.
Pass a JSON string like '{"1712":-100, "892":-100, "1489":-100}'
seed: int
Integer seed to attempt to sample deterministically
logprobs: int
Include the log probabilities of most likely N per token
```
<!-- [[[end]]] -->

View file

@ -961,11 +961,11 @@ def models_list(options):
extra = ""
if model_with_aliases.aliases:
extra = " (aliases: {})".format(", ".join(model_with_aliases.aliases))
output = str(model_with_aliases.model) + extra
if options and model_with_aliases.model.Options.schema()["properties"]:
for name, field in model_with_aliases.model.Options.schema()[
"properties"
].items():
model = model_with_aliases.model
output = str(model) + extra
if options and model.Options.model_json_schema()["properties"]:
output += "\n Options:"
for name, field in model.Options.model_json_schema()["properties"].items():
any_of = field.get("anyOf")
if any_of is None:
any_of = [{"type": field["type"]}]
@ -976,17 +976,24 @@ def models_list(options):
if item["type"] != "null"
]
)
bits = ["\n ", name, ": ", types]
bits = ["\n ", name, ": ", types]
description = field.get("description", "")
if description and (
model_with_aliases.model.__class__
not in models_that_have_shown_options
model.__class__ not in models_that_have_shown_options
):
wrapped = textwrap.wrap(description, 70)
bits.append("\n ")
bits.extend("\n ".join(wrapped))
bits.append("\n ")
bits.extend("\n ".join(wrapped))
output += "".join(bits)
models_that_have_shown_options.add(model_with_aliases.model.__class__)
models_that_have_shown_options.add(model.__class__)
if options and model.attachment_types:
attachment_types = ", ".join(sorted(model.attachment_types))
wrapper = textwrap.TextWrapper(
width=min(max(shutil.get_terminal_size().columns, 30), 70),
initial_indent=" ",
subsequent_indent=" ",
)
output += "\n Attachment types:\n{}".format(wrapper.fill(attachment_types))
click.echo(output)

View file

@ -23,6 +23,11 @@ import yaml
@hookimpl
def register_models(register):
# GPT-4o
register(Chat("gpt-4o", vision=True), aliases=("4o",))
register(Chat("gpt-4o-mini", vision=True), aliases=("4o-mini",))
register(Chat("gpt-4o-audio-preview", audio=True))
# 3.5 and 4
register(Chat("gpt-3.5-turbo"), aliases=("3.5", "chatgpt"))
register(Chat("gpt-3.5-turbo-16k"), aliases=("chatgpt-16k", "3.5-16k"))
register(Chat("gpt-4"), aliases=("4", "gpt4"))
@ -32,10 +37,6 @@ def register_models(register):
register(Chat("gpt-4-0125-preview"))
register(Chat("gpt-4-turbo-2024-04-09"))
register(Chat("gpt-4-turbo"), aliases=("gpt-4-turbo-preview", "4-turbo", "4t"))
# GPT-4o
register(Chat("gpt-4o", vision=True), aliases=("4o",))
register(Chat("gpt-4o-mini", vision=True), aliases=("4o-mini",))
register(Chat("gpt-4o-audio-preview", audio=True))
# o1
register(Chat("o1-preview", can_stream=False, allows_system_prompt=False))
register(Chat("o1-mini", can_stream=False, allows_system_prompt=False))

View file

@ -514,32 +514,39 @@ def test_openai_localai_configuration(mocked_localai, user_path):
EXPECTED_OPTIONS = """
OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
temperature: float
What sampling temperature to use, between 0 and 2. Higher values like
0.8 will make the output more random, while lower values like 0.2 will
make it more focused and deterministic.
max_tokens: int
Maximum number of tokens to generate.
top_p: float
An alternative to sampling with temperature, called nucleus sampling,
where the model considers the results of the tokens with top_p
probability mass. So 0.1 means only the tokens comprising the top 10%
probability mass are considered. Recommended to use top_p or
temperature but not both.
frequency_penalty: float
Number between -2.0 and 2.0. Positive values penalize new tokens based
on their existing frequency in the text so far, decreasing the model's
likelihood to repeat the same line verbatim.
presence_penalty: float
Number between -2.0 and 2.0. Positive values penalize new tokens based
on whether they appear in the text so far, increasing the model's
likelihood to talk about new topics.
stop: str
A string where the API will stop generating further tokens.
logit_bias: dict, str
Modify the likelihood of specified tokens appearing in the completion.
Pass a JSON string like '{"1712":-100, "892":-100, "1489":-100}'
OpenAI Chat: gpt-4o (aliases: 4o)
Options:
temperature: float
What sampling temperature to use, between 0 and 2. Higher values like
0.8 will make the output more random, while lower values like 0.2 will
make it more focused and deterministic.
max_tokens: int
Maximum number of tokens to generate.
top_p: float
An alternative to sampling with temperature, called nucleus sampling,
where the model considers the results of the tokens with top_p
probability mass. So 0.1 means only the tokens comprising the top 10%
probability mass are considered. Recommended to use top_p or
temperature but not both.
frequency_penalty: float
Number between -2.0 and 2.0. Positive values penalize new tokens based
on their existing frequency in the text so far, decreasing the model's
likelihood to repeat the same line verbatim.
presence_penalty: float
Number between -2.0 and 2.0. Positive values penalize new tokens based
on whether they appear in the text so far, increasing the model's
likelihood to talk about new topics.
stop: str
A string where the API will stop generating further tokens.
logit_bias: dict, str
Modify the likelihood of specified tokens appearing in the completion.
Pass a JSON string like '{"1712":-100, "892":-100, "1489":-100}'
seed: int
Integer seed to attempt to sample deterministically
json_object: boolean
Output a valid JSON object {...}. Prompt must mention JSON.
Attachment types:
image/gif, image/jpeg, image/png, image/webp
"""