Improvements to schemas.md refs #788

Simon Willison 2025-02-27 11:08:39 -08:00
parent 259366a575
commit 6957e4ecbb


@ -31,15 +31,18 @@ LLM provides several ways to use schemas:
3. Via templates that include schemas
4. Through the {ref}`Python API <python-api-schemas>`
(schemas-using-cli)=
### Basic usage with the command line
To get structured data from a language model you can provide a JSON schema directly using the `--schema` option:
```bash
curl https://www.nytimes.com/ | uvx strip-tags | \
  llm --schema '{
  "type": "object",
  "properties": {
    "items": {
      "type": "array",
      "items": {
        "type": "object",
@ -50,18 +53,18 @@ curl https://www.nytimes.com/ | uvx strip-tags | llm --schema '{
          "short_summary": {
            "type": "string"
          },
          "key_points": {
            "type": "array",
            "items": {
              "type": "string"
            }
          }
        },
        "required": ["headline", "short_summary", "key_points"]
      }
    }
  },
  "required": ["items"]
}' | jq
```
This example uses [uvx](https://docs.astral.sh/uv/guides/tools/) to run [strip-tags](https://github.com/simonw/strip-tags) against the front page of the New York Times, runs GPT-4o mini with a schema to extract story headlines and summaries, then pipes the result through [jq](https://jqlang.org/) to format it.
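The schema spells out exactly what a conforming response must contain. To make those constraints concrete, here is a minimal Python sketch that checks a parsed response against the subset of JSON schema this example uses (`type`, `properties`, `required` and `items`). The `check` helper is hand-rolled for illustration: it is not part of LLM and not how models enforce schemas, and a full validator such as the `jsonschema` package would be the usual choice in practice. The `headline` property is assumed to be a plain string, since its definition falls in the part of the example not shown here.

```python
import json

# The schema from the command-line example above, as a Python dict.
# ASSUMPTION: "headline" is a plain string; its definition is in the
# part of the example not shown in this diff.
NYT_SCHEMA = {
    "type": "object",
    "properties": {
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "headline": {"type": "string"},
                    "short_summary": {"type": "string"},
                    "key_points": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["headline", "short_summary", "key_points"],
            },
        }
    },
    "required": ["items"],
}

# JSON schema type names used by this example, mapped to Python types.
PY_TYPES = {"object": dict, "array": list, "string": str}


def check(value, schema, path="$"):
    """Return a list of problems with value; an empty list means it conforms.

    Hand-rolled sketch supporting only "type", "properties", "required"
    and "items". Not a full JSON schema validator.
    """
    expected = PY_TYPES[schema["type"]]
    if not isinstance(value, expected):
        return [f"{path}: expected {schema['type']}"]
    problems = []
    if expected is dict:
        # Every key listed in "required" must be present.
        for key in schema.get("required", []):
            if key not in value:
                problems.append(f"{path}: missing required key {key!r}")
        # Recurse into the properties that are present.
        for key, subschema in schema.get("properties", {}).items():
            if key in value:
                problems.extend(check(value[key], subschema, f"{path}.{key}"))
    elif expected is list and "items" in schema:
        # Every array element must match the "items" schema.
        for i, item in enumerate(value):
            problems.extend(check(item, schema["items"], f"{path}[{i}]"))
    return problems


# Made-up response text, for illustration only (not real model output).
sample = json.loads(
    '{"items": [{"headline": "Example", "short_summary": "Text", "key_points": ["a", "b"]}]}'
)
print(check(sample, NYT_SCHEMA))  # → []
print(check({"items": [{"headline": "Example"}]}, NYT_SCHEMA))
# → ["$.items[0]: missing required key 'short_summary'", "$.items[0]: missing required key 'key_points'"]
```

The second call shows why `required` matters: an object with only a `headline` is well-formed JSON but does not satisfy the schema.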
@ -70,10 +73,19 @@ This will instruct the model to return an array of JSON objects with the specifi
(schemas-dsl)=
## Alternative schema syntax
JSON schemas can be time-consuming to construct by hand. LLM also supports a concise alternative syntax for specifying a schema.
The New York Times example above can be condensed to the following, though note that `key_points` is now a string rather than an array of strings:
```bash
curl https://www.nytimes.com/ | uvx strip-tags | \
llm --schema-multi 'headline, short_summary, key_points' | jq
```
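`--schema-multi` describes a single item and asks for a list of them. Here is a rough Python sketch of that wrapping, assuming it produces the same shape as the hand-written JSON schema earlier on this page: an object whose required `items` key holds an array. The helper name `multi_schema` is hypothetical, and expanding `'headline, short_summary, key_points'` into three required string properties is an assumption made for this illustration.

```python
import json


def multi_schema(item_schema):
    """Wrap a schema for one object so the response holds a list of them.

    Hypothetical helper: the "items" wrapper mirrors the hand-written JSON
    example above. A sketch, not LLM's own implementation.
    """
    return {
        "type": "object",
        "properties": {"items": {"type": "array", "items": item_schema}},
        "required": ["items"],
    }


# ASSUMPTION: 'headline, short_summary, key_points' expands to three
# required string properties.
story = {
    "type": "object",
    "properties": {
        "headline": {"type": "string"},
        "short_summary": {"type": "string"},
        "key_points": {"type": "string"},
    },
    "required": ["headline", "short_summary", "key_points"],
}

print(json.dumps(multi_schema(story), indent=2))
```

Apart from `key_points` being a string, the printed structure has the same nesting as the JSON schema passed to `--schema` in the first example.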
### How that syntax works
A simple schema for an object with two string properties called `name` and `bio` looks like this:
```
name, bio
```
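The idea behind `name, bio` can be shown with a deliberately simplified Python sketch that expands it into the equivalent JSON schema. The function name `simple_schema_dsl` is hypothetical, and treating every property as a required string is an assumption for this illustration; `llm.schema_dsl()` and the `llm schemas dsl` command perform the real conversion.

```python
def simple_schema_dsl(spec):
    """Turn 'name, bio' into a JSON schema object with string properties.

    Hypothetical, simplified sketch: comma-separated names only, every
    property is a string, and every property is required.
    """
    # Split on commas, trim whitespace and skip empty entries.
    names = [part.strip() for part in spec.split(",") if part.strip()]
    return {
        "type": "object",
        "properties": {name: {"type": "string"} for name in names},
        "required": names,
    }


print(simple_schema_dsl("name, bio"))
# → {'type': 'object', 'properties': {'name': {'type': 'string'}, 'bio': {'type': 'string'}}, 'required': ['name', 'bio']}
```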
@ -94,6 +106,8 @@ If your schema is getting long you can switch from comma-separated to newline-se
```
age int: their age
bio: a short bio, no more than three sentences
```
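Each line in the newline-separated form follows a `name [type]: description` pattern. The following Python sketch parses a single line of that form into a property name and its JSON schema. The helper `parse_field` and the type keywords in `TYPE_MAP` (`int`, `float`, `bool`, `str`) are assumptions for this illustration, not LLM's documented implementation.

```python
# ASSUMPTION: type keywords for this sketch; an unknown or missing
# type word falls back to "string".
TYPE_MAP = {"int": "integer", "float": "number", "bool": "boolean", "str": "string"}


def parse_field(line):
    """Parse one field such as 'age int: their age'.

    Returns (name, property_schema). Hypothetical helper illustrating the
    'name [type]: description' pattern; not LLM's implementation.
    """
    # Split on the first colon only, so the description may contain
    # commas or further colons.
    definition, _, description = line.partition(":")
    parts = definition.split()
    name = parts[0]
    type_word = parts[1] if len(parts) > 1 else "str"
    prop = {"type": TYPE_MAP.get(type_word, "string")}
    if description.strip():
        prop["description"] = description.strip()
    return name, prop


for line in ["age int: their age", "bio: a short bio, no more than three sentences"]:
    print(parse_field(line))
# → ('age', {'type': 'integer', 'description': 'their age'})
# → ('bio', {'type': 'string', 'description': 'a short bio, no more than three sentences'})
```

Because the sketch splits on the first colon only, the comma in the `bio` description stays part of the description.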
### Using alternative schema syntax
This format is supported by the `--schema` option. It is detected whenever you provide a string that contains at least one space and does not start with `{` (a leading `{` indicates JSON):
```bash
@ -104,12 +118,7 @@ To return multiple items matching your schema, use the `--schema-multi` option.
```bash
llm --schema-multi 'name,description,fave_toy' 'invent 3 dogs'
```
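The detection rule for `--schema` described in this section can be expressed as a small Python predicate. This is a sketch of the rule as written, not LLM's code: the name `looks_like_schema_dsl` is hypothetical, ignoring leading and trailing whitespace is an assumption, and the sketch does not model how inputs that fail the test, or the argument to `--schema-multi`, are handled.

```python
def looks_like_schema_dsl(value):
    """Apply the detection rule as this section states it for --schema.

    Hypothetical sketch: a string counts as the alternative syntax if it
    contains at least one space and does not start with '{' (which
    signals JSON). Stripping surrounding whitespace is an assumption.
    """
    stripped = value.strip()
    return " " in stripped and not stripped.startswith("{")


print(looks_like_schema_dsl("name, bio"))  # → True
print(looks_like_schema_dsl('{"type": "object"}'))  # → False
```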
The Python utility function `llm.schema_dsl(schema)` can be used to convert this syntax into the equivalent JSON schema dictionary when working with schemas {ref}`in the Python API <python-api-schemas>`.
You can experiment with the syntax using the `llm schemas dsl` command, which converts the input into a JSON schema:
```bash