From 6957e4ecbb4b455a2dba3643293e5847735013fa Mon Sep 17 00:00:00 2001
From: Simon Willison
Date: Thu, 27 Feb 2025 11:08:39 -0800
Subject: [PATCH] Improvements to schemas.md refs #788

---
 docs/schemas.md | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/docs/schemas.md b/docs/schemas.md
index 3a4f00c..0fa9427 100644
--- a/docs/schemas.md
+++ b/docs/schemas.md
@@ -31,15 +31,18 @@ LLM provides several ways to use schemas:
 3. Via templates that include schemas
 4. Through the {ref}`Python API `
 
+(schemas-using-cli)=
+
 ### Basic usage with the command line
 
 To get structured data from a language model you can provide a JSON schema directly using the `--schema` option:
 
 ```bash
-curl https://www.nytimes.com/ | uvx strip-tags | llm --schema '{
+curl https://www.nytimes.com/ | uvx strip-tags | \
+  llm --schema '{
   "type": "object",
   "properties": {
-    "stories": {
+    "items": {
       "type": "array",
       "items": {
         "type": "object",
@@ -50,18 +53,18 @@ curl https://www.nytimes.com/ | uvx strip-tags | llm --schema '{
           "short_summary": {
             "type": "string"
           },
-          "key_people": {
+          "key_points": {
             "type": "array",
             "items": {
               "type": "string"
             }
           }
         },
-        "required": ["headline", "summary", "key_people"]
+        "required": ["headline", "short_summary", "key_points"]
       }
     }
   },
-  "required": ["stories"]
+  "required": ["items"]
 }' | jq
 ```
 This example uses [uvx](https://docs.astral.sh/uv/guides/tools/) to run [strip-tags](https://github.com/simonw/strip-tags) against the front page of the New York Times, runs GPT-4o mini with a schema to extract story headlines and summaries, then pipes the result through [jq](https://jqlang.org/) to format it.
@@ -70,10 +73,19 @@ This will instruct the model to return an array of JSON objects with the specifi
 
 (schemas-dsl)=
 
-### Alternative schema syntax
+## Alternative schema syntax
 
 JSON schema's can be time-consuming to construct by hand. LLM also supports a concise alternative syntax for specifying a schema.
 
+The New York Times example above can be condensed to this, though note that key points is now a string rather than an array of strings:
+
+```bash
+curl https://www.nytimes.com/ | uvx strip-tags | \
+  llm --schema-multi 'headline, short_summary, key_points' | jq
+```
+
+### How that syntax works
+
 A simple schema for an object with two string properties called `name` and `bio` looks like this:
 
     name, bio
@@ -94,6 +106,8 @@ If your schema is getting long you can switch from comma-separated to newline-se
     age int: their age
     bio: a short bio, no more than three sentences
 
+### Using alternative schema syntax
+
 This format is supported by the `--schema` option. The format will be detected any time you provide a string with at least one space that doesn't start with a `{` (indicating JSON):
 
 ```bash
@@ -104,12 +118,7 @@ To return multiple items matching your schema, use the `--schema-multi` option.
 ```bash
 llm --schema-multi 'name,description,fave_toy' 'invent 3 dogs'
 ```
-Using this option a simpler version of the New York Times example above is the following:
-```bash
-curl https://www.nytimes.com/ | uvx strip-tags | llm --schema-multi 'headline, summary' | jq
-```
-
-The Python utility function `llm.schema_dsl(schema)` can be used to convert this syntax into the equivalent JSON schema dictionary.
+The Python utility function `llm.schema_dsl(schema)` can be used to convert this syntax into the equivalent JSON schema dictionary when working with schemas {ref}`in the Python API