
(related-tools)=
# Related tools

The following tools are designed to be used with LLM:

(related-tools-strip-tags)=
## strip-tags

[strip-tags](https://github.com/simonw/strip-tags) is a command for stripping tags from HTML. This is useful when working with LLMs because HTML tags can use up a lot of your token budget.

Here's how to summarize the front page of the New York Times, by both stripping tags and filtering to just the elements with `class="story-wrapper"`:

```bash
curl -s https://www.nytimes.com/ \
  | strip-tags .story-wrapper \
  | llm -s 'summarize the news'
```

[llm, ttok and strip-tags—CLI tools for working with ChatGPT and other LLMs](https://simonwillison.net/2023/May/18/cli-tools-for-llms/) describes ways to use `strip-tags` in more detail.

(related-tools-ttok)=
## ttok

[ttok](https://github.com/simonw/ttok) is a command-line tool for counting OpenAI tokens. You can use it to check whether input is likely to fit within the token limit for GPT-3.5 or GPT-4:

```bash
cat my-file.txt | ttok
```
```
125
```

It can also truncate input down to a desired number of tokens:

```bash
ttok This is too many tokens -t 3
```
```
This is too
```

This is useful for truncating a large document down to a size where it can be processed by an LLM.
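
Counting and truncating can be combined in a single pipeline. The following is a minimal sketch rather than part of either tool: `summarize_within_budget` is a hypothetical helper name, the 4000-token default is a placeholder and not a limit taken from any particular model, and it assumes both `ttok` and `llm` are installed.

```bash
# Hypothetical helper: summarize a file, truncating it to a token budget first.
# Assumes ttok and llm are on your PATH. The default budget of 4000 tokens
# is a placeholder; pick a value that suits the model you are using.
summarize_within_budget() {
  local file="$1" budget="${2:-4000}"
  # ttok -t keeps at most $budget tokens, then llm receives the result on stdin
  ttok -t "$budget" < "$file" | llm -s 'summarize this document'
}
```

You could then call it as `summarize_within_budget my-file.txt 4000`.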

(related-tools-symbex)=
## Symbex

[Symbex](https://github.com/simonw/symbex) is a tool for searching for symbols in Python codebases. It's useful for extracting just the code for a specific problem and then piping that into LLM for explanation, refactoring or other tasks.

Here's how to use it to find all functions that match `test*csv*` and use those to guess what the software under test does:

```bash
symbex 'test*csv*' | \
  llm --system 'based on these tests guess what this tool does'
```

It can also be used to export symbols in a format that can be piped to {ref}`llm embed-multi <embeddings-cli-embed-multi>` in order to create embeddings:

```bash
symbex '*' '*:*' --nl | \
  llm embed-multi symbols - \
  --format nl --database embeddings.db --store
```

For more examples see [Symbex: search Python code for functions and classes, then pipe them into a LLM](https://simonwillison.net/2023/Jun/18/symbex/).