Getting started with ellmer

library(ellmer)

ellmer makes it easy to access the wealth of large language models (LLMs) from R. But what can you do with those models once you have access to them? This vignette will give you the basic vocabulary you need to use an LLM effectively and will show you some examples to ignite your creativity.

In this vignette we’ll mostly ignore how LLMs work, using them as convenient black boxes. If you want to get a sense of how they actually work, we recommend watching Jeremy Howard’s posit::conf(2023) keynote: A hacker’s guide to open source LLMs.

Vocabulary

We’ll start by laying out the key vocab that you’ll need to understand LLMs. Unfortunately the vocab is all a little entangled: to understand one term you’ll often have to know a little about some of the others. So we’ll start with some simple definitions of the most important terms then iteratively go a little deeper.

It all starts with a prompt, which is the text (typically a question or a request) that you send to the LLM. This starts a conversation, a sequence of turns that alternate between user prompts and model responses. Inside the model, both the prompt and response are represented by a sequence of tokens, which represent either individual words or subcomponents of a word. The tokens are used to compute the cost of using a model and to measure the size of the context, the combination of the current prompt and any previous prompts and responses used to generate the next response.

It’s useful to make the distinction between providers and models. A provider is a web API that gives access to one or more models. The distinction is a bit subtle because providers are often synonymous with a model, like OpenAI and GPT, Anthropic and Claude, and Google and Gemini. But other providers, like Ollama, can host many different models, typically open source models like LLaMa and Mistral. Still other providers support both open and closed models, typically by partnering with a company that provides a popular closed model. For example, Azure OpenAI offers both open source models and OpenAI’s GPT, while AWS Bedrock offers both open source models and Anthropic’s Claude.

What is a token?

An LLM is a model, and like all models needs some way to represent its inputs numerically. For LLMs, that means we need some way to convert words to numbers. This is the goal of the tokenizer. For example, using the GPT 4o tokenizer, the string “When was R created?” is converted to 5 tokens: 5958 (“When”), 673 (” was”), 460 (” R”), 5371 (” created”), 30 (“?”). As you can see, many simple strings can be represented by a single token. But more complex strings require multiple tokens. For example, the string “counterrevolutionary” requires 4 tokens: 32128 (“counter”), 264 (“re”), 9477 (“volution”), 815 (“ary”). (You can see how various strings are tokenized at http://tiktokenizer.vercel.app/).

It’s important to have a rough sense of how text is converted to tokens because tokens are used to determine the cost of a model and how much context can be used to predict the next response. On average an English word needs ~1.5 tokens so a page might require 375-400 tokens and a complete book might require 75,000 to 150,000 tokens. Other languages will typically require more tokens, because (in brief) LLMs are trained on data from the internet, which is primarily in English.

LLMs are priced per million tokens. State of the art models (like GPT-4o or Claude 3.5 sonnet) cost $2-3 per million input tokens, and $10-15 per million output tokens. Cheaper models can cost much less, e.g. GPT-4o mini costs $0.15 per million input tokens and $0.60 per million output tokens. Even $10 of API credit will give you a lot of room for experimentation, particularly with cheaper models, and prices are likely to decline as model performance improves.

Tokens also used to measure the context window, which is how much text the LLM can use to generate the next response. As we’ll discuss shortly, the context length includes the full state of your conversation so far (both your prompts and the model’s responses), which means that cost grow rapidly with the number of conversational turns.

In ellmer, you can see how many tokens a conversations has used by printing it, and you can see total usage for a session with token_usage().

chat <- chat_openai(model = "gpt-4o")
. <- chat$chat("Who created R?", echo = FALSE)
chat
#> <Chat turns=2 tokens=11/48>
#> ── user ────────────────────────────────────────────────────────────────────────
#> Who created R?
#> ── assistant ───────────────────────────────────────────────────────────────────
#> R was created by Ross Ihaka and Robert Gentleman at the University of Auckland,
#> New Zealand. The project began in the mid-1990s as an implementation of the S
#> programming language with a focus on statistical computing and data analysis.

token_usage()
#>                       name input output
#> 1 OpenAI-api.openai.com/v1    11     48

If you want to learn more about tokens and tokenizers, I’d recommend watching the first 20-30 minutes of Let’s build the GPT Tokenizer by Andrej Karpathy. You certainly don’t need to learn how to build your own tokenizer, but the intro will give you a bunch of useful background knowledge that will help improve your undersstanding of how LLM’s work.

What is a conversation?

A conversation with an LLM takes place through a series of HTTP requests and responses: you send your question to the LLM as an HTTP request, and it sends back its reply as an HTTP response. In other words, a conversation consists of a sequence of a paired turns: a sent prompt and a returned response.

It’s important to note that a request includes not only the current user prompt, but every previous user prompt and model response. This means that:

What is a prompt?

The user prompt is the question that you send to the model. There are two other important prompts that underlie the user prompt:

When you use a chat app like ChatGPT or claude.ai you can only iterate on the user prompt. But when you’re programming with LLMs, you’ll primarily iterate on the system prompt. For example, if you’re developing an app that helps a user write tidyverse code, you’d work with the system prompt to ensure that user gets the style of code they want.

Writing a good prompt, which is called prompt design, is key to effective use of LLMs. It is discussed in more detail in vignette("prompt-design").

Example uses

Now that you’ve got the basic vocab under your belt, I’m going to fire a bunch of interesting potential use cases at you. While there are special purpose tools that might solve these cases faster and/or cheaper, an LLM allows you to rapidly prototype a solution. This can be extremely valuable even if you end up using those more specialised tools in your final product.

In general, we recommend avoiding LLMs where accuracy is critical. That said, there are still many cases for their use. For example, even though they always requires some manual fiddling, you might save a bunch of time evern with an 80% correct solution. In fact, even a not-so-good solution can still be useful because it makes it easier to get started: it’s easier to react to something rather than to have to start from scratch with a blank page.

Chatbots

A great place to start with ellmer and LLMs is to build a chatbot with a custom prompt. Chatbots are a familiar interface to LLMs and are easy to create in R with shinychat. And there’s a surprising amount of value to creating a custom chatbot that has a prompt stuffed with useful knowledge. For example:

Another direction is to give the chatbot additional context about your current environment. For example, aidea allows the user to interactively explore a dataset with the help of the LLM. It adds summary statistics about the dataset to the prompt so that the LLM knows something about your data. Along these lines, imagine writing a chatbot to help with data import that has a prompt which include all the files in the current directory along with their first few lines.

Structured data extraction

LLMs are often very good at extracting structured data from unstructured text. This can give you traction to analyse data that was previously unaccessible. For example:

Structured data extraction also works well with images. It’s not the fastest or cheapest way to extract data but it makes it really easy to prototype ideas. For example, maybe you have a bunch of scanned documents that you want to index. You can convert PDFs to images (e.g. using {imagemagick}) then use structured data extraction to pull out key details.

Learn more about structured data extraction in vignette("structure-data").

Programming

LLMs can also be useful to solve general programming problems. For example:

Miscellaneous

To finish up here are a few other ideas that seem cool but didn’t seem to fit the above categories: