LLM API Cost Calculator

Easy "Chat Costs" Derivation

How to squeeze an entire conversation’s price into one compact formula.

Step 1 – The billable tokens

Every time you call a large language model (LLM) API, you pay for two buckets of tokens:

Input tokens – everything you send (system prompt + all history + your newest message).
Output tokens – what the model sends back.

Input tokens themselves split into two flavours:

Input-miss – tokens the provider has to process for the first time.
Input-hit – tokens already cached from earlier turns, charged at a discount.

We label their prices as follows:

\( p_{im} \) : price per input-miss token
\( p_{ih} \) : price per input-hit token
\( p_o \) : price per output token
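The formulas below work with per-token prices, while providers commonly quote prices per million tokens. Here is a minimal sketch of a container for the three prices, assuming per-million quotes. The class `Prices`, the helper `from_per_million`, and the example rates are hypothetical, not any provider's actual API or pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Per-token prices in dollars (hypothetical container, not a provider API)."""
    p_im: float  # price per input-miss token
    p_ih: float  # price per input-hit (cached) token
    p_o: float   # price per output token

    @classmethod
    def from_per_million(cls, im: float, ih: float, o: float) -> "Prices":
        # Convert prices quoted per 1,000,000 tokens into per-token prices.
        return cls(p_im=im / 1e6, p_ih=ih / 1e6, p_o=o / 1e6)


# Hypothetical rates: $1.00 miss, $0.10 hit, $4.00 output per million tokens.
prices = Prices.from_per_million(1.00, 0.10, 4.00)
print(prices)
```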

Step 2 – One round in plain English

For a single round you add

\( I \) new input tokens (your message)
\( O \) new output tokens (the reply)

So one round costs

\[ I \cdot p_{im} + O \cdot p_o \]

…provided nothing is cached yet.
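As a quick check of the one-round formula, here is a sketch with made-up numbers: 500 input tokens, 300 output tokens, and hypothetical prices of $1.00 and $4.00 per million input-miss and output tokens. The function name `round_cost` is illustrative.

```python
def round_cost(I: int, O: int, p_im: float, p_o: float) -> float:
    """Cost of one round when nothing is cached: I * p_im + O * p_o."""
    return I * p_im + O * p_o


# Hypothetical round: 500 input tokens, 300 output tokens.
cost = round_cost(500, 300, 1.00 / 1e6, 4.00 / 1e6)
print(f"${cost:.6f}")  # $0.001700
```

Note that the output tokens dominate here even though there are fewer of them, because output is usually priced higher than input.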

Step 3 – Ten rounds? One hundred?

History grows: each request resends all previous turns (user messages and model replies) with the new user message appended.

Let \( r \) be the number of user turns.
After \( r \) turns you have been billed for

Input-miss tokens: \( r \cdot I \) (only the fresh part of each prompt)
Input-hit tokens: the ever-growing prefix of earlier turns (assumed here to be fully cached).
That prefix is 0 tokens in round 1, \( (I+O) \) in round 2, \( 2(I+O) \) in round 3, …
Total hit tokens therefore form the triangular sum
\[ (I+O) \cdot [0 + 1 + 2 + \ldots + (r-1)] = (I+O) \cdot \frac{r(r-1)}{2} \]
Output tokens: \( r \cdot O \) (one reply per turn)
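These three counts can be checked by brute force. The sketch below replays \( r \) turns under the same assumption as the text (every earlier turn is served from cache, only the new user message is a miss) and compares the tallies with \( r \cdot I \), \( (I+O) \cdot \frac{r(r-1)}{2} \) and \( r \cdot O \). The function name `tally_tokens` and the token counts are illustrative.

```python
def tally_tokens(r: int, I: int, O: int) -> tuple[int, int, int]:
    """Replay r turns and count (miss, hit, output) tokens.

    Assumes every earlier turn is a cache hit and only the new
    user message of I tokens is a cache miss.
    """
    miss = hit = out = 0
    history = 0  # tokens of earlier turns, resent as cached prefix
    for _ in range(r):
        hit += history      # cached prefix, billed at the hit price
        miss += I           # fresh user message
        out += O            # model reply
        history += I + O    # this turn joins the prefix for the next round
    return miss, hit, out


r, I, O = 10, 200, 400
miss, hit, out = tally_tokens(r, I, O)
assert miss == r * I
assert hit == (I + O) * r * (r - 1) // 2
assert out == r * O
print(miss, hit, out)  # 2000 27000 4000
```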

Step 4 – The grand total

Add the three buckets:

\[ \begin{aligned} TC &= r \cdot I \cdot p_{im} \\ &\quad + r \cdot O \cdot p_o \\ &\quad + (I+O) \cdot \frac{r(r-1)}{2} \cdot p_{ih} \end{aligned} \]

Factor out \( r \):

\[ TC = r \cdot \left[ I \cdot p_{im} + O \cdot p_o + \frac{r-1}{2} \cdot (I+O) \cdot p_{ih} \right] \tag{1} \]

That’s it: one line, three prices, two token counts and the number of turns.
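Equation (1) translates directly into a function. The sketch below also sums the cost turn by turn (turn \( k \), counting from 0, resends \( k \) earlier turns as cached prefix) to confirm the closed form. Function names and prices are hypothetical ($1.00 / $0.10 / $4.00 per million input-miss / input-hit / output tokens).

```python
def total_cost(r: int, I: int, O: int, p_im: float, p_ih: float, p_o: float) -> float:
    """Equation (1): TC = r * [I*p_im + O*p_o + (r-1)/2 * (I+O) * p_ih]."""
    return r * (I * p_im + O * p_o + (r - 1) / 2 * (I + O) * p_ih)


def total_cost_by_turn(r: int, I: int, O: int, p_im: float, p_ih: float, p_o: float) -> float:
    """The same quantity summed turn by turn, for cross-checking."""
    # Turn k (0-based) resends k earlier turns of I + O tokens as cached prefix.
    return sum(I * p_im + O * p_o + k * (I + O) * p_ih for k in range(r))


# Hypothetical per-token prices.
p_im, p_ih, p_o = 1.00e-6, 0.10e-6, 4.00e-6
for r in (1, 10, 100):
    closed = total_cost(r, 500, 300, p_im, p_ih, p_o)
    summed = total_cost_by_turn(r, 500, 300, p_im, p_ih, p_o)
    assert abs(closed - summed) < 1e-9
    print(f"r={r:>3}: ${closed:.4f}")
```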

Step 5 – A change of variables for extra insight

Define

\( M = I + O \) (tokens exchanged in one “turn”)
\( v = \frac{I}{M} \) (fraction of \( M \) that is user input)

Substituting \( I = vM \) and \( O = (1-v)M \) into (1) and pulling out the common factor \( M \) gives

\[ TC = M \cdot r \cdot \left[ v \cdot p_{im} + (1-v) \cdot p_o + \frac{r-1}{2} \cdot p_{ih} \right] \tag{2} \]

Equation (2) cleanly separates three things:

the scale of the conversation (\( M \cdot r \) tokens)
the shape of the conversation (\( v \) and \( r \))
the prices (\( p_{im}, p_{ih}, p_o \))
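A short sketch confirms that (2) agrees with (1) and shows the separation in use: holding the scale \( M \cdot r \) fixed and varying \( v \) changes only the first two terms in the bracket. Function names, token counts and prices are hypothetical.

```python
def total_cost_mv(r: int, M: int, v: float, p_im: float, p_ih: float, p_o: float) -> float:
    """Equation (2): TC = M * r * [v*p_im + (1-v)*p_o + (r-1)/2 * p_ih]."""
    return M * r * (v * p_im + (1 - v) * p_o + (r - 1) / 2 * p_ih)


def total_cost_io(r: int, I: int, O: int, p_im: float, p_ih: float, p_o: float) -> float:
    """Equation (1), in terms of I and O, for comparison."""
    return r * (I * p_im + O * p_o + (r - 1) / 2 * (I + O) * p_ih)


I, O = 500, 300
M, v = I + O, I / (I + O)             # M = 800 tokens per turn, v = 0.625
prices = (1.00e-6, 0.10e-6, 4.00e-6)  # hypothetical p_im, p_ih, p_o

for r in (1, 10, 100):
    assert abs(total_cost_mv(r, M, v, *prices) - total_cost_io(r, I, O, *prices)) < 1e-12

# Same scale (M * r with r = 20), different shapes: only v changes.
for v_alt in (0.2, 0.5, 0.8):
    print(v_alt, round(total_cost_mv(20, M, v_alt, *prices), 4))
```

Because output is priced above input in this example, conversations with a larger user share \( v \) come out cheaper at the same scale.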

Take-away

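The key point: input-hit costs grow with \( r^2 \) (quadratically in the number of turns), because every turn resends the whole history. A cache discount does not remove this quadratic growth; without caching, the same prefix would be billed at \( p_{im} \) instead of \( p_{ih} \), so the discount only shrinks the coefficient of the \( r^2 \) term. Always check the cache pricing: if cached tokens are not much cheaper than fresh input, long conversations quickly become expensive.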
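To see how fast the quadratic term takes over, divide both parts of (1) by \( r \): total input-hit cost overtakes total input-miss plus output cost once \( \frac{r-1}{2} \cdot (I+O) \cdot p_{ih} > I \cdot p_{im} + O \cdot p_o \). The sketch below finds that turn by direct search, once with a hypothetical 90% cache discount and once with no discount. The function name `crossover_turn` and all prices are illustrative assumptions.

```python
def crossover_turn(I: int, O: int, p_im: float, p_ih: float, p_o: float) -> int:
    """Smallest r at which total input-hit cost in equation (1) exceeds
    total input-miss plus output cost."""
    if p_ih <= 0:
        raise ValueError("with free cache hits the hit term never dominates")
    linear = I * p_im + O * p_o  # per-turn miss + output cost
    r = 1
    while (r - 1) / 2 * (I + O) * p_ih <= linear:
        r += 1
    return r


I, O = 500, 300
p_im, p_o = 1.00e-6, 4.00e-6  # hypothetical per-token prices
with_discount = crossover_turn(I, O, p_im, 0.10e-6, p_o)  # hits at 10% of p_im
no_discount = crossover_turn(I, O, p_im, p_im, p_o)       # hits at full price
print(with_discount, no_discount)  # 44 6
```

With these numbers the cached prefix dominates the bill after 44 turns with the discount, but after only 6 turns without it.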