Easy "Chat Costs" Derivation
How to squeeze an entire conversation’s price into one compact formula.
Step 1 – The billable tokens
Every time you talk to a large-language-model API, you pay for two buckets:
- Input tokens – everything you send (system prompt + all history + your newest message).
- Output tokens – what the model sends back.
Input tokens themselves split into two flavours:
- Input-miss – tokens the provider has to process for the first time.
- Input-hit – tokens already cached from earlier turns, charged at a discount.
We label the three per-token prices as follows:
\( p_{im} \) : price per input-miss token
\( p_{ih} \) : price per input-hit token
\( p_o \) : price per output token
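Providers commonly quote prices per million tokens rather than per token. The sketch below converts such quotes into \( p_{im} \), \( p_{ih} \), \( p_o \) and computes the resulting cache discount. The helper name `per_token_prices` and the example rates are hypothetical placeholders, not any real provider's prices.

```python
# Minimal sketch: convert per-million-token price quotes into the
# per-token prices p_im, p_ih, p_o used in the formulas.
# The function name and the example figures are hypothetical.

TOKENS_PER_MILLION = 1_000_000


def per_token_prices(miss_per_m: float, hit_per_m: float,
                     out_per_m: float) -> tuple[float, float, float]:
    """Return (p_im, p_ih, p_o) in currency units per single token."""
    return (
        miss_per_m / TOKENS_PER_MILLION,
        hit_per_m / TOKENS_PER_MILLION,
        out_per_m / TOKENS_PER_MILLION,
    )


# Hypothetical quotes: 0.50 per 1M miss tokens, 0.05 per 1M hit tokens,
# 2.00 per 1M output tokens.
p_im, p_ih, p_o = per_token_prices(0.50, 0.05, 2.00)
cache_discount = 1 - p_ih / p_im  # fraction saved on each cached input token
print(p_im, p_ih, p_o, cache_discount)
```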
Step 2 – One round in plain English
For a single round you add
- \( I \) new input tokens (your message)
- \( O \) new output tokens (the reply)
So one round costs
\[ I \cdot p_{im} + O \cdot p_o \]
…provided nothing is cached yet.
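A minimal Python sketch of the single-round cost, assuming hypothetical token counts and per-token prices; `single_round_cost` is an illustrative name, not part of any API.

```python
# Minimal sketch of Step 2: the cost of a single round when nothing is
# cached yet, i.e. I * p_im + O * p_o. Token counts and prices below are
# hypothetical examples.

def single_round_cost(I: int, O: int, p_im: float, p_o: float) -> float:
    """Cost of one uncached round: I input-miss tokens plus O output tokens."""
    return I * p_im + O * p_o


# Hypothetical round: 200 tokens in, 500 tokens out,
# at 0.5e-6 per input-miss token and 2e-6 per output token.
cost = single_round_cost(200, 500, p_im=0.5e-6, p_o=2e-6)
print(f"{cost:.6f}")  # 200*0.5e-6 + 500*2e-6 = 0.0011, prints 0.001100
```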
Step 3 – Ten rounds? One hundred?
History grows. Each new user message is appended to all previous turns.
Let \( r \) be the number of user turns.
After \( r \) turns the model has seen
- Input-miss tokens: \( r \cdot I \) (only the fresh part of each prompt)
- Output tokens: \( r \cdot O \) (one reply per turn)
- Input-hit tokens: the ever-growing prefix of earlier turns, assuming all of it, replies included, is served from the cache.
That prefix is 0 tokens in round 1, \( (I+O) \) in round 2, \( 2(I+O) \) in round 3, …, and \( (r-1)(I+O) \) in round \( r \).
Total hit tokens therefore form the triangular sum
\[ (I+O) \cdot [0 + 1 + 2 + \ldots + (r-1)] = (I+O) \cdot \frac{r(r-1)}{2} \]
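To check the triangular sum, the sketch below replays a conversation turn by turn and tallies the three buckets. It assumes, as the derivation does, that the entire earlier history is billed as input-hit tokens; `count_buckets` and the example sizes are hypothetical.

```python
# Minimal sketch of Step 3: replay a conversation turn by turn and count
# the three token buckets, assuming every earlier turn (prompt and reply
# alike) is served from cache.

def count_buckets(r: int, I: int, O: int) -> tuple[int, int, int]:
    """Return (miss, hit, out) token totals after r turns."""
    miss = hit = out = 0
    history = 0  # tokens from earlier turns that form the cached prefix
    for _ in range(r):
        hit += history    # the whole prefix is a cache hit
        miss += I         # only the new user message is a miss
        out += O          # one reply per turn
        history += I + O  # this turn joins the prefix for the next one
    return miss, hit, out


r, I, O = 10, 200, 500
miss, hit, out = count_buckets(r, I, O)
# The loop agrees with the closed forms r*I, (I+O)*r(r-1)/2 and r*O.
assert miss == r * I
assert hit == (I + O) * r * (r - 1) // 2
assert out == r * O
print(miss, hit, out)  # prints 2000 31500 5000
```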
Step 4 – The grand total
Add the three buckets:
\[
\begin{aligned}
TC &= r \cdot I \cdot p_{im} \\
&\quad + r \cdot O \cdot p_o \\
&\quad + (I+O) \cdot \frac{r(r-1)}{2} \cdot p_{ih}
\end{aligned}
\]
Factor out \( r \):
\[
TC = r \cdot \left[ I \cdot p_{im} + O \cdot p_o + \frac{r-1}{2} \cdot (I+O) \cdot p_{ih} \right]
\tag{1}
\]
That’s it: one line, three prices, two token counts, and the number of turns.
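Equation (1) translates directly into a function. This sketch, with hypothetical function names, token counts, and prices, also evaluates the unfactored Step 4 sum to confirm that factoring out \( r \) changes nothing.

```python
# Minimal sketch of equation (1): total conversation cost in closed form,
# checked against the unfactored three-bucket sum from Step 4.
# Prices and token counts are hypothetical examples.

def total_cost(r: int, I: int, O: int,
               p_im: float, p_ih: float, p_o: float) -> float:
    """Equation (1): TC = r * [I*p_im + O*p_o + (r-1)/2 * (I+O) * p_ih]."""
    return r * (I * p_im + O * p_o + (r - 1) / 2 * (I + O) * p_ih)


def total_cost_buckets(r: int, I: int, O: int,
                       p_im: float, p_ih: float, p_o: float) -> float:
    """Step 4 before factoring: miss + output + triangular hit bucket."""
    return r * I * p_im + r * O * p_o + (I + O) * r * (r - 1) / 2 * p_ih


args = dict(r=10, I=200, O=500, p_im=0.5e-6, p_ih=0.05e-6, p_o=2e-6)
tc = total_cost(**args)
assert abs(tc - total_cost_buckets(**args)) < 1e-12
print(f"{tc:.6f}")
```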
Step 5 – A change of variables for extra insight
Define
\( M = I + O \) (tokens exchanged in one “turn”)
\( v = \frac{I}{M} \) (fraction of \( M \) that is user input)
Since \( I = vM \) and \( O = (1-v)M \), inserting these into (1) gives
\[
TC = M \cdot r \cdot \left[ v \cdot p_{im} + (1-v) \cdot p_o + \frac{r-1}{2} \cdot p_{ih} \right]
\tag{2}
\]
Equation (2) cleanly separates
- the scale of the conversation (\( M \cdot r \) tokens)
- the shape of the conversation (\( v \) and \( r \))
- the prices (\( p_{im}, p_{ih}, p_o \))
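A sketch of equation (2) under hypothetical prices and token counts, confirming numerically that it matches equation (1) for several values of \( r \). Function names are illustrative only.

```python
# Minimal sketch of equation (2): the same total written in terms of the
# turn size M = I + O and the input share v = I / M.
# Prices and token counts are hypothetical examples.

def total_cost_mv(r: int, M: int, v: float,
                  p_im: float, p_ih: float, p_o: float) -> float:
    """Equation (2): TC = M * r * [v*p_im + (1-v)*p_o + (r-1)/2 * p_ih]."""
    return M * r * (v * p_im + (1 - v) * p_o + (r - 1) / 2 * p_ih)


def total_cost_io(r: int, I: int, O: int,
                  p_im: float, p_ih: float, p_o: float) -> float:
    """Equation (1), for comparison."""
    return r * (I * p_im + O * p_o + (r - 1) / 2 * (I + O) * p_ih)


I, O = 200, 500
M, v = I + O, I / (I + O)
prices = dict(p_im=0.5e-6, p_ih=0.05e-6, p_o=2e-6)
for r in (1, 10, 100):
    tc2 = total_cost_mv(r, M, v, **prices)
    tc1 = total_cost_io(r, I, O, **prices)
    assert abs(tc2 - tc1) <= 1e-12 * max(1.0, tc1)
    print(r, f"{tc2:.6f}")
```

Note that \( v \) enters only the first two terms: with \( M \) and \( r \) fixed, shifting the conversation toward longer replies changes the linear part but leaves the cached-history term untouched.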
Take-away
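Input-hit costs scale with \( r^2 \): the cached-history term grows quadratically in the number of turns, while the input-miss and output terms grow only linearly. If your provider does not offer a steep discount for cached tokens, your total cost climbs quickly as conversations get longer. A discount lowers the coefficient of the quadratic term but does not remove it, so always check the cache pricing: without a discount, resending the history quickly becomes expensive.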
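The quadratic growth is easy to see numerically. This sketch splits equation (1) into its linear part and its cached-history part and prints both for growing \( r \), once with cache hits priced like misses and once with a hypothetical 90% discount. All prices, sizes, and the helper name `cost_split` are hypothetical.

```python
# Minimal sketch of the take-away: how the cached-history term comes to
# dominate as r grows, with and without a cache discount.
# All prices and token counts are hypothetical examples.

def cost_split(r: int, I: int, O: int,
               p_im: float, p_ih: float, p_o: float) -> tuple[float, float]:
    """Return (linear part, cached-history part) of equation (1)."""
    linear = r * (I * p_im + O * p_o)
    cached = r * (r - 1) / 2 * (I + O) * p_ih
    return linear, cached


I, O, p_im, p_o = 200, 500, 0.5e-6, 2e-6
for label, p_ih in (("no discount", p_im), ("90% discount", 0.1 * p_im)):
    for r in (10, 100, 1000):
        linear, cached = cost_split(r, I, O, p_im, p_ih, p_o)
        share = cached / (linear + cached)
        print(f"{label:12}  r={r:5}  total={linear + cached:10.4f}  "
              f"cached share={share:.0%}")
```

Going from \( r = 10 \) to \( r = 100 \) multiplies the cached term by \( \frac{100 \cdot 99}{10 \cdot 9} = 110 \), but the linear part only by 10.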