LLM API Cost Calculator


Easy "Chat Costs" Derrivation

How to squeeze an entire conversation’s price into one compact formula.


Step 1 – The billable tokens

Every time you talk to a large-language-model API you pay for two buckets:

Input tokens: everything you send in the prompt.
Output tokens: everything the model generates.

Input tokens themselves split into two flavours:

Cache misses: fresh text the provider has not cached, billed at full price.
Cache hits: text re-read from the prompt cache, usually billed at a discount.

We will label their prices

\( p_{im} \) : price per input-miss token
\( p_{ih} \) : price per input-hit token
\( p_o \) : price per output token


Step 2 – One round in plain English

For a single round you add

\( I \) input tokens (the new user message) and
\( O \) output tokens (the model's reply).

So one round costs

\[ I \cdot p_{im} + O \cdot p_o \]

…provided nothing is cached yet.
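
As a quick numeric check, here is a minimal Python sketch of the single-round cost. The prices and token counts are made-up placeholder values, not any provider's actual rates:

```python
# One uncached round: I fresh input tokens plus O output tokens.
# All numbers below are hypothetical placeholders (USD per token).
p_im = 3.00 / 1_000_000   # price per input-miss token
p_o = 15.00 / 1_000_000   # price per output token

I, O = 800, 400           # example token counts for one round
round_cost = I * p_im + O * p_o
print(f"one round: ${round_cost:.4f}")  # one round: $0.0084
```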


Step 3 – Ten rounds? One hundred?

History grows: each new user message is sent together with all previous turns, so the model re-reads the whole conversation each round; those re-read tokens are the input hits.

Let \( r \) be the number of user turns.
After \( r \) turns the model has seen

\( r \cdot I \) fresh input tokens and produced \( r \cdot O \) output tokens. On top of that, turn \( k \) re-reads the \( (k-1)(I+O) \) tokens of the previous \( k-1 \) turns as input hits. Summed over the whole conversation, the hits come to

\[ \sum_{k=1}^{r} (k-1)(I+O) = (I+O) \cdot \frac{r(r-1)}{2} \]
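
A brute-force sum over the turns should match that closed form. The sketch below verifies it, again with arbitrary example numbers:

```python
# Turn k re-reads the (k - 1) * (I + O) tokens of the earlier turns.
# The loop must agree with the closed form (I + O) * r * (r - 1) / 2.
I, O, r = 800, 400, 10  # arbitrary example values

looped = sum((k - 1) * (I + O) for k in range(1, r + 1))
closed = (I + O) * r * (r - 1) // 2
assert looped == closed
print(looped)  # 54000 cached tokens re-read over 10 turns
```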


Step 4 – The grand total

Add the three buckets (fresh input misses, outputs, cached-history hits):

\[ \begin{aligned} TC &= r \cdot I \cdot p_{im} \\ &\quad + r \cdot O \cdot p_o \\ &\quad + (I+O) \cdot \frac{r(r-1)}{2} \cdot p_{ih} \end{aligned} \]

Factor out \( r \):

\[ TC = r \cdot \left[ I \cdot p_{im} + O \cdot p_o + \frac{r-1}{2} \cdot (I+O) \cdot p_{ih} \right] \tag{1} \]

That’s it: one line, three prices, two token counts.
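
Equation (1) translates directly into code. A minimal sketch, assuming constant per-turn token counts; the prices are hypothetical and should be replaced with your provider's actual rates:

```python
def total_cost(r: int, I: int, O: int,
               p_im: float, p_ih: float, p_o: float) -> float:
    """Equation (1): total cost of an r-turn conversation.

    Assumes every turn exchanges the same I input / O output tokens.
    """
    return r * (I * p_im + O * p_o + (r - 1) / 2 * (I + O) * p_ih)

# Hypothetical per-token prices (USD); not any provider's real rates.
print(total_cost(r=10, I=800, O=400,
                 p_im=3e-6, p_ih=0.3e-6, p_o=15e-6))
```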


Step 5 – A change of variables for extra insight

Define

\( M = I + O \)   (tokens exchanged in one “turn”)
\( v = \frac{I}{M} \)   (fraction of \( M \) that is user input)

Insert these into (1):

\[ TC = M \cdot r \cdot \left[ v \cdot p_{im} + (1-v) \cdot p_o + \frac{r-1}{2} \cdot p_{ih} \right] \tag{2} \]

Equation (2) cleanly separates

the sheer volume of the conversation, \( M \cdot r \),
the input/output mix \( v \), and
the cache term \( \frac{r-1}{2} \cdot p_{ih} \), the only piece that grows with conversation length.
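
In code, the \( (M, v) \) form is a one-liner. The sketch below checks that it agrees with equation (1) on an arbitrary example, using the same placeholder prices as above:

```python
def total_cost_mv(r: int, M: int, v: float,
                  p_im: float, p_ih: float, p_o: float) -> float:
    """Equation (2): total cost in terms of M = I + O and v = I / M."""
    return M * r * (v * p_im + (1 - v) * p_o + (r - 1) / 2 * p_ih)

# Consistency check against equation (1), with I = v*M and O = (1 - v)*M.
r, I, O = 10, 800, 400
p_im, p_ih, p_o = 3e-6, 0.3e-6, 15e-6  # hypothetical rates
eq1 = r * (I * p_im + O * p_o + (r - 1) / 2 * (I + O) * p_ih)
eq2 = total_cost_mv(r, I + O, I / (I + O), p_im, p_ih, p_o)
assert abs(eq1 - eq2) < 1e-12
```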


Take-away

The key takeaway: input-hit costs scale with \( r^2 \) (quadratic in the number of turns). If your provider does not offer a good price for cached tokens, your total cost can explode as conversations get longer. Always check the cache pricing: without a discount, prompt reuse quickly becomes expensive.
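
To see the blow-up concretely, the sketch below compares full-price re-reads (\( p_{ih} = p_{im} \)) against a hypothetical 90% cache discount; all rates are made up:

```python
def tc(r, I, O, p_im, p_ih, p_o):
    # Equation (1) again, kept local so the snippet is self-contained.
    return r * (I * p_im + O * p_o + (r - 1) / 2 * (I + O) * p_ih)

I, O = 800, 400
p_im, p_o = 3e-6, 15e-6                         # hypothetical rates (USD/token)
for r in (10, 50, 100):
    full = tc(r, I, O, p_im, p_im, p_o)         # no cache discount
    cheap = tc(r, I, O, p_im, 0.1 * p_im, p_o)  # 90% cheaper hits
    print(f"r={r:3d}  full=${full:6.2f}  discounted=${cheap:5.2f}")
```

In this toy example, the 100-turn conversation costs roughly seven times more without the discount.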