You type a short message — "fix the null check on line 47" — and hit Enter. How many tokens did that cost? Most people would guess maybe 15–20 tokens. The actual answer? Closer to 10,000–50,000 tokens, depending on how long your conversation is.

The reason is system context overhead — the invisible payload that gets sent along with every single message you write. Understanding this overhead is the key to understanding why AI costs what it does.

What actually gets sent when you send a message

When you type a message on Claude or ChatGPT, your text is just a tiny fraction of what gets transmitted to the API. The full payload includes several components that are assembled and sent automatically by the platform.

First, there's the system prompt — the instructions that tell the model how to behave. On Claude, this can be thousands of tokens covering personality guidelines, safety rules, tool definitions, and feature instructions. You never see this, but you pay for it on every message.

Second, the conversation history. AI models are stateless — they don't "remember" your earlier messages. Instead, the platform re-sends the entire conversation every time. Message 1 sends just your text. Message 10 sends all 10 messages. Message 50 sends all 50 messages. This is why costs accelerate as conversations get longer.
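This statelessness is easy to see in a minimal sketch. The payload shape below is illustrative, not any specific API; the point is that every turn re-sends everything that came before:

```python
# Sketch: why history grows linearly. `payload_for_turn` mimics what a chat
# platform assembles on every request (field names are illustrative).
def payload_for_turn(history, new_message):
    """Each request carries the system prompt plus the full prior history."""
    return {
        "system": "…thousands of tokens of instructions…",
        "messages": history + [{"role": "user", "content": new_message}],
    }

history = []
sizes = []
for i in range(1, 6):
    payload = payload_for_turn(history, f"message {i}")
    sizes.append(len(payload["messages"]))
    # The platform appends both your turn and the model's reply to history.
    history = payload["messages"] + [{"role": "assistant", "content": f"reply {i}"}]

print(sizes)  # messages sent per turn: [1, 3, 5, 7, 9]
```

Each exchange adds two entries (your message and the model's reply), so the payload grows linearly with conversation length even if your own messages stay tiny.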

Third, tool definitions. If the model has access to tools (web search, code execution, file creation), their specifications are included in every request. These can add thousands of tokens of overhead before you've typed a single word.
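To get a feel for the size, here is a rough estimate of the token cost of just two hypothetical tool definitions. The names, schemas, and the 4-characters-per-token heuristic are all assumptions; real platforms ship many more tools with longer descriptions:

```python
import json

# Two hypothetical tool specs in the JSON-schema style most chat APIs use.
tools = [
    {
        "name": "web_search",
        "description": "Search the web and return result snippets.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "run_code",
        "description": "Execute a Python snippet and return stdout.",
        "input_schema": {
            "type": "object",
            "properties": {"source": {"type": "string"}},
            "required": ["source"],
        },
    },
]

# Rough rule of thumb: ~4 characters per token for English/JSON text.
approx_tokens = len(json.dumps(tools)) // 4
print(approx_tokens)
```

Even this stripped-down pair lands near a hundred tokens; a realistic tool palette with a dozen tools and detailed descriptions easily reaches the thousands quoted above.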

Example: message #30 in a coding session
Your message: 47 tokens
System prompt: 3,200 tokens
Conversation history (29 prior messages): 38,400 tokens
Tool definitions: 4,100 tokens
Total input tokens: 45,747 tokens

In this example, your actual message is 0.1% of the total input. The other 99.9% is overhead. And you're paying for all of it.

Why this matters for your wallet

On Claude Opus 4 at $15 per million input tokens, your visible 47 tokens cost about $0.0007, but the full 45,747-token payload costs about $0.69 to send. And because the history keeps growing, summing the input across a 50-message session under the example's assumptions comes to roughly 2 million tokens, or close to $30 in input alone at list price. (Prompt caching, which discounts repeated prefix tokens, offsets some of this, but the overhead is still billed.) Add output tokens at $75 per million, and a single deep coding session can easily run into tens of dollars at raw API rates.
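A toy cost model makes the accumulation concrete. The per-exchange history growth (~1,300 tokens, derived from the example's 38,400 tokens over 29 messages) and the other figures are assumptions carried over from the breakdown above, not measured values, and no prompt-caching discount is applied:

```python
# Toy model: cumulative input tokens over a 50-message session.
PRICE_PER_TOKEN = 15 / 1_000_000   # assumed $15 per million input tokens
SYSTEM, TOOLS, USER = 3_200, 4_100, 50
HISTORY_PER_TURN = 1_300           # ~38,400 tokens / 29 prior messages

total_input = 0
for turn in range(1, 51):
    history = HISTORY_PER_TURN * (turn - 1)   # grows linearly each turn
    total_input += SYSTEM + TOOLS + USER + history

print(total_input)                            # 1960000 tokens
print(round(total_input * PRICE_PER_TOKEN, 2))  # 29.4 dollars
```

The fixed overhead (system prompt plus tools) accounts for about $5.50 of that; the re-sent history accounts for nearly all the rest.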

This is why your $20/month subscription feels limited. Anthropic and OpenAI are absorbing these costs when you use the web UI. The message limits you encounter on premium models exist precisely because each message costs far more than your visible prompt would suggest.

What you can do about it

Start fresh conversations more often. The single most effective way to reduce overhead is to start a new conversation when you change topics. A fresh conversation sends only the system prompt, not 40 messages of accumulated history.

Use lighter models for simple tasks. If you're just asking a quick formatting question, use a cheaper model. The overhead exists regardless of which model you use, but at list prices the same input tokens cost roughly 20x more on Opus than on Haiku.
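For the example payload, the gap looks like this. The per-million figures are assumed list prices for Opus 4 and Claude 3.5 Haiku; verify them against current pricing:

```python
PAYLOAD_TOKENS = 45_747  # the example breakdown above

# Assumed list prices per million input tokens; check current pricing.
OPUS_PER_M, HAIKU_PER_M = 15.00, 0.80

opus_cost = PAYLOAD_TOKENS * OPUS_PER_M / 1_000_000
haiku_cost = PAYLOAD_TOKENS * HAIKU_PER_M / 1_000_000

print(round(opus_cost, 4))    # 0.6862
print(round(haiku_cost, 4))   # 0.0366
```

Same payload, same overhead, but the identical request costs about 19x more on the premium model.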

Be aware of your conversation length. After 30+ messages, the overhead dominates your cost. If you're past message 50, you're paying premium prices mostly for the model to re-read your old messages — with diminishing returns on quality as the context window fills up.

Track your actual overhead. You can't optimize what you can't measure. This is exactly what Kontinuity's system overhead breakdown shows — the split between what you typed and what actually gets sent.

See the overhead for yourself

Kontinuity shows you exactly how much of your token spend is overhead vs your actual prompts.

Try Kontinuity — Free →