Most developers don't think about token efficiency when prompting AI. But small changes to how you write prompts can dramatically reduce both input and output token counts — saving you money without sacrificing response quality. Here are ten techniques we've validated with real token measurements.

01
Be specific about output format
Instead of "explain how this works," try "explain in 3 sentences how this works." Models generate until they feel done — give them a target and they'll hit it efficiently.
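A rough sketch of the trade-off. The token counts here are a crude heuristic (roughly 4 characters per token), not real tokenizer output, and the "20 tokens per sentence" figure is an assumption for illustration:

```python
def rough_tokens(text: str) -> int:
    """Crude approximation: roughly 4 characters per token on English text.
    Use your provider's real tokenizer for actual measurements."""
    return max(1, len(text) // 4)

open_ended = "Explain how this caching layer works."
constrained = "Explain in 3 sentences how this caching layer works."

# The constrained prompt costs a few extra input tokens...
extra_input = rough_tokens(constrained) - rough_tokens(open_ended)

# ...but caps the output. Assuming ~20 tokens per sentence, the ceiling is:
output_ceiling = 3 * 20

print(extra_input, output_ceiling)
```

A handful of extra input tokens buying a hard cap on output length is almost always a net win, since output tokens are typically billed at a higher rate.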
02
Start fresh for new topics
Continuing a long conversation means every new message carries the entire history as input tokens. Starting a new conversation for a new topic resets that overhead to zero.
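The cost of a long conversation grows faster than linearly, because each turn resends everything before it. A sketch with illustrative numbers (300 tokens per turn is an assumption, not a measurement):

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int = 300) -> int:
    """Total input tokens billed across a conversation, assuming each turn
    adds `tokens_per_turn` and the full history is resent on every turn."""
    return sum(turn * tokens_per_turn for turn in range(1, turns + 1))

long_chat = cumulative_input_tokens(20)        # one 20-turn conversation
fresh_chats = 2 * cumulative_input_tokens(10)  # same work, split into two

print(long_chat, fresh_chats)
```

Splitting the same twenty turns into two fresh conversations roughly halves the input bill in this toy model, which is why topic switches are the cheapest place to reset.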
03
Use "respond with only code" for code tasks
By default, models wrap code in explanatory prose. Adding "respond with only the code, no explanation" can cut output tokens by 40–60% on coding tasks.
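In practice this is just an instruction appended to the task. The exact wording below is one example; any unambiguous "code only" instruction works:

```python
def code_only(task: str) -> str:
    """Append an output-format instruction that suppresses explanatory prose."""
    return f"{task}\n\nRespond with only the code, no explanation."

prompt = code_only(
    "Write a Python function that deduplicates a list while preserving order."
)
print(prompt)
```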
04
Provide examples instead of long descriptions
A single input/output example is often more token-efficient than a paragraph describing what you want. Models pattern-match from examples more reliably, and with fewer tokens, than from long instructions.
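A minimal one-shot template makes this concrete. The date-formatting task below is just an illustration; any input/output pair in your target format works the same way:

```python
def one_shot_prompt(example_in: str, example_out: str, task: str) -> str:
    """Build a prompt from one worked example plus the actual task."""
    return (
        f"Input: {example_in}\n"
        f"Output: {example_out}\n\n"
        f"Input: {task}\n"
        f"Output:"
    )

prompt = one_shot_prompt("2024-03-05", "March 5, 2024", "2023-11-30")
print(prompt)
```

The trailing "Output:" invites the model to complete the pattern directly, with no instructions about date formats needed at all.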
05
Front-load the important context
Put your most critical requirements at the beginning and end of your prompt. Research on long-context retrieval (the "lost in the middle" effect) shows models attend most reliably to those positions, especially as contexts grow long.
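One way to apply this is a template that states the critical requirement up front and repeats it at the end, with the bulky context in the middle. A sketch (the placeholder context is illustrative):

```python
def front_and_back(requirement: str, context: str) -> str:
    """Sandwich long context between two copies of the key requirement."""
    return (
        f"Requirement: {requirement}\n\n"
        f"{context}\n\n"
        f"Remember the requirement: {requirement}"
    )

prompt = front_and_back(
    "Return valid JSON only.",
    "<long context here: logs, source files, prior discussion>",
)
print(prompt)
```

Repeating the requirement costs a few tokens, but it is cheap insurance against the model drifting off-spec in a long prompt.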
06
Ask for diffs instead of full files
When editing code, ask the model to show only the changed lines. Regenerating an entire 200-line file when only 3 lines changed wastes output tokens.
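The savings scale with file size. A sketch with assumed numbers (roughly 10 tokens per line of code is a ballpark, not a measurement):

```python
def diff_prompt(task: str) -> str:
    """Ask for changed lines only instead of a full-file rewrite."""
    return f"{task}\n\nShow only the changed lines as a unified diff."

# Illustrative math: 3 changed lines in a 200-line file.
tokens_per_line = 10
full_file = 200 * tokens_per_line        # regenerate the whole file
diff_only = (3 * 2 + 4) * tokens_per_line  # old + new lines plus hunk context

print(full_file, diff_only)
```

In this toy example the diff is a 20x reduction in output tokens, and as a bonus, small diffs are easier to review than a regenerated file.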
07
Skip the pleasantries
"Please could you kindly help me with..." costs tokens. "Fix the null check in handleSubmit" is clearer and cheaper. Models don't need social niceties to give good answers.
08
Use structured prompts for complex tasks
XML tags, numbered steps, or clear sections help the model parse your intent with fewer tokens. Ambiguous prose requires more words to convey the same information.
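A sketch of an XML-tagged prompt. The tag names here are arbitrary choices; what matters is that each section is unambiguously delimited so the model never has to guess where instructions end and data begins:

```python
def structured_prompt(instructions: str, code: str, question: str) -> str:
    """Delimit each section of the prompt with explicit XML-style tags."""
    return (
        f"<instructions>\n{instructions}\n</instructions>\n"
        f"<code>\n{code}\n</code>\n"
        f"<question>\n{question}\n</question>"
    )

prompt = structured_prompt(
    "Review for concurrency bugs only.",
    "def transfer(a, b, amount): ...",
    "Is the lock ordering safe?",
)
print(prompt)
```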
09
Know when to use a smaller model
Not every task needs the most powerful (and expensive) model. Boilerplate generation, formatting, and simple lookups work fine on smaller models at a fraction of the cost.
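A routing rule can be as simple as a lookup. The task categories and model names below are illustrative placeholders, not real model identifiers:

```python
# Mechanical tasks that rarely benefit from a frontier model (assumed set).
CHEAP_TASKS = {"formatting", "boilerplate", "lookup"}

def pick_model(task_type: str) -> str:
    """Route cheap, mechanical tasks to a smaller model by default."""
    return "small-model" if task_type in CHEAP_TASKS else "large-model"

print(pick_model("formatting"))
print(pick_model("architecture-review"))
```

Even a crude router like this captures most of the savings, because the cheap task categories tend to dominate request volume.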
10
Monitor your overhead ratio
The ratio of system overhead to your actual message is the biggest hidden cost. If 95% of your input tokens are overhead, your prompt optimization is fighting the wrong battle — start a fresh conversation instead.
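The ratio itself is one line of arithmetic once you have per-request token counts from your provider's usage data. The numbers below are illustrative:

```python
def overhead_ratio(total_input_tokens: int, message_tokens: int) -> float:
    """Fraction of input tokens that are overhead (system prompt, history,
    tool definitions) rather than your actual message."""
    return (total_input_tokens - message_tokens) / total_input_tokens

ratio = overhead_ratio(total_input_tokens=12_000, message_tokens=600)
print(f"{ratio:.0%}")
```

At a 95% overhead ratio, trimming your 600-token message is noise; shrinking the 11,400 tokens of history and scaffolding is where the money is.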

The bottom line: Token efficiency isn't about being stingy — it's about getting the same quality results while spending less. The biggest savings come from structural changes (shorter conversations, right-sized models) rather than squeezing individual prompts.

Measure before you optimize

See your real token usage and overhead ratios with Kontinuity. Free during beta.

Try Free for 14 Days →