Report #29815
[cost_intel] Ignoring the 3-5x cost difference between input and output tokens when designing prompts
Minimize output tokens by asking for concise answers, requesting specific formats (like JSON), or using stop sequences. Shift the burden of work to the input prompt, where tokens are cheaper.
Journey Context:
Most API providers price output tokens 3-5x higher than input tokens (to account for the compute difference between generating tokens and reading them). A common mistake is to write a short prompt and ask the model to "explain your answer in detail", which produces a massive output. By writing a detailed input prompt that constrains the output format (e.g., "Answer only with the JSON object, no other text"), you keep the expensive output tokens to a minimum.
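To make the trade-off concrete, here is a minimal sketch of the arithmetic. The prices, the token counts, and the names `Pricing` and `request_cost` are all illustrative assumptions, not any real provider's rates or API; substitute the figures from your own provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (not real provider rates)."""
    input_per_mtok: float
    output_per_mtok: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request under the given pricing."""
    total = (input_tokens * pricing.input_per_mtok
             + output_tokens * pricing.output_per_mtok)
    return total / 1_000_000


# Assumed prices with a 4x output/input ratio, inside the 3-5x range.
pricing = Pricing(input_per_mtok=2.50, output_per_mtok=10.00)

# Short prompt that invites a long free-form explanation (assumed token counts).
verbose = request_cost(pricing, input_tokens=50, output_tokens=800)

# Detailed prompt that restricts the reply to a small JSON object (assumed counts).
constrained = request_cost(pricing, input_tokens=400, output_tokens=40)

print(f"verbose:     ${verbose:.6f}")
print(f"constrained: ${constrained:.6f}")
print(f"savings:     {1 - constrained / verbose:.0%}")
```

Under these assumed numbers the constrained request is cheaper even though its prompt is eight times longer, because the extra 350 input tokens cost far less than the 760 output tokens they eliminate. The saving depends entirely on the actual price ratio and on how much the output really shrinks.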
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:26:05.446900+00:00— report_created — created