Report #25204
[cost_intel] Estimating LLM costs from input tokens only, ignoring that reasoning models (o1) and agent loops generate 3-10x as many output tokens as input tokens
Budget for a 3:1 output-to-input ratio for reasoning models and 1:1 for standard chat; set max_tokens and stop sequences aggressively to prevent runaway generation.
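A minimal sketch of turning that budget into a hard per-request cap. The function name `request_limits` and the `headroom` multiplier are hypothetical, introduced only for illustration. The `max_tokens` and `stop` keys mirror the names used in this report; some reasoning models expect a differently named cap parameter, so check your provider's API reference before passing this dict through.

```python
def request_limits(
    input_tokens: int,
    reasoning: bool,
    headroom: float = 1.5,  # assumption: slack above the budgeted ratio
    stop: tuple[str, ...] = ("FINAL ANSWER:",),  # marker taken from this report
) -> dict:
    """Derive a hard output cap from the budgeted output-to-input ratio."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    # 3:1 for reasoning models, 1:1 for standard chat (ratios from this report).
    ratio = 3.0 if reasoning else 1.0
    cap = max(1, int(input_tokens * ratio * headroom))
    return {"max_tokens": cap, "stop": list(stop)}


if __name__ == "__main__":
    print(request_limits(1000, reasoning=True))
    print(request_limits(1000, reasoning=False))
```

The cap is deliberately derived from the input size rather than fixed, so long prompts are not truncated while short prompts cannot run away.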
Journey Context:
o1-preview averages about 3.5 output tokens per input token because it generates chain-of-thought tokens before the final output. At o1-preview's launch list price of $60 per 1M output tokens, a 1k-token prompt therefore yields roughly 3.5k output tokens, or about $0.21 in output alone, versus about $0.015 for the input at $15 per 1M. Standard GPT-4o runs close to 1:1. Agents without output limits can burn budget on unbounded reflection loops. Hard stop sequences ('FINAL ANSWER:') cut average output by 40% in conversational agents.
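The arithmetic behind an output-aware estimate can be sketched as follows. The `Pricing` class and `estimate_cost` function are hypothetical names, and the per-million prices are placeholders assumed for illustration; verify them against your provider's current price sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Assumed list prices in USD per 1M tokens (placeholders, not authoritative)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_ratio: float, pricing: Pricing) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD for a single request."""
    output_tokens = input_tokens * output_ratio
    input_cost = input_tokens * pricing.input_per_million / 1_000_000
    output_cost = output_tokens * pricing.output_per_million / 1_000_000
    return input_cost, output_cost


if __name__ == "__main__":
    # Assumed reasoning-model prices: $15 input, $60 output per 1M tokens.
    reasoning_prices = Pricing(input_per_million=15.0, output_per_million=60.0)
    in_cost, out_cost = estimate_cost(1_000, 3.5, reasoning_prices)
    print(f"input ${in_cost:.3f}, output ${out_cost:.3f}")
```

Under these assumed prices the output side dominates the bill, which is why an estimate built on input tokens alone understates cost.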
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:42:42.313466+00:00— report_created — created