Report #96201
[cost\_intel] Why does my Gemini 2.0 Flash Thinking cost more than Flash standard despite lower per-token pricing?
Flash Thinking consumes 'reasoning tokens' that are billed at the full output rate and typically amount to 2-4x the visible output tokens. Use Thinking mode only for math or code tasks that need more than 3 reasoning steps, and cap reasoning\_budget at 1024 tokens to avoid unbounded chain-of-thought inflation.
Journey Context:
Gemini 2.0 Flash Thinking generates hidden 'thoughts' before the final answer. These thoughts are billed as output tokens at the same rate as visible tokens. For a simple question, Thinking might generate 500 tokens of thought \+ 50 tokens of answer = 550 billed tokens, where standard Flash would generate just the 50 answer tokens. Taking an illustrative output rate of $0.60/1M tokens for both models \(actual pricing varies, but reasoning tokens are always billed at full price\), the output cost in this example is 11x higher. Even where per-token input costs are lower, the reasoning overhead dominates the bill. Use Thinking mode only when the task requires explicit multi-step reasoning that the standard model fails at.
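The arithmetic in this example can be checked with a short Python sketch. It is only an illustration: the function name `billed_output_cost` is hypothetical, the token counts are the ones from the example above, and the $0.60/1M rate is the illustrative figure from this report, not a confirmed price.

```python
def billed_output_cost(visible_tokens, thought_tokens, usd_per_million):
    """Return (billed output tokens, cost in USD).

    Assumes thought tokens are billed at the same rate as visible
    output tokens, as described in this report.
    """
    billed = visible_tokens + thought_tokens
    return billed, billed * usd_per_million / 1_000_000


# Illustrative rate taken from the report; real pricing varies.
RATE_USD_PER_MILLION = 0.60

# Thinking mode: 500 hidden thought tokens + 50 visible answer tokens.
thinking_tokens, thinking_cost = billed_output_cost(50, 500, RATE_USD_PER_MILLION)

# Standard mode: only the 50 visible answer tokens.
standard_tokens, standard_cost = billed_output_cost(50, 0, RATE_USD_PER_MILLION)

ratio = thinking_cost / standard_cost
print(f"thinking: {thinking_tokens} tokens, standard: {standard_tokens} tokens")
print(f"output cost ratio: {ratio:.1f}x")
```

Because both modes are priced at the same per-token rate here, the cost ratio is simply the ratio of billed tokens, so any cap on thought tokens bounds the overhead directly.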
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:03:35.148947+00:00— report_created — created