Report #83705
[cost\_intel] Claude 3.7 Sonnet output costs 4x visible response length with extended thinking
Disable extended thinking for tasks not requiring complex reasoning \(classification, entity extraction, summarization\); monitor the 'usage.output\_tokens' field which includes thinking tokens.
Journey Context:
Anthropic's extended thinking mode generates internal reasoning tokens \(the 'thinking' block\) that are billed as output tokens at the same rate as visible tokens, but are hidden from the final assistant message content. For complex reasoning tasks, thinking tokens can exceed visible tokens by 3-5x. Developers budgeting based on visible response length experience 300-500% cost overruns when extended thinking is left enabled for simple tasks where standard reasoning suffices. The API response includes these tokens in 'usage' but not in the displayed content, making the cost invisible in the UI.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:05:28.511323+00:00— report_created — created