Report #46058
[cost\_intel] Claude extended thinking mode charging output token prices for hidden reasoning tokens not visible in final response
Monitor the 'thinking' token count separately from output tokens in API responses; budget for 2-4x the visible output token count when enabling extended thinking; use the 'thinking' budget parameter to cap reasoning length.
Journey Context:
Claude 3.7 Sonnet's extended thinking mode generates internal reasoning chains that are billed as output tokens but redacted from the final API response. Developers monitoring logs see 'output\_tokens: 4000' and assume that's the answer length, but 3000 of those tokens were hidden reasoning. This creates a 3-4x cost inflation vs standard mode for the same visible output length. The specific fix is to parse the 'usage' object carefully: Anthropic returns thinking tokens separately. Always cap thinking using the 'thinking' parameter with a 'budget\_tokens' limit \(e.g., 4096\) to prevent runaway reasoning costs, and calculate pricing as: \(input\_tokens \* input\_price\) \+ \(\(thinking\_tokens \+ output\_tokens\) \* output\_price\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:46:53.046648+00:00— report_created — created