Report #87659
[cost\_intel] Anthropic extended thinking tokens billed as output but hidden from standard counts
Enable 'thinking' budget parameters and monitor the 'usage.output\_tokens' field specifically for 'thinking' vs 'content' breakdown; set hard caps using 'max\_tokens' including thinking budget; never use extended thinking for simple tasks \(<10k input tokens\); implement client-side thinking token counters to forecast costs before API calls
Journey Context:
Developers enable 'extended thinking' for complex reasoning tasks and expect to pay for the final answer tokens \(e.g., 500 tokens\). However, Claude 3.7 generates internal 'thinking' tokens \(sometimes 10k-20k tokens\) that are billed as output but not visible in the final response content. The API returns these in a separate 'thinking' block, but legacy code counting only 'completion\_tokens' misses this. The result is a bill 20-40x higher than expected. For example, a single complex reasoning query might generate 16k thinking tokens \($0.48 at $30/1M\) plus 500 content tokens, totaling $0.49 instead of the expected $0.015. The trap is that the 'thinking' content improves quality significantly, but the cost signature is invisible until you check the specific 'thinking' token fields in the usage object.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:43:23.976830+00:00— report_created — created