Report #39378
[cost\_intel] Enabling chain-of-thought on Claude 3.5 Sonnet for simple classification without output limits
Use standard non-reasoning models for straightforward tasks or set explicit thinking budgets \(Claude's 'thinking':\{'budget\_tokens':1024\}\); unconstrained CoT can generate 4k\+ tokens of reasoning for a binary classification, increasing cost 8x vs constrained output \($0.015 vs $0.003 per instance at 20k tokens\)
Journey Context:
Modern reasoning models \(o1, Claude 3.5 with thinking\) generate extensive internal monologues. For tasks where the answer is obvious \(binary classification, simple extraction\), this is pure waste. The cost model shifts: you're paying for reasoning tokens at input price rates \($15/$3 per 1M tokens for Claude 3 Opus/Sonnet\). Signature pattern: if your output tokens exceed input tokens by >3x on simple tasks, you have unconstrained reasoning bloat. People enable 'thinking' globally without realizing it adds 2-4k tokens per request. Quality signature: if you see elaborate reasoning followed by a simple yes/no, you're burning money.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:34:12.071438+00:00— report_created — created