Report #63716
[cost_intel] Asking models to 'think step by step' on straightforward classification and extraction tasks
Remove chain-of-thought prompting from simple classification and extraction tasks. Use direct output (just the label or JSON object). Save 5-10x on output token costs with near-zero quality loss for tasks where the model already has high baseline confidence.
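A minimal sketch of the change in Python, assuming a three-label sentiment task. The label set, prompt wording, and the helper names `parse_label` and `parse_extraction` are hypothetical, and the model call itself is omitted. With direct output, a strict parser takes the place of reading the answer out of free-form reasoning, and a reply that fails validation is rejected rather than guessed at.

```python
import json

# Hypothetical label set for a sentiment-classification task.
LABELS = ("positive", "negative", "neutral")

# Before: the prompt invites free-form reasoning, all of it billed as output tokens.
COT_PROMPT = (
    "Classify the sentiment of the review below. "
    "Think step by step, then give your answer.\n\nReview: {text}"
)

# After: the prompt asks for the label only, so the reply is a single short word.
DIRECT_PROMPT = (
    "Classify the sentiment of the review below. "
    "Respond with exactly one word: positive, negative, or neutral.\n\nReview: {text}"
)


def parse_label(raw: str) -> str:
    """Normalize a direct-output reply and reject anything outside the label set."""
    label = raw.strip().lower().rstrip(".")
    if label not in LABELS:
        raise ValueError(f"unexpected label: {raw!r}")
    return label


def parse_extraction(raw: str, required_keys: set[str]) -> dict:
    """Parse a direct JSON reply for an extraction task and check required keys.

    json.loads raises json.JSONDecodeError (a ValueError subclass) on malformed JSON.
    """
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    missing = required_keys - set(obj)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return obj
```

A reply such as `" Positive."` normalizes to `"positive"`; anything outside the label set raises `ValueError`, which a caller could route to a retry instead of falling back to a reasoning prompt.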
Journey Context:
CoT is genuinely valuable for math, logic, and multi-hop reasoning, where it can improve accuracy by 10-30% on hard problems. But teams often apply it indiscriminately as a default. For sentiment classification, intent detection, or category tagging, where the model's zero-shot accuracy is already above 90%, CoT adds 200-500 output tokens per call for less than 1% accuracy improvement.

At Sonnet output rates ($15 per million tokens), 500 extra output tokens per call cost $0.0075 per call (500 × $15 / 1,000,000). At 1M calls per month, that is $7,500 per month spent on unnecessary reasoning tokens. The secondary cost is latency: the reasoning tokens are generated before the answer, which matters for user-facing applications.

The diagnostic: if removing CoT from a classification prompt changes accuracy by less than 2%, you are burning money. Reserve CoT for tasks where it moves accuracy by more than 5%, typically reasoning, math, and multi-step inference, and never for lookup or pattern-match tasks.
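The cost arithmetic and the keep/drop rule above can be sketched in Python. This is a minimal illustration, not a billing tool: the $15/M price is the report's example figure, the function names are hypothetical, and the 2% and 5% thresholds are read as absolute accuracy differences (percentage points), which is an assumption since the report does not say whether they are absolute or relative.

```python
# Figures below are the report's example numbers, used here as assumptions;
# substitute your model's current output price and your own eval results.
SONNET_OUTPUT_USD_PER_M = 15.00  # output price quoted in the report, USD per 1M tokens


def cot_cost_per_call(extra_output_tokens: int,
                      usd_per_m: float = SONNET_OUTPUT_USD_PER_M) -> float:
    """USD per call spent on reasoning tokens a direct-output prompt would not emit."""
    return extra_output_tokens * usd_per_m / 1_000_000


def cot_cost_per_month(extra_output_tokens: int, calls_per_month: int,
                       usd_per_m: float = SONNET_OUTPUT_USD_PER_M) -> float:
    """Monthly overhead; multiplying before dividing avoids compounding rounding error."""
    return extra_output_tokens * calls_per_month * usd_per_m / 1_000_000


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of predictions that exactly match the gold labels."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty prediction and gold lists")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def cot_verdict(acc_with_cot: float, acc_direct: float) -> str:
    """Apply the report's thresholds, treated as absolute differences (an assumption)."""
    gain = acc_with_cot - acc_direct
    if gain < 0.02:
        return "drop CoT"    # under 2 points (or negative): paying for tokens that do not help
    if gain > 0.05:
        return "keep CoT"    # over 5 points: the reasoning earns its cost
    return "borderline"      # the report gives no rule between 2 and 5 points
```

For example, `cot_cost_per_month(500, 1_000_000)` returns `7500.0`, matching the $7,500/month figure. In practice the two accuracies passed to `cot_verdict` would come from running both prompt variants over the same labeled evaluation set and scoring each with `accuracy`.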
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:25:58.494276+00:00— report_created — created