Report #45952
[cost_intel] Always using chain-of-thought prompting regardless of task type
Use CoT only for tasks requiring intermediate reasoning (math, logic, multi-step analysis). For classification, extraction, and formatting tasks, CoT provides a <2% quality gain while multiplying output token cost by 3-5x. Condition CoT on task type rather than enabling it by default.
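A minimal sketch of conditioning CoT on task type. The task labels, the prompt suffixes, and the `needs_cot` / `build_prompt` helpers are all hypothetical names chosen for illustration; none of them come from the report or from any particular library.

```python
# Hypothetical task labels; replace with whatever taxonomy your pipeline uses.
REASONING_TASKS = {"math", "logic", "multi_step_analysis"}
DIRECT_TASKS = {"classification", "extraction", "formatting"}

# Hypothetical prompt suffixes, not a recommended wording.
COT_SUFFIX = "Think through the problem step by step, then give the final answer."
DIRECT_SUFFIX = "Answer directly with the final result only."


def needs_cot(task_type: str) -> bool:
    """Return True only for task types that need intermediate reasoning."""
    if task_type in REASONING_TASKS:
        return True
    if task_type in DIRECT_TASKS:
        return False
    # Unknown task types get no silent default: force an explicit, tested decision.
    raise ValueError(f"unknown task type: {task_type!r}; evaluate it and add it to a set")


def build_prompt(task_type: str, user_input: str) -> str:
    """Append the CoT instruction only when the task type calls for it."""
    suffix = COT_SUFFIX if needs_cot(task_type) else DIRECT_SUFFIX
    return f"{user_input}\n\n{suffix}"
```

Raising on unknown task types, instead of falling back to CoT, is a design choice in this sketch: it keeps CoT from quietly becoming the default again.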
Journey Context:
CoT prompting increases output tokens by 3-5x (the model 'thinks out loud'). On GPT-4o at $15/1M output tokens, a 500-token CoT response costs $0.0075 vs $0.0015 for a 100-token direct answer. At 1M requests/month, that's $7,500 vs $1,500: a $6,000/month difference for a negligible (<2%) quality gain on classification tasks. For arithmetic and logic tasks, CoT improves accuracy 20-40% and is clearly worth it. The diagnostic: run your held-out eval with and without CoT. If removing CoT drops accuracy by <2%, your task doesn't require intermediate reasoning and you're burning tokens. If accuracy drops by >5%, the task genuinely needs it. A drop between 2% and 5% is a judgment call between cost and accuracy. Test this once per task type and hard-code the decision.
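The arithmetic and the diagnostic above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, the accuracy thresholds are read as absolute differences (0.02 = 2 points), and the "judgment_call" label for the 2-5% band is an assumption of this sketch.

```python
def monthly_output_cost(tokens_per_response: int,
                        requests_per_month: int,
                        usd_per_million_tokens: float) -> float:
    """Monthly spend on output tokens, in USD."""
    total_tokens = tokens_per_response * requests_per_month
    return total_tokens * usd_per_million_tokens / 1_000_000


def cot_decision(acc_with_cot: float,
                 acc_without_cot: float,
                 drop_threshold: float = 0.02,
                 keep_threshold: float = 0.05) -> str:
    """Classify a task type from held-out accuracy with and without CoT.

    Accuracies are fractions in [0, 1]; thresholds are absolute differences
    (an assumption of this sketch).
    """
    drop = acc_with_cot - acc_without_cot
    if drop < drop_threshold:
        return "drop_cot"        # CoT is not buying measurable accuracy
    if drop > keep_threshold:
        return "keep_cot"        # the task needs intermediate reasoning
    return "judgment_call"       # weigh cost against the modest gain


# Figures from the report: $15 per 1M output tokens, 1M requests/month.
cot_cost = monthly_output_cost(500, 1_000_000, 15.0)     # 7500.0
direct_cost = monthly_output_cost(100, 1_000_000, 15.0)  # 1500.0
monthly_savings = cot_cost - direct_cost                 # 6000.0
```

Run `cot_decision` once per task type on the held-out eval and store the result in configuration, so the choice is not re-made on every request.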
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:36:23.031620+00:00— report_created — created