Report #81813
[cost\_intel] Requesting chain-of-thought reasoning in production API calls where only the final answer matters
Remove 'think step by step' and 'explain your reasoning' from production prompts on classification, extraction, and lookup tasks. This reduces output tokens by 5-10x with <5% quality loss on these task types.
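A minimal sketch of how the removal might be automated across a prompt library. The phrase list and the `strip_cot` helper are hypothetical and not part of any SDK. The sentence splitter is deliberately naive, so review the output before deploying it.

```python
import re

# Hypothetical list of CoT trigger phrases; extend it to match your own prompts.
COT_PATTERNS = [
    r"think step[- ]by[- ]step",
    r"explain your reasoning",
]
_COT_RE = re.compile("|".join(COT_PATTERNS), re.IGNORECASE)


def strip_cot(prompt: str) -> str:
    """Drop every sentence that contains a CoT trigger phrase.

    Sentences are split on ., ! or ? followed by whitespace, and the
    survivors are rejoined with single spaces (line breaks are not preserved).
    """
    sentences = re.split(r"(?<=[.!?])\s+", prompt.strip())
    kept = [s for s in sentences if not _COT_RE.search(s)]
    return " ".join(kept)


prompt = (
    "Classify the ticket as billing, bug, or other. "
    "Think step by step. "
    "Reply with the label only."
)
print(strip_cot(prompt))
```

Apply this only to prompts for classification, extraction, and lookup tasks. Prompts for math or logic tasks should keep their reasoning instructions.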
Journey Context:
CoT prompting improves quality on math and logic tasks by 10-30%, but on classification, extraction, and structured lookup tasks the improvement is <5%: the model already 'knows' the answer without verbalizing its reasoning. The cost impact is severe because output tokens are typically 3-5x more expensive than input tokens \(about 5x at the rates below\). At Sonnet rates \($3/M input, $16/M output\), a 2000-token input \+ 1500-token CoT output costs $0.030 \($0.006 input \+ $0.024 output\); the same task with a 200-token direct output costs about $0.009, a 3.3x reduction. Over 100K calls, that is $3,000 vs $920, a saving of roughly $2,100. The degradation signature to watch for: if removing CoT causes quality to drop >5%, your task actually requires reasoning, and you should keep CoT or switch to a model with built-in reasoning \(e.g., o1, Claude with extended thinking\). Thinking tokens on those models are generally billed as output tokens, so compare costs before switching.
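The arithmetic above can be reproduced with a short sketch. The per-million-token prices are the rates quoted in this report, not values read from any pricing API, and the helper names are hypothetical. The 5% threshold is treated here as an absolute drop in accuracy, which is an assumption about what "quality" means for your task.

```python
# Prices per million tokens, as quoted in this report (assumed, not fetched).
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 16.00


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single call at the rates above."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def should_keep_cot(acc_with_cot: float, acc_without_cot: float,
                    max_drop: float = 0.05) -> bool:
    """True when removing CoT costs more than max_drop in absolute accuracy."""
    return (acc_with_cot - acc_without_cot) > max_drop


cot = call_cost(2000, 1500)     # CoT prompt with a long reasoning trace
direct = call_cost(2000, 200)   # same input, direct answer only

print(f"per call: CoT ${cot:.4f}, direct ${direct:.4f}, ratio {cot / direct:.1f}x")
print(f"100K calls: CoT ${cot * 100_000:,.0f}, direct ${direct * 100_000:,.0f}")
```

To apply the degradation check, measure accuracy on the same labeled evaluation set with and without the CoT instruction, then pass both numbers to `should_keep_cot`.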
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:55:11.688588+00:00— report_created — created