Report #76724
[cost_intel] Blindly applying Chain-of-Thought prompting to small models for simple tasks
Strip CoT instructions from prompts routed to Haiku/Flash unless the task genuinely requires multi-step reasoning. Use direct zero-shot for classification and extraction.
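A minimal sketch of that fix, assuming prompts pass through a routing step that knows the target model tier and task type. The tier names, task labels, function names, and the list of CoT trigger phrases are all illustrative assumptions, not part of any provider's API.

```python
import re

# Hypothetical labels; replace with whatever your router actually uses.
SMALL_MODEL_TIERS = {"haiku", "flash"}
SIMPLE_TASKS = {"classification", "extraction"}

# Illustrative CoT trigger phrases (not exhaustive). Longer phrases come
# first so they are removed whole rather than partially.
_COT_RE = re.compile(
    "|".join([
        r"let'?s think step[- ]by[- ]step\.?",
        r"think step[- ]by[- ]step\.?",
        r"explain your reasoning( before answering)?\.?",
        r"show your work\.?",
    ]),
    re.IGNORECASE,
)


def strip_cot(prompt: str) -> str:
    """Remove known CoT trigger phrases and tidy leftover whitespace."""
    cleaned = _COT_RE.sub("", prompt)
    return re.sub(r"[ \t]{2,}", " ", cleaned).strip()


def prepare_prompt(prompt: str, model_tier: str, task_type: str) -> str:
    """Strip CoT only for small models on simple tasks; otherwise pass through."""
    if model_tier.lower() in SMALL_MODEL_TIERS and task_type in SIMPLE_TASKS:
        return strip_cot(prompt)
    return prompt
```

For example, `prepare_prompt("Classify the sentiment of this review. Let's think step by step.", "haiku", "classification")` returns the prompt without the trailing CoT instruction, while the same prompt routed with `model_tier="opus"` and `task_type="math"` is left unchanged.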
Journey Context:
CoT forces the model to emit reasoning tokens, which are billed as output tokens (the most expensive kind). On simple tasks (e.g., sentiment analysis, PII extraction), CoT on Haiku/Flash often degrades accuracy (the model talks itself out of the correct answer) while raising cost 3-5x through verbose output. Reserve CoT for Sonnet/Opus on hard math/logic tasks. The signature of this failure is a large spike in output tokens with no quality gain, or even a slight drop in F1 score.
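The failure signature described above can be checked mechanically from an A/B evaluation of the same task with and without CoT. The sketch below is one way to do it; the `RunStats` type, the function name, and both default thresholds are assumptions chosen for illustration, so tune them against your own evaluation data.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Aggregate results for one prompt variant on a fixed eval set."""
    mean_output_tokens: float
    f1: float


def cot_is_wasteful(
    direct: RunStats,
    cot: RunStats,
    token_ratio_threshold: float = 3.0,  # assumed: low end of the 3-5x range
    min_f1_gain: float = 0.01,           # assumed: smallest gain worth paying for
) -> bool:
    """Flag CoT when output tokens spike without a matching F1 improvement."""
    if direct.mean_output_tokens <= 0:
        raise ValueError("direct.mean_output_tokens must be positive")
    token_ratio = cot.mean_output_tokens / direct.mean_output_tokens
    f1_gain = cot.f1 - direct.f1
    return token_ratio >= token_ratio_threshold and f1_gain < min_f1_gain
```

With made-up numbers, `cot_is_wasteful(RunStats(20, 0.91), RunStats(90, 0.90))` flags the CoT variant (4.5x the output tokens, slightly lower F1), whereas a CoT run that lifts F1 from 0.91 to 0.95 at the same token cost is not flagged.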
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:22:07.688320+00:00— report_created — created