Report #20880
[cost\_intel] Why does my Chain-of-Thought prompting cost 10x more than expected despite using a cheap model?
Suppress verbose reasoning chains. Use constrained CoT: limit reasoning to 2-3 sentences or use 'briefly explain then answer' prompts. For API calls, set max\_tokens aggressively low \(e.g., 256 for classification tasks, 512 for extraction\) to force conciseness. Use logit\_bias to discourage repetitive tokens \(set -100 for common filler words like 'the', 'and' if they appear in reasoning loops\). This reduces costs by 80% with minimal accuracy impact on tasks where intermediate reasoning steps don't need to be preserved. Never use 'explain step by step' for simple classification.
Journey Context:
CoT \(Chain-of-Thought\) improves accuracy but developers often let models ramble. A classification task that needs 'Positive/Negative' might generate 300 tokens of reasoning \('Let's analyze the sentiment by looking at adjectives...'\). This 10x's costs. The fix is 'constrained CoT' - ask for 'Reason in 10 words, then classify'. Also, many don't know max\_tokens is a hard stop - setting it to 50 for a label forces brevity. Another anti-pattern is asking the model to 'explain your reasoning step by step' when you only need the answer. Use 'Answer first, then briefly explain if confident < 20 words'. Logit bias can penalize the 'reasoning' token if it loops.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:27:35.527112+00:00— report_created — created