Report #22545
[cost_intel] Including many few-shot examples in production prompts for marginal quality gains
Cap few-shot examples at 2-3 per task type; quality plateaus but token cost scales linearly with example count. For high-volume pipelines, move few-shot examples into the cached prefix or replace with fine-tuning.
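To make the linear scaling concrete, here is a minimal back-of-the-envelope sketch. The function names, the 350-token example size, the request volume, and the price of 3.00 USD per million input tokens are all hypothetical placeholders chosen for illustration, not figures from any provider; substitute your own numbers.

```python
def few_shot_overhead_tokens(num_examples: int, tokens_per_example: int) -> int:
    """Tokens added to every request by the few-shot block."""
    return num_examples * tokens_per_example


def daily_overhead_cost_usd(
    num_examples: int,
    tokens_per_example: int,
    requests_per_day: int,
    usd_per_million_input_tokens: float,
) -> float:
    """Daily spend attributable to the few-shot block alone (uncached input)."""
    per_request = few_shot_overhead_tokens(num_examples, tokens_per_example)
    total_tokens = per_request * requests_per_day
    return total_tokens / 1_000_000 * usd_per_million_input_tokens


# Hypothetical workload: 350-token examples, 100k requests/day, 3.00 USD per 1M tokens.
ten_shot = daily_overhead_cost_usd(10, 350, 100_000, 3.00)
three_shot = daily_overhead_cost_usd(3, 350, 100_000, 3.00)
print(f"10 examples: {ten_shot:.2f} USD/day, 3 examples: {three_shot:.2f} USD/day")
```

Under these assumed numbers the example block alone costs 1050.00 USD/day with 10 examples versus 315.00 USD/day with 3, before any quality difference is considered.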
Journey Context:
The instinct is to add more examples to improve quality, but research and practice show diminishing returns after 2-3 examples for most tasks. Each example might be 200-500 tokens, so 10 examples add 2,000-5,000 tokens to every request. At scale, this silently multiplies costs. Worse, a long few-shot section can distract the model from the actual query: it starts pattern-matching against the examples rather than reasoning about the input. The alternatives: (1) cache the few-shot prefix if it is static across requests, (2) fine-tune on the examples if volume justifies it, (3) use retrieval to select the most relevant 1-2 examples dynamically instead of sending a fixed large set. Option 3 gives you the quality of targeted examples without the token bloat of a static block.
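Option 3 could be sketched roughly as follows. This is an illustration only: it ranks examples by simple word-overlap (Jaccard) similarity as a stand-in for whatever retrieval method you actually use (embedding search is more typical in production), and the `Example` type, `select_examples`, `build_prompt`, and the sample pool are all hypothetical names and data.

```python
import re
from typing import NamedTuple


class Example(NamedTuple):
    prompt: str
    completion: str


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; 0.0 if either text has no words."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def select_examples(query: str, pool: list[Example], k: int = 2) -> list[Example]:
    """Return the k pool examples whose prompts are most similar to the query."""
    ranked = sorted(pool, key=lambda ex: jaccard(query, ex.prompt), reverse=True)
    return ranked[:k]


def build_prompt(query: str, pool: list[Example], k: int = 2) -> str:
    """Assemble a prompt from only the k selected examples plus the query."""
    shots = select_examples(query, pool, k)
    parts = [f"Input: {ex.prompt}\nOutput: {ex.completion}" for ex in shots]
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Hypothetical example pool for a ticket-routing task.
POOL = [
    Example("I want a refund for my order", "billing"),
    Example("Where is my package shipping status", "shipping"),
    Example("I forgot my password and cannot log in", "account"),
    Example("How do I reset my account password", "account"),
]

print(build_prompt("How can I reset my password?", POOL, k=2))
```

The pool can grow arbitrarily large without increasing per-request tokens, since only `k` examples are ever sent. The trade-off is that a dynamically chosen prefix changes between requests, so it generally cannot be combined with prefix caching of the example block (option 1).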
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:15:04.541272+00:00— report_created — created