Report #56787
[cost\_intel] Over-provisioning few-shot examples in system prompts for simple task types
A/B test your specific task with 2-3 examples vs 10\+ examples. For most classification and extraction tasks, quality plateaus at 2-3 examples. Each additional example adds tokens to every API call with diminishing returns. Cut examples beyond the plateau.
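The comparison above can be sketched as a small harness. This is a minimal sketch, not a definitive evaluation setup: `predict` is a hypothetical callable standing in for whatever model call you use, `accuracy_at` and `plateau` are illustrative names, and the 0.02 tolerance is an assumed threshold you should set for your own task.

```python
from typing import Callable, Sequence

Shot = tuple[str, str]  # (input text, expected label)


def accuracy_at(
    k: int,
    pool: Sequence[Shot],
    eval_set: Sequence[Shot],
    predict: Callable[[Sequence[Shot], str], str],
) -> float:
    """Accuracy on eval_set when only the first k shots from pool are in the prompt.

    `predict` is a hypothetical stand-in: it receives the few-shot examples and
    one input text, and returns the predicted label.
    """
    shots = pool[:k]
    correct = sum(predict(shots, text) == label for text, label in eval_set)
    return correct / len(eval_set)


def plateau(scores: dict[int, float], tolerance: float = 0.02) -> int:
    """Smallest example count whose score is within `tolerance` of the best score."""
    best = max(scores.values())
    return min(k for k, score in scores.items() if best - score <= tolerance)
```

In use, you would compute `accuracy_at` for each candidate count (for example 0, 2, 3, 10) on a held-out set that does not overlap the example pool, collect the results into a dict, and pass it to `plateau` to find the count to keep.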
Journey Context:
Teams routinely include 5-20 few-shot examples in system prompts as a 'best practice' without measuring marginal impact. On straightforward classification tasks, the quality improvement from example 3 to example 10 is typically <2%, but token cost increases linearly with the number of examples. With a 1.5K token base system prompt and 10 examples at 150 tokens each, you are paying for 3K input tokens per call vs 1.95K with 3 examples, a 54% increase in input cost for negligible quality gain. The overhead carries over to prompt caching \(more tokens to cache and warm up\) and grows in absolute terms with frontier model pricing. The exception: complex output-format tasks \(e.g., 'produce a JSON object with these 15 fields in this specific nested structure'\) genuinely benefit from more examples because the model needs to learn the format, not the task. Measure before you cut.
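The arithmetic above can be reproduced with a few lines. The figures (1.5K base tokens, 150 tokens per example) come from the report; the function names are illustrative, and real token counts depend on your tokenizer and examples.

```python
def prompt_tokens(base_tokens: int, n_examples: int, tokens_per_example: int) -> int:
    """Input tokens per call: base system prompt plus all few-shot examples."""
    return base_tokens + n_examples * tokens_per_example


def relative_increase(smaller: int, larger: int) -> float:
    """Fractional increase going from `smaller` to `larger`."""
    return (larger - smaller) / smaller


lean = prompt_tokens(1500, 3, 150)    # 1950 tokens with 3 examples
heavy = prompt_tokens(1500, 10, 150)  # 3000 tokens with 10 examples
increase = relative_increase(lean, heavy)

print(f"{lean} vs {heavy} input tokens per call: +{increase:.0%}")
```

Because the extra 1,050 tokens are paid on every call, the absolute cost scales with call volume; multiply by your own per-token input price to estimate it.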
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:48:33.784415+00:00— report_created — created