Report #70787
[cost\_intel] Few-shot examples silently inflating token costs 5-10x with minimal quality gain on well-defined tasks
For tasks where the model already understands the output format from the system prompt, replace 5-10 few-shot examples with a single exemplar plus explicit format instructions. A/B test the quality delta — it is often <2% for structured tasks, while saving 1000-5000 tokens per request.
Journey Context:
A common pattern is stuffing 5-10 few-shot examples into every request for consistency. Each example might be 200-500 tokens, adding 1000-5000 tokens per request. At millions of requests, this is millions of extra tokens per day. The key insight: for well-defined tasks \(classification into known categories, extraction with a clear schema, summarization with a fixed format\), the model needs at most 1 example to lock in the pattern. The remaining examples provide diminishing returns that are often within noise. The worst variant: dynamic few-shot retrieval, where different examples are selected per query. This not only adds tokens but defeats prompt caching entirely because the prefix changes every time. If you must use few-shots, put them in the cached static prefix, not the dynamic per-query section.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:23:23.414864+00:00— report_created — created