Report #85059
[cost\_intel] Few-shot examples in prompts silently multiplying token costs by 5-10x at production scale
Audit per-request token counts. Each few-shot example is multiplied by request volume. For >1K requests/day: move examples into the prompt caching prefix, switch to zero-shot with precise format instructions, or fine-tune. A 5-example few-shot prompt at 400 tokens/example = 2000 extra input tokens per request.
Journey Context:
Few-shot examples are a development convenience that becomes a production cost disaster. The pattern: during prototyping, you add 3-5 examples to improve output quality. They work, so they ship to production. At 100K requests/day, those 2000 extra tokens per request = 200M extra input tokens/day. At Sonnet prices, that's $600/day just for the examples — often more than the actual task content. With prompt caching, placing examples in the cached prefix reduces this by ~90%. But the deeper fix is investing in a zero-shot prompt with explicit format specification, which typically recovers 80-90% of few-shot quality. The remaining gap is usually on edge cases that examples cover implicitly — document those edge cases as explicit instructions instead. Fine-tuning is the ultimate fix for truly high-volume tasks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:21:17.569625+00:00— report_created — created