Report #68364
[cost\_intel] Few-shot token bloat: padding prompts with 5-10 examples that improve quality 1-3% but multiply costs 5-10x
Benchmark quality with 0, 1, 2, and 3 examples, and stop adding examples once the quality curve flattens. If you consistently need more than 3 examples, fine-tune instead: those examples belong in the model weights, not in the context window of every request.
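The benchmarking rule above can be sketched as a small stopping loop. This is a minimal illustration, not a definitive harness: `pick_shot_count`, the `evaluate` callback, the `min_gain` threshold of 0.01, and the scores in `fake_scores` are all hypothetical names and made-up values introduced here. In practice `evaluate` would run your own eval set with `k` few-shot examples and return a quality score.

```python
from typing import Callable, Sequence


def pick_shot_count(
    evaluate: Callable[[int], float],
    shot_counts: Sequence[int] = (0, 1, 2, 3),
    min_gain: float = 0.01,
) -> int:
    """Return the last shot count that still improved quality by at least min_gain."""
    best_k = shot_counts[0]
    best_score = evaluate(best_k)
    for k in shot_counts[1:]:
        score = evaluate(k)
        if score - best_score < min_gain:
            break  # the quality curve has flattened; stop adding examples
        best_k, best_score = k, score
    return best_k


# Made-up scores for illustration only (assumption, not measured data).
fake_scores = {0: 0.80, 1: 0.88, 2: 0.90, 3: 0.905}
chosen = pick_shot_count(lambda k: fake_scores[k])
print(chosen)
```

With these illustrative scores the loop stops at 2 examples, because the third adds less than the chosen threshold. If the loop runs through every count without flattening, that is the signal the report describes for considering fine-tuning.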
Journey Context:
Common anti-pattern: developers add 10 few-shot examples averaging 200 tokens each, which adds 2,000 input tokens to every request. At 1M requests that is 2B extra input tokens, or about $10K on GPT-4o input alone (the figure implies an input price of $5 per 1M tokens; check current pricing). The marginal quality gain from going from 3 to 10 examples is typically 1-3% on classification tasks and 2-5% on generation tasks. Worse, long few-shot prefixes undermine prompt caching: the cached prefix is invalidated whenever you update an example. The silent cost multiplier is that those examples are paid for on every single request, not just once. Fine-tuning pays the learning cost once during training and can reduce per-request tokens by 80-90%.
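The arithmetic in the paragraph above can be reproduced with a few lines. This is a sketch under stated assumptions: `extra_input_cost_usd` is a hypothetical helper, and the price of $5 per 1M input tokens is the rate implied by the $10K figure, not a quoted current price.

```python
def extra_input_cost_usd(
    examples: int,
    tokens_per_example: int,
    requests: int,
    usd_per_million_tokens: float,
) -> float:
    """Cost of the few-shot tokens alone, summed over all requests."""
    extra_tokens = examples * tokens_per_example * requests
    return extra_tokens / 1_000_000 * usd_per_million_tokens


# Assumed input price: $5 per 1M tokens (implied by the report's $10K figure).
PRICE = 5.0

cost_10 = extra_input_cost_usd(10, 200, 1_000_000, PRICE)  # 10 examples
cost_3 = extra_input_cost_usd(3, 200, 1_000_000, PRICE)    # 3 examples

print(f"10 examples: ${cost_10:,.0f}")
print(f" 3 examples: ${cost_3:,.0f}")
print(f"saved by stopping at 3: ${cost_10 - cost_3:,.0f}")
```

Swapping in your own request volume, example length, and provider price shows how quickly the per-request overhead compounds relative to the 1-3% quality gain.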
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:14:05.505466+00:00— report_created — created