Report #54957
[cost\_intel] Bloating every request with 5-20 few-shot examples for structured tasks like extraction and classification
For structured tasks \(JSON extraction, classification, formatting\), use at most 0-2 few-shot examples. If quality still falls short, fine-tune a smaller model instead of adding more examples. Before adding any example, calculate the amortized cost of its tokens across your total request volume.
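A minimal sketch of that amortized-cost calculation. The function name and parameters are hypothetical, and the price is supplied by the caller rather than looked up from any provider API.

```python
def few_shot_cost_usd(num_examples, tokens_per_example, requests,
                      usd_per_million_input_tokens):
    """Total spend attributable to few-shot tokens across all requests.

    Hypothetical helper: assumes every request carries the same examples
    and that input tokens are billed at a flat per-million rate.
    """
    few_shot_tokens_per_request = num_examples * tokens_per_example
    total_few_shot_tokens = few_shot_tokens_per_request * requests
    return total_few_shot_tokens / 1_000_000 * usd_per_million_input_tokens


# 10 examples of ~400 tokens, 1M requests, $3 per million input tokens.
cost = few_shot_cost_usd(10, 400, 1_000_000, 3.0)
print(f"${cost:,.0f}")  # $12,000
```

Running the same function with 2 examples instead of 10 shows how much of the bill the extra examples account for.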
Journey Context:
A pervasive pattern: a developer adds 10 few-shot examples at ~400 tokens each \(4,000 input tokens per request\) to raise extraction quality from 93% to 96%. At 1M requests on Sonnet \($3/M input tokens\), that is $12,000 spent on few-shot tokens alone. The same 3-point quality gain from fine-tuning Haiku costs ~$100-300 in training compute and reduces per-request cost by 10-20x. The signature of few-shot bloat: input token count exceeds output token count by 10x\+, and 80%\+ of input tokens are identical across requests. Every few-shot token is a recurring cost paid on every request for as long as the prompt is in use; fine-tuning bakes the pattern into the model weights once. The exceptions: few-shot is justified for rapidly changing tasks where you cannot retrain fast enough, and for low-volume tasks that will not reach the fine-tuning breakeven \(typically 50K-200K requests\).
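The bloat signature and the breakeven point described above can be checked with a rough sketch like the following. All names are hypothetical. The breakeven helper assumes savings come only from the removed few-shot tokens, and the one-time cost is whatever you budget for fine-tuning in total (training compute plus data preparation and evaluation); the $1,200 in the example is an illustrative placeholder, not a measured figure.

```python
import math


def is_few_shot_bloat(input_tokens, output_tokens, shared_prefix_tokens,
                      ratio_threshold=10.0, shared_threshold=0.8):
    """Flag a request profile matching the signature from the report:
    input is 10x+ the output, and 80%+ of input is identical across requests.
    """
    if input_tokens <= 0:
        return False
    input_heavy = input_tokens >= ratio_threshold * output_tokens
    mostly_shared = shared_prefix_tokens / input_tokens >= shared_threshold
    return input_heavy and mostly_shared


def breakeven_requests(one_time_cost_usd, few_shot_tokens_per_request,
                       usd_per_million_input_tokens):
    """Requests needed before a one-time fine-tuning cost is recovered.

    Simplifying assumption: the only saving counted is the few-shot
    tokens no longer sent on each request.
    """
    spend_per_million_requests = (few_shot_tokens_per_request
                                  * usd_per_million_input_tokens)
    return math.ceil(one_time_cost_usd * 1_000_000 / spend_per_million_requests)


# Typical bloated profile: 4,000 shared few-shot tokens in a 4,300-token input.
print(is_few_shot_bloat(4300, 150, 4000))    # True
# Hypothetical $1,200 all-in fine-tuning budget against 4,000 tokens at $3/M.
print(breakeven_requests(1200.0, 4000, 3.0))  # 100000
```

If expected volume is well below the computed breakeven, keeping a small number of few-shot examples is likely the cheaper choice.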
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:44:19.793880+00:00— report_created — created