Report #54078
[cost_intel] Few-shot examples silently 10x costs: when to cut them and what to use instead?
Replace 5-10 few-shot examples with one of: (1) a detailed output schema or format description, (2) 1-2 minimal examples, or (3) fine-tuning for high-volume tasks. The marginal quality gain from examples 3 through 10 is typically under 2% for classification and extraction, while cost scales linearly with every added token.
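A minimal sketch of options (1) and (2) combined: the prompt carries a compact JSON schema plus at most one example. The sentiment task, the `OUTPUT_SCHEMA` contents, and the `build_prompt` helper are hypothetical names chosen for illustration, not part of any particular SDK.

```python
import json

# Hypothetical output schema for an illustrative sentiment-classification task.
OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "label": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["label", "confidence"],
}


def build_prompt(task_input, examples, max_examples=1):
    """Build a prompt that leans on a schema and caps the number of examples.

    `examples` is a list of (input_text, output_dict) pairs; only the first
    `max_examples` are included, however many are passed in.
    """
    parts = [
        "Classify the sentiment of the text.",
        "Respond with a single JSON object matching this schema:",
        # Compact separators keep the schema itself cheap in tokens.
        json.dumps(OUTPUT_SCHEMA, separators=(",", ":")),
    ]
    for text, output in examples[:max_examples]:
        parts.append(f"Example input: {text}")
        parts.append(f"Example output: {json.dumps(output)}")
    parts.append(f"Input: {task_input}")
    return "\n".join(parts)


if __name__ == "__main__":
    demo_examples = [
        ("Great battery life.", {"label": "positive", "confidence": 0.95}),
        ("Arrived broken.", {"label": "negative", "confidence": 0.9}),
    ]
    print(build_prompt("Does what it says.", demo_examples))
```

Because the schema and instruction text are identical on every request, they form a stable prefix; whether that translates into cache savings depends on the provider's prompt-caching behavior, which this sketch does not model.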
Journey Context:
Few-shot prompting is one of the most common silent cost multipliers in production LLM pipelines. Developers add 5-10 examples during development because they improve quality in the lab, but never audit the ongoing token cost at scale. With five 500-token examples, you pay 2,500 input tokens per request. On Sonnet that is $0.0075 per request just for the examples, versus $0.0015 for the actual 500-token input. The quality curve for few-shot count is roughly logarithmic: most of the benefit comes from the first 1-2 examples, and returns diminish quickly after that. The diagnostic signature of over-shot prompting is that input tokens are 80%+ examples and under 20% actual task content. For tasks where examples seem essential (complex formatting, rare output patterns), check whether a JSON schema or format description achieves the same alignment at roughly 1/10th the token cost. Schema-based guidance is also more cache-friendly than varied examples.
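The arithmetic and the 80% diagnostic above can be sketched as follows. The price of $3.00 per million input tokens is an assumption inferred from the figures in this report ($0.0075 for 2,500 tokens); check current pricing before relying on it. The function names are hypothetical.

```python
# Assumed price in USD per million input tokens, inferred from the report's
# figures ($0.0075 / 2,500 tokens). Not an authoritative price.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00


def input_cost(tokens, price_per_million=PRICE_PER_MILLION_INPUT_TOKENS):
    """Return the input-token cost in USD for a single request."""
    return tokens * price_per_million / 1_000_000


def example_share(example_tokens, task_tokens):
    """Return the fraction of input tokens spent on few-shot examples."""
    total = example_tokens + task_tokens
    return example_tokens / total if total else 0.0


def is_over_shot(example_tokens, task_tokens, threshold=0.80):
    """Flag prompts whose input is dominated by examples (80%+ by default)."""
    return example_share(example_tokens, task_tokens) >= threshold


if __name__ == "__main__":
    example_tokens = 5 * 500  # five 500-token examples
    task_tokens = 500         # the actual task input

    print(f"examples: ${input_cost(example_tokens):.4f} per request")
    print(f"task:     ${input_cost(task_tokens):.4f} per request")
    print(f"share:    {example_share(example_tokens, task_tokens):.0%}")
    print(f"over-shot: {is_over_shot(example_tokens, task_tokens)}")
```

In practice the token counts would come from the provider's usage metadata or a tokenizer rather than hard-coded values; the check itself is just a ratio that can run over logged requests.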
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:15:57.381874+00:00— report_created — created