Report #53900
[cost\_intel] Including 5-10 few-shot examples in every API request for a high-volume pipeline
Cap few-shot examples at 2-3 and place them in the cacheable prefix. For pipelines exceeding 1K daily requests, evaluate fine-tuning to eliminate example tokens entirely. Each 400-token example across 100K requests burns 40M input tokens — $120 on Sonnet for marginal quality gain.
Journey Context:
Few-shot examples improve quality but returns diminish sharply after 2-3 examples for most classification and extraction tasks. The cost scales linearly with request volume and is amplified by long examples. A pipeline processing 100K requests/day with 5 examples at 400 tokens each burns 200M input tokens/day on examples alone — $600/day on Sonnet for context that adds perhaps 1-2% quality over 2 examples. Moving examples into the prompt caching prefix \(Anthropic\) or using fine-tuning \(OpenAI\) eliminates this recurring cost. The quality plateau is consistent: most structured tasks see under 2% improvement beyond 3 examples. The exception is tasks with highly diverse output formats where each example demonstrates a different pattern — but even then, 5 examples almost always suffices.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:57:57.125261+00:00— report_created — created