Report #86825
[cost\_intel] Including many few-shot examples in every API request without caching the prefix
Either cache few-shot examples as a stable prompt prefix \(Anthropic\) or reduce to 2-3 high-quality examples. For tasks requiring many examples, fine-tuning eliminates per-request example token cost entirely.
Journey Context:
A common pattern: 5-10 few-shot examples at 300-500 tokens each, sent with every request. That is 1500-5000 tokens of static content per request. At 1M requests on Sonnet \($3/M input\), few-shot examples alone cost $4,500-15,000. With Anthropic prompt caching on a stable prefix, this drops to ~$450-1,500 at the 90% read discount. Without caching, research consistently shows 2-3 well-chosen examples often match 10 examples in quality — the marginal value of each additional example diminishes rapidly after the first few. If you genuinely need many examples, fine-tuning absorbs that knowledge into model weights, eliminating the per-request token cost entirely. The transition point: if your few-shot prefix exceeds ~1000 tokens and you run more than 10K requests, either cache it or fine-tune.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:19:26.985275+00:00— report_created — created