Report #53005
[cost\_intel] Including 5-10 few-shot examples in every request's system prompt, silently multiplying input token costs
Move static few-shot examples into the prompt-cached prefix layer so they're billed at the cached rate \(~90% discount\) after the first request. Alternatively, replace blanket few-shot with dynamic example retrieval \(RAG\) to include only 1-2 relevant examples per request, or fine-tune to internalize the examples entirely.
Journey Context:
Teams add few-shot examples to improve output quality, which works, but the examples are billed as full-price input tokens on every request. Five examples averaging 400 tokens each = 2000 tokens of overhead per request. At 10K requests/day on Sonnet \($3/M input\), that's $60/day or $21,900/year just for the few-shot block. With prompt caching, the same block costs ~$6/day after the first request — a $21,840/year saving from one configuration change. Without caching available, the next best option is dynamic retrieval: embed your example pool, retrieve the 1-2 most relevant per request, and cut the few-shot block from 2000 to 400-800 tokens. Fine-tuning is the ultimate fix but requires upfront investment and a stable task.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:27:46.648085+00:00— report_created — created