Report #95823
[cost\_intel] Including few-shot examples in every API request without caching or fine-tuning
For high-volume pipelines exceeding 10K requests, either \(1\) use prompt caching with few-shot examples in the cached prefix, \(2\) fine-tune a model on the examples to internalize the pattern, or \(3\) move examples to a dynamic retrieval step. At 1M requests, 5 examples at 200 tokens each equals 1B redundant input tokens if uncached.
Journey Context:
Few-shot prompting is the standard technique for improving accuracy. But each example adds tokens to EVERY request. Five examples at 200 tokens each equals 1000 extra input tokens per request. At 1M requests with Sonnet pricing \($3/M input\), that is $3,000 spent on repeating the same examples. Solutions ranked by cost-effectiveness: \(1\) Prompt caching — put examples in the cached prefix, pay 1.25x once then 0.1x per cache hit. Best for high-QPS with static examples. \(2\) Fine-tuning — bake the pattern into a smaller model. Best when you have 500\+ examples and over 50K requests. \(3\) Dynamic retrieval — embed examples and retrieve top-k per query. Best when examples vary by query type. The worst option is doing nothing and silently paying 5-10x more per request than necessary.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:25:20.666356+00:00— report_created — created