Report #24242
[cost_intel] Few-shot examples silently inflating token costs 10x
Place static examples in the cached system-prompt prefix; for high-volume tasks, fine-tune on the examples instead of sending them in the prompt; or use dynamic example selection (embedding-based retrieval of the 2-3 most relevant examples per request) instead of including every example on every request.
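A minimal sketch of the dynamic-selection option, assuming a small in-memory pool of (input, output) examples. The `embed` function is a hypothetical stand-in (bag-of-words term counts scored with cosine similarity) for a real embedding model, and `EXAMPLES`, `select_examples`, and `build_prompt` are illustrative names, not any library's API. Example vectors are computed once up front, so each request only pays for embedding the query.

```python
import math
import re
from collections import Counter

# Hypothetical example pool of (input, output) pairs; use your curated examples.
EXAMPLES = [
    ("Refund request for a damaged laptop", "category: refund"),
    ("How do I reset my account password?", "category: account"),
    ("The package arrived two weeks late", "category: shipping"),
    ("Charge appeared twice on my credit card", "category: billing"),
    ("Change the email address on my account", "category: account"),
]


def embed(text: str) -> Counter:
    # Stand-in embedding (assumption): lowercase term counts.
    # Swap in a real embedding model for production use.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Embed the example pool once, not on every request.
EXAMPLE_VECS = [embed(inp) for inp, _ in EXAMPLES]


def select_examples(query: str, k: int = 3) -> list[tuple[str, str]]:
    # Score every example against the query and keep the top k.
    q = embed(query)
    ranked = sorted(range(len(EXAMPLES)), key=lambda i: cosine(q, EXAMPLE_VECS[i]), reverse=True)
    return [EXAMPLES[i] for i in ranked[:k]]


def build_prompt(query: str, k: int = 3) -> str:
    # Only the k selected examples are sent, instead of the whole pool.
    shots = "\n\n".join(f"Input: {i}\nOutput: {o}" for i, o in select_examples(query, k))
    return f"{shots}\n\nInput: {query}\nOutput:"


if __name__ == "__main__":
    print(build_prompt("I forgot my account password", k=2))
```

In production you would replace `embed` with a real embedding model and the linear scan with a vector index; the selection step itself (score each example against the query, keep the top k) stays the same.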
Journey Context:
Including 5-10 examples in every request is the number one source of silent token bloat in production systems. Ten examples averaging 500 tokens each add 5,000 input tokens to every request. At 100K requests per day that is 500M input tokens, which at frontier input rates of roughly $3-10 per million tokens comes to $1,500-5,000 per day spent on example tokens alone. There are three resolution paths, each with different tradeoffs:
(1) Static examples in a cached prefix: on a cache hit the example tokens are billed at a fraction of the normal input rate (discounted, not free), but every request sees the same examples.
(2) Fine-tuning: bakes the example behavior into the model weights and removes example tokens from requests entirely, but requires training data and an upfront training cost.
(3) Dynamic example selection: retrieves only the relevant examples for each request, cutting example tokens by 60-80% (for instance, 2-3 examples instead of 10), and often improves quality because the selected exemplars are closer to the input.
The worst pattern is varying the examples per request with no caching: you pay full price for every example token on every request, with nothing amortized.
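A back-of-envelope cost model for the arithmetic above, using the report's own figures (100K requests/day, 500 tokens per example). The per-million-token rates and the 0.1 cache-read multiplier are assumptions for illustration; check your provider's current pricing. The model ignores cache-write surcharges, cache misses, and the upfront cost of fine-tuning, and the function names are hypothetical.

```python
def daily_example_cost(
    requests_per_day: int,
    examples_per_request: int,
    tokens_per_example: int,
    usd_per_million_input: float,
    cache_read_multiplier: float = 1.0,  # 1.0 = uncached; < 1.0 = discounted cache reads
) -> float:
    """Daily USD spent on few-shot example tokens alone."""
    tokens_per_day = requests_per_day * examples_per_request * tokens_per_example
    return tokens_per_day / 1_000_000 * usd_per_million_input * cache_read_multiplier


def compare_strategies(usd_per_million_input: float, cache_read_multiplier: float = 0.1) -> dict[str, float]:
    # Figures from the report: 100K requests/day, 500 tokens per example.
    common = dict(requests_per_day=100_000, tokens_per_example=500,
                  usd_per_million_input=usd_per_million_input)
    return {
        "10 examples, uncached": daily_example_cost(examples_per_request=10, **common),
        # Assumed cache-read discount; only applies if every request shares the identical prefix.
        "10 examples, cached prefix": daily_example_cost(
            examples_per_request=10, cache_read_multiplier=cache_read_multiplier, **common),
        # Dynamic selection: 3 retrieved examples instead of 10 (70% fewer example tokens).
        "3 retrieved examples, uncached": daily_example_cost(examples_per_request=3, **common),
    }


if __name__ == "__main__":
    for rate in (3.0, 10.0):  # assumed USD per million input tokens
        for name, cost in compare_strategies(rate).items():
            print(f"${rate:>4.1f}/M  {name:<32} ${cost:>8,.0f}/day")
```

At an assumed $3/M this gives $1,500/day uncached, $150/day with discounted cache reads, and $450/day with three retrieved examples; at $10/M the uncached figure is $5,000/day, matching the range in the report.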
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:05:38.826650+00:00— report_created — created