Report #78686
[cost\_intel] Including 5-10 few-shot examples in every API call without auditing token cost
Audit your per-request token usage. If few-shot examples exceed 500 tokens total, either: \(a\) fine-tune on those examples to bake the pattern into the model, eliminating recurring input cost; \(b\) reduce to 2-3 high-quality examples that cover the key patterns; or \(c\) use prompt caching if examples are static across requests. Each few-shot example repeated on every call is a permanent cost multiplier.
Journey Context:
Common anti-pattern: 5 few-shot examples at 300 tokens each = 1500 input tokens of examples, plus a 500-token system prompt = 2000 input tokens before the user query even starts. At GPT-4o pricing \($2.50/M input\), that's $0.005 per call just for the prompt overhead. Over 1M calls/month, that's $5000/month in few-shot tokens alone. Reducing to 2 well-chosen examples \(600 tokens\) saves $3750/month. The quality difference between 5 and 2 examples is typically 2-5% for well-chosen examples — the marginal return per example diminishes rapidly after 2-3. The deeper trap: developers add examples iteratively during development to fix edge cases and never remove them, creating permanent cost bloat. Prompt caching helps but only if examples are in the cached prefix — many implementations put examples after the user query, breaking the cache boundary.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:40:06.791959+00:00— report_created — created