Report #68547
[cost_intel] Few-shot examples embedded in every API call silently multiplying input token costs 5-10x
Move few-shot examples into a cacheable system prompt prefix, or replace them with detailed instructions on a frontier model. For high-volume stable tasks, fine-tune a smaller model to internalize the pattern entirely.
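A rough cost model shows why the placement of examples matters. The sketch below compares one call whose few-shot examples are billed at the full input price against the same call with the examples served from a prompt cache. The prices ($3 input / $15 output per million tokens), the 0.1 cache-read multiplier, and the names `CallShape`, `per_call_cost`, and `input_inflation` are illustrative assumptions, not any provider's published figures or API; substitute your provider's current rates. Cache-write charges and cache expiry are ignored, which is only reasonable when the same prefix is reused many times within the cache lifetime.

```python
from dataclasses import dataclass


@dataclass
class CallShape:
    """Token profile of one request (all values are illustrative)."""
    query_tokens: int    # variable, per-request user content
    example_tokens: int  # static few-shot block, e.g. 5 examples x 400 tokens
    output_tokens: int   # generated tokens


def input_inflation(shape: CallShape) -> float:
    """How many times larger the total input is than the bare query."""
    return (shape.query_tokens + shape.example_tokens) / shape.query_tokens


def per_call_cost(
    shape: CallShape,
    input_price: float,            # $ per million input tokens (assumed)
    output_price: float,           # $ per million output tokens (assumed)
    cached: bool = False,
    cache_read_mult: float = 0.1,  # assumed discount on cache hits; varies by provider
) -> float:
    """Dollar cost of one call; cache-write premiums and expiry are ignored."""
    example_rate = input_price * (cache_read_mult if cached else 1.0)
    total = (
        shape.example_tokens * example_rate
        + shape.query_tokens * input_price
        + shape.output_tokens * output_price
    )
    return total / 1_000_000


if __name__ == "__main__":
    shape = CallShape(query_tokens=300, example_tokens=2000, output_tokens=200)
    print(f"input inflation: {input_inflation(shape):.2f}x")
    uncached = per_call_cost(shape, input_price=3.0, output_price=15.0)
    cached = per_call_cost(shape, input_price=3.0, output_price=15.0, cached=True)
    print(f"uncached: ${uncached:.4f}/call, cached: ${cached:.4f}/call")
```

With these placeholder numbers the examples make the input about 7.67x the size of the query, and caching them cuts the per-call cost from about $0.0099 to about $0.0045. At that point the output tokens, which caching does not affect, become the dominant cost.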
Journey Context:
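A pervasive pattern: developers include 5-10 examples (each 200-500 tokens) in every API call to improve output quality. That adds 1,000-5,000 input tokens per call, which can inflate input costs by 5-10x when the user query itself is only a few hundred tokens. The irony is that for many tasks, a well-crafted instruction on a frontier model such as GPT-4 or Claude Sonnet achieves comparable quality without examples, and can cost less in total than few-shot prompting on a cheaper model.

Three mitigation strategies, in order of effort:
(1) Move the examples into a cacheable system prompt prefix. With provider-side prompt caching you pay to write the prefix to the cache (at or somewhat above the normal input price), and later calls that hit the cache are billed at a reduced rate for those tokens instead of full price every time. The prefix must be identical across calls and placed before any variable content, and caches expire after a period of inactivity.
(2) Replace the examples with detailed instructions. Frontier models often need only a clear specification, not demonstrations.
(3) Fine-tune a small model on 500-2,000 examples so it internalizes the pattern, eliminating the per-call example overhead entirely.

The signature of this problem: audit your token usage, and if input tokens per call exceed 2x the length of the actual user query, you have bloat.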
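The 2x heuristic can be applied to a request log with a few lines of code. This is a sketch under stated assumptions: `UsageRecord`, `bloat_ratio`, and `flag_bloated` are hypothetical names, `input_tokens` is assumed to be the billed input count your provider reports per call, and `query_tokens` is assumed to be a count you measure yourself for just the user's query (for example with the tokenizer the model uses).

```python
from dataclasses import dataclass
from statistics import median
from typing import Iterable, List


@dataclass
class UsageRecord:
    request_id: str
    input_tokens: int  # total billed input tokens for the call (from your logs)
    query_tokens: int  # tokens in the user's actual query (measured separately)


def bloat_ratio(rec: UsageRecord) -> float:
    """Total input relative to the query; max() guards against empty queries."""
    return rec.input_tokens / max(rec.query_tokens, 1)


def flag_bloated(records: Iterable[UsageRecord], threshold: float = 2.0) -> List[str]:
    """Return IDs of calls whose input exceeds `threshold` x the query length."""
    return [r.request_id for r in records if bloat_ratio(r) > threshold]


if __name__ == "__main__":
    log = [
        UsageRecord("req-1", input_tokens=2300, query_tokens=300),  # 5 examples inlined
        UsageRecord("req-2", input_tokens=450, query_tokens=300),   # short instructions only
        UsageRecord("req-3", input_tokens=600, query_tokens=300),   # exactly 2x: not flagged
    ]
    print("median ratio:", round(median(bloat_ratio(r) for r in log), 2))
    print("bloated:", flag_bloated(log))
```

A median ratio well above 2 suggests the overhead is systemic (a shared prompt template) rather than a few outlier requests. A long but legitimate system prompt also raises the ratio, so flagged calls are candidates for review, not proof of few-shot bloat.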
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:32:15.510129+00:00— report_created — created