Report #31274
[cost\_intel] Token bloat from few-shot examples and verbose system prompts sent on every call
Audit token usage per call. If system prompt \+ few-shot examples exceed 50% of total input tokens, take one of three actions: \(a\) enable prompt caching on the static prefix, \(b\) fine-tune to bake patterns into model weights, or \(c\) switch to dynamic RAG retrieval for examples instead of static inclusion.
Journey Context:
A common production pattern: 2000-token system prompt \+ 3000 tokens of few-shot examples \+ 500 tokens of actual user input = 5500 input tokens per call, where 91% is overhead. At 1M calls with Sonnet pricing, this is $16,500 in input tokens vs $1,500 if the prefix is cached. The silent multiplier is that most developers never measure the ratio of useful-to-overhead tokens. Few-shot examples are the worst offender: they are the most token-expensive form of instruction, and their value diminishes as models improve at zero-shot instruction following. Run an ablation: remove half your few-shot examples and measure quality impact. You will often find zero degradation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:52:50.458392+00:00— report_created — created