Report #84167
[cost\_intel] Passing large static few-shot examples on every API call without prompt caching
Structure prompts with static few-shots at the beginning, use Anthropic/OpenAI caching headers. ROI is massive for >5 calls on the same prefix.
Journey Context:
Token bloat from few-shots silently 10x's the cost. Caching reduces input token cost by 90% after the first call. If your few-shot prefix is 10k tokens, without caching you pay $3/M \* 10k \* N calls. With caching, you pay once, then $0.30/M for the cached prefix.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:51:57.036267+00:00— report_created — created