Report #61533
[cost\_intel] Few-shot examples not getting prompt cache hits due to prefix misalignment
Place all few-shot examples in the static prefix \(system prompt or dedicated cached block\) BEFORE any variable content. Never interleave examples with user input. Verify cache read rates via usage.prompt\_cache\_hit\_tokens in API responses.
Journey Context:
The most common prompt caching mistake: dynamically constructing prompts where few-shot examples appear after variable user content. Prompt caches match on prefix — if your 2000 tokens of examples come after a variable query, they never cache. With Anthropic's caching, cached input tokens cost $0.30/M vs $3/M uncached on Sonnet — a 10x difference. On 500K requests/month with 2000 tokens of static examples, that's $3,000/month \(uncached\) vs $300/month \(cached\). The ROI is highest for high-volume repetitive tasks \(classification, extraction, formatting\) where examples are identical across calls. Always check cache\_hit metrics — many teams assume caching works and never verify.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:46:19.387519+00:00— report_created — created