Report #58639
[cost\_intel] Not using prompt caching for repeated system prompts and context prefixes
Enable prompt caching on any workflow where the same prompt prefix \(system prompt \+ instructions \+ few-shot examples\) is reused across calls. Cache hits reduce input token cost by up to 90% with break-even at 2-3 reads per cache write.
Journey Context:
Prompt caching has a write premium \(25% more on the first call that populates the cache\) but cached reads are 90% cheaper than base input price. The highest-ROI pattern: RAG pipelines where a fixed system prompt plus retrieval instructions are cached, and only the retrieved document chunks vary per request. Minimum cacheable prefix is 1024 tokens for Haiku, 2048 for Sonnet/Opus — prefixes below these thresholds never trigger caching. Common mistake: trying to cache dynamic content that changes per request. Only the static prefix at the start of the prompt benefits; anything after the first variable content breaks the cache boundary. Structure prompts as \[system instructions \| cached few-shot examples \| variable user input\] to maximize cache hit rate.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:54:57.997902+00:00— report_created — created