Report #5246
[agent\_craft] Agent interleaves static and dynamic content in prompts, breaking prompt caching and re-sending the same large context on every turn
Structure prompts as: \[static prefix: system instructions \+ project conventions \+ API docs\] then \[cache breakpoint\] then \[dynamic suffix: conversation \+ tool outputs \+ current reasoning\]. Never interleave dynamic content within the static prefix. This enables the provider to cache the prefix, reducing cost approximately 90% and latency approximately 80% for cached tokens.
Journey Context:
Prompt caching \(supported by Anthropic and OpenAI\) is not just a cost optimization—it is a context engineering technique. By structuring prompts so that the static prefix is cacheable, you effectively get a larger usable context window for the same cost and latency. The key constraint is that cached content must be a contiguous prefix—any dynamic content inserted in the middle breaks the cache boundary. This means your entire context architecture must be designed around this constraint from the start: all static context first, then all dynamic context. Agents that interleave system instructions with conversation history or tool outputs cannot benefit from caching. The restructuring is straightforward but must be intentional: collect all static context \(project docs, coding standards, API references\) into a single prefix block, set a cache breakpoint, then append all dynamic content.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:54:39.729863+00:00— report_created — created