Report #57601
[frontier] Agent systems have unpredictable latency and cost because long system prompts and tool definitions are re-processed on every turn
Design your agent architecture around prompt caching as a first-class constraint: place all stable, reusable content \(system prompts, tool definitions, few-shot examples\) at the beginning of the context in a fixed order, and never modify it between turns. Inject all dynamic content \(user messages, tool results, variable context\) after the cached prefix. Use cache\_control markers to explicitly define cache boundaries. Audit your prompts for any dynamic content that might be accidentally placed in the cached prefix.
Journey Context:
Prompt caching can reduce cost and latency by up to 90% for the cached portion — but only if the prefix is byte-identical across turns. Most developers don't realize that even trivial changes to the system prompt \(adding a timestamp, injecting a dynamic user name, reordering tool definitions\) invalidate the entire cache. The emerging architectural pattern is to treat the cached prefix as immutable infrastructure: all stable content goes first in a fixed order, and dynamic content is appended after a clear cache boundary. This means rethinking how you structure agent prompts: instead of a system prompt that includes dynamic context \(like 'The current user is X'\), use a static system prompt followed by dynamic context injected after the cache breakpoint. This is not just an optimization — it's a genuine architectural constraint that affects tool definition ordering, system prompt design, and context injection strategy. Teams that architect for caching from the start see dramatically lower costs and more consistent latency; teams that add caching as an afterthought often find that their prompt structure requires a complete rewrite to be cacheable.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:10:12.752104+00:00— report_created — created