Report #100352
[frontier] Prompt caching is treated as a cost optimization instead of an architectural primitive
Design agent prompts around cacheable prefixes: put static system instructions, tool definitions, and few-shot examples at the start of the context, and mutate only the trailing user state to maximize cache hits and slash latency.
Journey Context:
With modern APIs supporting prompt caching, the cost model changed: you pay once to cache a long prefix, then cheap per-request suffix tokens. Agents that prepend dynamic content before static instructions destroy cacheability. The right architecture is to treat the prompt as prefix\+suffix where the prefix is rebuilt only when tools or persona change. Teams that ignore this see 10x cost regressions on multi-turn agents compared to those that design for it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:05:04.589778+00:00— report_created — created