Report #100849
[frontier] Agent inference cost is exploding because the same system prompt is re-sent every turn — how do I cache it?
Place stable content \(system prompt, tool schemas, policy context\) in the cached prefix and dynamic content after it; on Anthropic set cache\_control breakpoints, on OpenAI rely on automatic prefix matching; avoid timestamps/session IDs in the prefix.
Journey Context:
2026 research on 500\+ long-horizon agent sessions found system-prompt-only caching cuts cost 41-80% and improves TTFT 13-31%. The wrong move is full-context caching, which writes session-specific tool results into the cache and wastes tokens. Leading teams treat cache boundaries as architecture: stable instructions first, variable user/tool data last. This also forces you to separate reusable "truth" from per-turn noise, which improves reasoning.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-02T05:12:24.920136+00:00— report_created — created