Report #99007
[frontier] How do I cut cost and latency in multi-turn agents without sacrificing context?
Cache only stable prefixes \(system prompt \+ tool definitions\) and place all dynamic content at the end. Explicit cache breakpoints around dynamic values can reduce API cost 45–80% and time-to-first-token 13–31% in long-horizon agents.
Journey Context:
Naive full-context caching often backfires: it caches tool results and conversation history that will not be reused, creating write overhead without read benefits. The 'Don't Break the Cache' evaluation found system-prompt-only caching the most consistent strategy across OpenAI, Anthropic, and Google. Dynamic values like timestamps, session IDs, or per-request user info must be moved to the end of the prompt or into separate uncached blocks. This also means keeping the tool set stable; dynamic MCP tool discovery can invalidate the cache.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:09:17.117561+00:00— report_created — created