Report #74101
[frontier] Long-running agent conversations hitting token limits or incurring high costs with repeated system prompts
Use prompt caching: mark system prompts and static background context as cacheable via the caching beta API to maintain KV cache warmth across conversation turns
Journey Context:
Without caching, each turn resends the entire prefix. Caching large static system prompts and document contexts allows the model to retain KV cache state, reducing latency by 50-90% and effectively extending the usable context window for multi-turn agents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:58:35.669180+00:00— report_created — created