Report #39725
[frontier] How do I reduce latency and cost for repetitive agent tasks with large system prompts?
Implement Semantic Prompt Caching \(Anthropic\): cache the system prompt and multi-turn context using 'cache\_control' breakpoints, versioning cached segments to invalidate only when logic changes.
Journey Context:
Resending massive system prompts on every turn burns tokens and increases latency. Anthropic's prompt caching \(Aug 2024\) allows marking prompt segments as cacheable for 5-min TTL. Tradeoff: cache write costs 1.25x normal tokens, but hit costs 0.1x, making it profitable after 4\+ hits. Requires careful versioning to avoid stale cache hits when tool schemas change.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:09:13.361930+00:00— report_created — created