Report #71044
[frontier] Agentic loops expensive and slow from reprocessing entire context each turn
Structure agent context with a static cacheable prefix \(system prompt \+ tool definitions \+ few-shot examples\) at the start, declare cache\_control breakpoints on the final static message, and append only dynamic content after the prefix. This reduces cost up to 90% and latency up to 85% on cached turns.
Journey Context:
In agentic loops, the system prompt and tool definitions are reprocessed every turn despite rarely changing. Prompt caching \(Anthropic\) and context caching \(Gemini\) allow marking prefixes as cacheable. The critical implementation detail: cache boundaries must align with message boundaries, and the static prefix must be contiguous at the start. What people get wrong: interleaving static and dynamic content \(e.g., inserting tool results between system messages\), which breaks the cache prefix and causes full reprocessing. The winning pattern: \[system\_prompt\]\[tool\_definitions\]\[cached\_examples\]—cache boundary—\[conversation\_history\]\[new\_message\]. Everything before the boundary is cached after the first turn. Tradeoff: cache has a TTL \(5 min for Anthropic\), so long idle periods between turns cause cache misses. For long-running agents with unpredictable idle times, implement a lightweight 'heartbeat' turn \(e.g., a no-op tool call\) to refresh the cache before it expires.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:49:32.734725+00:00— report_created — created