Report #71262
[frontier] Agent loop too slow and expensive on repeated iterations
Structure agent prompts as a byte-stable static prefix (system prompt, tool definitions, few-shot examples) followed by a dynamic suffix (conversation history, current task). Design around prompt caching so the static prefix hits the KV cache on every turn, which can reduce latency by up to 80% and cost by up to 90%.
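A minimal, provider-agnostic sketch of this layout, assuming the prompt can be modeled as one serialized string. The names `StaticPrefix` and `AgentPrompt` are hypothetical and do not come from any vendor SDK; a real request would map these parts onto that provider's own fields.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StaticPrefix:
    """Content that must stay identical on every turn (hypothetical helper)."""
    system_prompt: str
    tool_definitions: tuple = ()   # tuples, so nothing can be appended later
    few_shot_examples: tuple = ()

    def render(self) -> str:
        # Fixed separators and insertion-ordered keys keep the output
        # deterministic for the same inputs.
        return json.dumps(
            {
                "system": self.system_prompt,
                "tools": self.tool_definitions,
                "examples": self.few_shot_examples,
            },
            separators=(",", ":"),
        )


@dataclass
class AgentPrompt:
    """Frozen prefix plus an append-only conversation suffix."""
    prefix: StaticPrefix
    history: list = field(default_factory=list)

    def append(self, role: str, content: str) -> None:
        # All per-turn content (user messages, tool results, status) goes here,
        # never into the prefix.
        self.history.append({"role": role, "content": content})

    def render(self) -> str:
        return self.prefix.render() + "\n" + json.dumps(self.history)


prefix = StaticPrefix(
    system_prompt="You are a build agent.",
    tool_definitions=({"name": "run_tests"},),
)
prompt = AgentPrompt(prefix)
prompt.append("user", "Fix the failing test.")
turn_1 = prompt.render()
prompt.append("tool", "2 tests failed")
turn_2 = prompt.render()
```

Because the history is append-only, `turn_2` begins with everything in `turn_1` except its closing bracket, so each turn extends the previous one instead of rewriting it.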
Journey Context:
Prompt caching (KV cache reuse for shared prefixes) is available from Anthropic, OpenAI, and Google, but most agent implementations are not architected for it. The key constraint: a cache hit requires an identical prefix. If the system prompt or tool definitions change between turns, even by one token, everything from the changed token onward misses the cache; because these parts sit at the very start of the prompt, that is effectively the entire cache. Common mistake: appending tool results or status messages to the system prompt area, which shifts the prefix and invalidates the cache. Instead, keep the system+tools prefix frozen and append only to the conversation suffix. Some practitioners pre-warm the cache by making an initial API call with just the static prefix before the first user message arrives. The tradeoff: the system prompt and tool definitions can no longer be modified dynamically mid-conversation. But the cost and latency savings are large enough (often 10x on multi-turn agent loops) that rearchitecting around them is worth it. This is becoming a mandatory architectural consideration for any production agent system that runs more than 3 turns.
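One way to catch accidental prefix changes before they reach the API is to fingerprint the static part once and compare it on every turn. The sketch below is an illustration under assumptions: `prefix_fingerprint` and `PrefixGuard` are hypothetical names, and hashing a JSON serialization is only a proxy for what the provider actually tokenizes and caches.

```python
import hashlib
import json


def prefix_fingerprint(system_prompt: str, tools: list) -> str:
    """Hash the static prefix exactly as given (hypothetical helper)."""
    # Keys are deliberately NOT sorted: a reordered tool definition may
    # serialize differently on the wire, so it should count as a change.
    canonical = json.dumps(
        {"system": system_prompt, "tools": tools},
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PrefixGuard:
    """Remembers the first-turn prefix and rejects any later drift."""

    def __init__(self, system_prompt: str, tools: list) -> None:
        self._expected = prefix_fingerprint(system_prompt, tools)

    def check(self, system_prompt: str, tools: list) -> None:
        if prefix_fingerprint(system_prompt, tools) != self._expected:
            raise ValueError(
                "static prefix changed between turns; prompt cache would miss"
            )


SYSTEM = "You are a build agent."
TOOLS = [{"name": "run_tests", "description": "Run the test suite."}]

guard = PrefixGuard(SYSTEM, TOOLS)
guard.check(SYSTEM, TOOLS)  # unchanged prefix: passes silently

try:
    # The common mistake: a status message leaks into the system prompt.
    guard.check(SYSTEM + " Status: 2 tests failed.", TOOLS)
except ValueError as err:
    drift_message = str(err)
```

Calling `check` right before each request turns a silent cache miss into a loud failure during development; in production a log line or metric may be a better reaction than raising.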
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:11:35.340983+00:00— report_created — created