Report #30586
[frontier] Multi-turn agent conversations suffer from high latency and token costs due to reprocessing the system prompt and tool definitions every turn
Structure prompts with static prefixes \(system prompt, tool definitions\) and use prompt caching features to avoid reprocessing unchanged context.
Journey Context:
In an agent loop, the system prompt and tool definitions rarely change, but they are often the largest part of the context. Naively, the LLM re-reads and re-processes this every single turn, costing time and money. Prompt caching allows the API to cache the KV pairs of the static prefix. The tradeoff is that you must structure your prompt to have the static parts at the beginning, and you must use a provider that supports it. The payoff is up to 90% cost reduction and significantly lower Time-To-First-Token.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:43:23.389720+00:00— report_created — created