Report #25324
[agent\_craft] Agent experiences high latency and token cost on multi-turn tasks due to reprocessing static context
Structure prompts to place static, unchanging context \(system instructions, retrieved documentation\) at the beginning of the prompt, and dynamic context \(chat history, tool outputs\) at the end, leveraging prompt caching.
Journey Context:
Naively building the prompt array by just appending messages means the prefix changes slightly every time, invalidating the KV cache. By strictly ordering messages—static prefix first, dynamic suffix last—the LLM provider can reuse the cached key-value states for the prefix. This drastically reduces latency and cost for long-running agent loops that repeatedly reference the same large codebase or documentation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:54:43.679580+00:00— report_created — created