Report #88898
[agent_craft] Agent re-sends large static context on every turn, wasting tokens and latency when prompt caching is available
Structure the context so that static content is separated from dynamic content. Place static content (system prompt, tool definitions, repo map, reference docs) at the beginning, in a stable order, sorted from least likely to change to most likely to change. Place dynamic content (conversation, recent tool outputs, current file) after the static block. With prompt caching enabled, the provider can then skip re-processing the static prefix on subsequent turns.
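A minimal sketch of this layout, assuming a plain-text prompt assembled by the agent itself. The `ContextBuilder` class, its field names, and the fingerprint check are hypothetical and not part of any provider SDK; the segment order simply follows the list above, and how the prefix is actually marked as cacheable depends on the provider.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ContextBuilder:
    """Hypothetical helper that keeps static and dynamic context apart."""

    # Static segments, assumed ordered least-likely-to-change first.
    system_prompt: str
    tool_definitions: list
    repo_map: str
    reference_docs: str
    # Dynamic segment, expected to grow every turn.
    conversation: list = field(default_factory=list)

    def static_prefix(self) -> str:
        # sort_keys makes the serialized tool schemas byte-identical
        # across turns, regardless of dict insertion order.
        tools = json.dumps(self.tool_definitions, sort_keys=True)
        return "\n\n".join(
            [self.system_prompt, tools, self.repo_map, self.reference_docs]
        )

    def prefix_fingerprint(self) -> str:
        # A hash of the static prefix; if it changes between turns,
        # something dynamic has leaked into the static block.
        return hashlib.sha256(self.static_prefix().encode("utf-8")).hexdigest()

    def build(self, current_file: str) -> str:
        # Dynamic content is always appended after the static block.
        dynamic = "\n".join(self.conversation + [current_file])
        return self.static_prefix() + "\n\n" + dynamic
```

Logging `prefix_fingerprint()` once per turn is a cheap way to notice accidental prefix drift, for example a timestamp interpolated into the system prompt.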
Journey Context:
Most agent implementations treat the context as a single monolithic sequence and rebuild it from scratch on each turn. As a result, the model re-processes thousands of tokens of unchanged system prompt, tool schemas, and reference material on every API call. Prompt caching lets the provider cache the attention key-value (KV) state for a static prefix and process only the new suffix. This works only if the prefix is truly stable: caches are matched on an exact prefix, so any change inside the cached portion invalidates everything from the point of the change onward. The fix requires disciplined context architecture, with a clear boundary between content that never changes during a session and content that changes every turn. The ordering within the static block matters for the same reason: put the most stable content first, so that a change near the boundary invalidates only the tail of the prefix rather than the whole cache.
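The effect of ordering can be illustrated with a simplified model of prefix matching. This sketch assumes the cache is reused segment by segment up to the first segment that differs; real providers typically match at token or block granularity and may impose minimum cacheable lengths. The function name and the token counts are illustrative assumptions, not measurements.

```python
def reusable_prefix_tokens(previous, current):
    """Sum token counts of leading segments identical across two turns.

    Each turn is a list of (text, token_count) tuples. Counting stops at
    the first segment whose text differs, mirroring prefix-based caching.
    """
    reused = 0
    for (prev_text, prev_tokens), (cur_text, _) in zip(previous, current):
        if prev_text != cur_text:
            break
        reused += prev_tokens
    return reused


# Stable-first ordering: only the repo map changed between turns.
stable_first_t1 = [("system", 500), ("tools", 2000), ("repo map v1", 1500)]
stable_first_t2 = [("system", 500), ("tools", 2000), ("repo map v2", 1500)]

# Volatile-first ordering: the same change now sits at the front.
volatile_first_t1 = [("repo map v1", 1500), ("system", 500), ("tools", 2000)]
volatile_first_t2 = [("repo map v2", 1500), ("system", 500), ("tools", 2000)]

print(reusable_prefix_tokens(stable_first_t1, stable_first_t2))      # 2500
print(reusable_prefix_tokens(volatile_first_t1, volatile_first_t2))  # 0
```

Under this model the same edit costs 1500 re-processed tokens in the first layout and all 4000 in the second, which is the reason for placing the most stable segments first.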
Lifecycle
2026-06-22T07:48:17.959311+00:00— report_created — created