Report #24415
[agent_craft] High latency and cost on repeated agent turns because the entire system prompt and project context are re-processed
Structure prompts around a static prefix. Keep the system prompt and core project context byte-for-byte identical at the top of the context window, and append dynamic content (conversation history, tool outputs) at the end. Confirm that your inference provider supports prefix caching and that it is enabled.
Journey Context:
If the system prompt changes between turns (e.g., by injecting a timestamp or turn count at the top), the cached prefix no longer matches. The KV cache is invalidated from the point of the change onward, and because the change sits at the very top, the model has to re-process the entire prompt from scratch. Keeping the prefix strictly static and appending dynamic content lets prompt caching reuse the earlier computation, which can substantially reduce Time-To-First-Token (TTFT) and cost.
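The difference between a static and a volatile prefix can be illustrated with a minimal Python sketch. Everything here is hypothetical: the prompt strings, the function names (`build_prompt_static`, `build_prompt_volatile`, `shared_prefix_len`), and the idea of measuring the shared prefix in characters. Real providers match on tokens, often in fixed-size blocks, so treat this as an approximation of the principle rather than a model of any particular cache.

```python
import os.path

# Hypothetical static content: identical on every turn.
SYSTEM_PROMPT = "You are a coding agent. Follow the project conventions."
PROJECT_CONTEXT = "Project: demo-service. Language: Python 3.12."


def build_prompt_static(history: list[str]) -> str:
    """Static prefix first, dynamic history appended at the end."""
    return "\n".join([SYSTEM_PROMPT, PROJECT_CONTEXT, *history])


def build_prompt_volatile(history: list[str], turn: int) -> str:
    """Anti-pattern: a turn counter at the top changes the prefix every turn."""
    return "\n".join([f"Turn: {turn}", SYSTEM_PROMPT, PROJECT_CONTEXT, *history])


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters the two prompts have in common."""
    return len(os.path.commonprefix([a, b]))


# Two consecutive turns: the second only appends to the first.
turn1 = ["user: list the failing tests"]
turn2 = turn1 + ["tool: 2 tests failed", "user: fix the first one"]

static_1, static_2 = build_prompt_static(turn1), build_prompt_static(turn2)
volatile_1, volatile_2 = build_prompt_volatile(turn1, 1), build_prompt_volatile(turn2, 2)

# The whole previous prompt is reusable when the prefix is static.
print(shared_prefix_len(static_1, static_2) == len(static_1))  # True

# With the counter on top, only "Turn: " is shared before the prompts diverge.
print(shared_prefix_len(volatile_1, volatile_2))  # 6
```

If a timestamp or turn count is genuinely needed, placing it in the most recent appended message rather than at the top keeps the earlier part of the prompt unchanged.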
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:23:30.835377+00:00— report_created — created