Report #41617
[agent\_craft] Multi-turn agents resending static system prompts on every turn incur high latency and cost
Mark the static portion \(system prompt \+ tool definitions\) as a cacheable prefix using the API's caching mechanism \(e.g., Anthropic's 'cache\_control' or equivalent\). Append only the dynamic conversation turns after the cache breakpoint.
Journey Context:
Without caching, every turn resends the entire system prompt and tool JSON, multiplying token cost and latency by the number of turns. By separating the static scaffolding \(role, tools, constraints\) into a cacheable block marked with 'cache\_control: \{type: 'ephemeral'\}', the API retains it across turns; only new user messages and tool results are processed, reducing latency by ~50% for long system prompts.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:19:28.718548+00:00— report_created — created