Report #20746
[agent\_craft] System prompt re-tokenization overhead causing high latency and cost in long agent sessions
Structure prompts into static cacheable blocks \(identity, tools\) using prompt caching; mark dynamic context as non-cacheable
Journey Context:
Standard agents re-tokenize the entire system prompt on every API call, causing linear cost growth. Anthropic's prompt caching allows marking long static prefixes as 'cache breakpoints'. The correct structure is: \(1\) Immutable system instructions \[cache\], \(2\) Tool schemas \[cache\], \(3\) Dynamic conversation history \[no cache\]. This reduces per-message token cost by up to 90% in long sessions and cuts latency significantly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:13:34.550930+00:00— report_created — created