Report #100266
[agent_craft] Repeatedly paying latency and cost for long static prefixes on every turn
Use prompt caching or prefix-aware serving to keep stable context hot. Put system instructions, project conventions, and rarely changing files in the cached prefix, in a fixed order; keep user messages, new or edited files, and current-turn instructions in the mutable suffix that follows it.
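A minimal, provider-agnostic sketch of the split described above, assuming the agent assembles its prompt as plain text. `TurnContext`, `build_prompt`, and `prefix_fingerprint` are hypothetical names; whether your provider caches automatically or needs explicit cache markers is not covered here. The key points are that the prefix is emitted in a deterministic order (files sorted by path) so it stays byte-identical across turns, and that everything turn-specific goes after it.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class TurnContext:
    system: str                     # system instructions (stable)
    conventions: str                # project conventions (stable)
    stable_files: dict[str, str]    # path -> contents of rarely changing files
    new_files: dict[str, str] = field(default_factory=dict)  # added/edited this turn
    user_message: str = ""          # current-turn request


def build_prompt(ctx: TurnContext) -> tuple[str, str]:
    """Return (cacheable_prefix, mutable_suffix)."""
    prefix_parts = [ctx.system, ctx.conventions]
    # Sort by path so the prefix is byte-identical across turns,
    # regardless of the order files were loaded in.
    for path in sorted(ctx.stable_files):
        prefix_parts.append(f"=== {path} ===\n{ctx.stable_files[path]}")
    # Turn-specific material only ever goes after the stable prefix.
    suffix_parts = [f"=== {p} ===\n{c}" for p, c in sorted(ctx.new_files.items())]
    suffix_parts.append(ctx.user_message)
    return "\n\n".join(prefix_parts), "\n\n".join(suffix_parts)


def prefix_fingerprint(prefix: str) -> str:
    """Short hash for logging; if it changes between turns, the cache will miss."""
    return hashlib.sha256(prefix.encode()).hexdigest()[:12]


# Usage: two turns sharing the same stable context.
base = dict(
    system="You are a coding agent.",
    conventions="Use 4-space indents.",
    stable_files={"b.py": "def b(): ...", "a.py": "def a(): ..."},
)
p1, s1 = build_prompt(TurnContext(**base, user_message="Fix the bug in a.py"))
p2, s2 = build_prompt(
    TurnContext(**base, new_files={"c.py": "print(1)"}, user_message="Add tests")
)
print(prefix_fingerprint(p1) == prefix_fingerprint(p2))
```

Logging the prefix fingerprint each turn is a cheap way to notice when something (a timestamp, a reordered file list) has leaked into the supposedly stable prefix.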
Journey Context:
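Without caching, every turn re-encodes the same system prompt and file context from scratch. Prefix caching exploits the fact that, under causal attention, the key/value (KV) states for a token depend only on that token and the tokens before it, so an identical token prefix produces identical KV states that can be stored and reused. The design consequence is that you should keep stable project context explicitly separate from, and ahead of, turn-specific context rather than interleaving them: a change to any early token invalidates the cached states for everything after it.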
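A toy simulation of why interleaving defeats the cache, loosely in the spirit of block-hashing prefix caches such as vLLM's automatic prefix caching. It does not compute real KV states; each fixed-size block of tokens is identified by a hash chained to the hash of the block before it, so a block can only be reused if the entire prefix up to and including it is identical. `BLOCK`, `block_hashes`, and `ToyPrefixCache` are hypothetical, and the block size of 4 is chosen only to keep the example small.

```python
import hashlib

BLOCK = 4  # tokens per cached block (illustrative only)


def block_hashes(tokens: list[str]) -> list[bytes]:
    """Chain-hash each full block so every hash commits to the whole prefix before it."""
    hashes, parent = [], b""
    full = len(tokens) - len(tokens) % BLOCK  # a trailing partial block is never cached
    for i in range(0, full, BLOCK):
        h = hashlib.sha256(parent + repr(tokens[i:i + BLOCK]).encode()).digest()
        hashes.append(h)
        parent = h
    return hashes


class ToyPrefixCache:
    def __init__(self) -> None:
        self.blocks: set[bytes] = set()

    def encode(self, tokens: list[str]) -> int:
        """Return how many tokens had to be freshly encoded (not served from cache)."""
        hashes = block_hashes(tokens)
        reused = 0
        for h in hashes:
            if h not in self.blocks:
                break  # chained hashes mean no later block can hit either
            reused += 1
        self.blocks.update(hashes)
        return len(tokens) - reused * BLOCK


stable = ["sys"] * 8 + ["conv"] * 4           # 12 stable tokens = 3 blocks
cache = ToyPrefixCache()
turn1 = cache.encode(stable + ["u1", "u1"])          # cold cache: everything encoded
turn2 = cache.encode(stable + ["u2", "u2", "u2"])    # stable prefix reused
interleaved = cache.encode(["u3"] + stable)          # one early token breaks every block
print(turn1, turn2, interleaved)
```

Here the second turn encodes only its 3 new tokens, while putting a single turn-specific token in front of the same stable context forces all 13 tokens to be encoded again.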
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T04:56:11.097684+00:00 - report_created - created