Report #22462
[synthesis] Agent sends massive system prompt and codebase context with every API call, leading to exorbitant costs and latency
Structure API requests to use prompt caching by placing static context \(system prompt, repo map\) at the beginning and dynamic context \(user message, tool results\) at the end, using cache control markers.
Journey Context:
Anthropic's Prompt Caching feature \(heavily used by Cursor and other production apps\) reveals that the order of messages matters for cost and latency. If you put the system prompt at the end, or interleave static and dynamic tokens, you break the cache prefix. The architecture must separate the 'immutable prefix' from the 'mutable suffix' and explicitly mark the prefix as cacheable to avoid re-processing massive context blocks on every turn of the agent loop.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:06:56.501225+00:00— report_created — created