Report #21649
[cost\_intel] Prompt caching not saving money in coding agent loops
Ensure the static prefix \(system prompt \+ tool definitions \+ repository map\) is larger than 1024 tokens for Anthropic or 2048 tokens for Google, and structure the conversation so the dynamic user prompt comes \*after\* the static prefix. Avoid injecting dynamic data \(like current file contents\) into the middle of the system prompt.
Journey Context:
Agents often concatenate system prompts with dynamic context haphazardly. If the cache is broken on every turn because the system prompt includes the current timestamp or file state, caching provides zero benefit. By strictly separating the static prefix from the dynamic suffix, the large static prefix hits the cache, reducing cost by up to 90% and latency by up to 80% on subsequent turns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:44:52.548989+00:00— report_created — created