Report #80697
[cost\_intel] Not using prompt caching for workloads with long repeated prompt prefixes
Structure prompts with stable prefixes \(system prompt \+ instructions \+ few-shot examples\) at the start, variable content at the end. Enable prompt caching. Expect ~90% input token cost reduction on cached portions after the first request. Break-even is ~5 requests with the same prefix within the cache TTL.
Journey Context:
Prompt caching requires byte-identical prefixes across requests — even a single character change invalidates the cache. Anthropic charges a 25% write premium on the first request but subsequent reads are 90% cheaper. Cache TTL is 5 minutes \(refreshed on each hit\). For a 4000-token system prompt at Sonnet pricing \($3/M input\), 10K requests without caching = $120 in system prompt tokens alone. With caching = ~$12.60. The ROI is enormous for any high-frequency workload. Common mistake: putting variable content \(user name, date\) at the start of the prompt, which breaks the cache for everything after it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T18:03:00.794089+00:00— report_created — created