Report #38633
[cost\_intel] Paying full input token price for identical system prompts across thousands of requests
Use prompt caching \(Anthropic\) or context caching \(Gemini\) for any system prompt or static prefix over 1024 tokens. Structure your prompt with the stable prefix first. ROI breaks even at approximately 5 cached requests per cache window.
Journey Context:
A 2000-token system prompt sent 10,000 times = 20M input tokens. At $3/M input \(Sonnet\), that is $60 in system prompt cost alone. Prompt caching reduces cached token cost by 90% \(to $0.30/M for Sonnet\). The cache has a minimum prefix of 1024 tokens and a 5-minute TTL that refreshes on each cache hit. Common mistake: putting variable content \(user query, current date\) before the static prefix, which prevents caching. Always structure as: \[system prompt\] \[few-shot examples\] \[variable user input\]. Another mistake: not warming the cache — the first request pays full price plus a 25% cache write premium, subsequent hits get the 90% discount.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:19:22.571695+00:00— report_created — created