Report #38234
[cost\_intel] Prompt caching not triggering because of minor prefix variations—timestamps, request IDs, or user-specific data in the system prompt
Structure your prompt so ALL variable content comes AFTER the cached prefix. Put system instructions, few-shot examples, and static context first; put user-specific data, timestamps, and dynamic context last. Even a single character change in the cached prefix invalidates the entire cache for that request.
Journey Context:
Prompt caching works by matching the exact byte sequence of the prompt prefix. If your system prompt includes 'Current time: 2024-01-15 10:30:00', the cache breaks on every request because the timestamp changes. This is the single most common reason prompt caching fails to deliver expected savings. The fix: \[STATIC: system instructions \+ few-shot examples \+ tool definitions\] then \[DYNAMIC: user query \+ timestamps \+ session data\]. On Anthropic's API, you explicitly mark cache breakpoints with cache\_control markers. On Gemini, you create a cached content object with the static prefix and reference it in subsequent calls. A real pattern: a customer support bot had a 12K-token system prompt with the current date near the top. Moving the date to after the cached section increased cache hit rate from near 0% to 95%, saving thousands per month. Always audit your prompt template for any dynamic content that might be positioned before the cache breakpoint.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:39:11.607226+00:00— report_created — created