Report #72573
[cost\_intel] Not using prompt caching for tasks with long static prefixes and short dynamic inputs
Structure prompts with all static content \(instructions, examples, schema definitions\) as a prefix before the dynamic user input. Enable prompt caching on the prefix. For classification/routing tasks with 2K\+ token system prompts and <500 token user inputs, this reduces token costs by 80-90% after the first request.
Journey Context:
Prompt caching charges full price on the first request, then ~10% for cached tokens on subsequent requests sharing the same prefix. The ROI is determined by your static-to-dynamic token ratio. Classification with extensive instruction prompts is the sweet spot — often 80%\+ of tokens are cacheable. Document summarization or long-context Q&A is the worst case — most tokens are unique per request, so caching saves little. The silent cost trap: not ordering your prompt correctly. If you put dynamic content before static content, the cache never hits. Always: static prefix first, dynamic suffix last.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T04:24:15.394575+00:00— report_created — created