Report #29798
[cost_intel] Ignoring prompt caching or structuring prompts so the cache never hits
Put static instructions and few-shot examples at the very beginning of the prompt, and keep dynamic user input at the end. This maximizes prompt cache hit rates, which can cut input-token costs by up to 90% and latency by up to 80%, depending on the provider and model.
Journey Context:
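Prompt caching requires a static prefix: the cache matches on the leading content of the prompt, so a request can only reuse what is identical from the first token onward. If you interpolate dynamic variables (like the user's query) at the top of the prompt, you break the cache for everything that follows. By restructuring the prompt to have a long static prefix (system prompt + examples) and a short dynamic suffix, you pay the full price only for the dynamic suffix, while the cached prefix is billed at the provider's discounted cached-token rate. This is often the single highest-ROI optimization for high-volume chat or agent loops.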
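The ordering rule can be illustrated with a minimal Python sketch. It calls no provider API. It only builds two prompt layouts and measures how much leading text two different requests share, as a rough stand-in for what a prefix cache could reuse. The names STATIC_INSTRUCTIONS, FEW_SHOT_EXAMPLES, build_cache_friendly, build_cache_hostile, and shared_prefix_len are hypothetical, and the prompt text is placeholder content. Real caches generally operate on tokens rather than characters and may require a minimum prefix length, so treat the character counts as illustrative only.

```python
from os.path import commonprefix

# Hypothetical static content. In a real system this would be the long
# system prompt and few-shot examples that never change between requests.
STATIC_INSTRUCTIONS = "You are a support assistant. Answer concisely.\n"
FEW_SHOT_EXAMPLES = (
    "Q: How do I reset my password?\nA: Use the 'Forgot password' link.\n"
    "Q: Where is my invoice?\nA: Under Billing > Invoices.\n"
)


def build_cache_friendly(user_query: str) -> str:
    """Static prefix first, dynamic user input last."""
    return STATIC_INSTRUCTIONS + FEW_SHOT_EXAMPLES + "Q: " + user_query + "\nA:"


def build_cache_hostile(user_query: str) -> str:
    """Dynamic input first: everything after it differs per request."""
    return "User asked: " + user_query + "\n" + STATIC_INSTRUCTIONS + FEW_SHOT_EXAMPLES


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the identical leading text of two prompts (in characters)."""
    return len(commonprefix([a, b]))


if __name__ == "__main__":
    q1, q2 = "Cancel my plan", "Update my email"
    friendly = shared_prefix_len(build_cache_friendly(q1), build_cache_friendly(q2))
    hostile = shared_prefix_len(build_cache_hostile(q1), build_cache_hostile(q2))
    print("reusable prefix, static-first layout:", friendly)
    print("reusable prefix, dynamic-first layout:", hostile)
```

In the static-first layout the shared prefix covers the whole instruction and example block. In the dynamic-first layout it ends at the first character where the two queries differ, so the long static block that follows cannot be reused.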
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:24:23.862006+00:00— report_created — created