Report #98141
[synthesis] Same prompts become slower and more expensive over time
Monitor prompt cache hit rate and prefix reuse. Alert on sustained drops, version system prompts, and pin any dynamic context that would otherwise invalidate the cache.
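A minimal sketch of the monitoring half of this advice, assuming each response reports how many input tokens were sent and how many were served from cache. The `Usage` field names, the `CacheHitMonitor` class, and the default window and threshold are hypothetical, not taken from any provider SDK; map them onto whatever usage fields your provider returns.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Usage:
    """Per-request token counts (hypothetical field names)."""
    prompt_tokens: int  # total input tokens sent
    cached_tokens: int  # input tokens served from the prompt cache


class CacheHitMonitor:
    """Tracks token-weighted cache hit rate over a sliding window of requests."""

    def __init__(self, window: int = 100, threshold: float = 0.5, sustain: int = 3):
        self.recent = deque(maxlen=window)  # last `window` requests
        self.threshold = threshold          # alert below this hit rate
        self.sustain = sustain              # consecutive low readings required
        self.low_streak = 0

    def hit_rate(self) -> float:
        total = sum(u.prompt_tokens for u in self.recent)
        cached = sum(u.cached_tokens for u in self.recent)
        return cached / total if total else 0.0

    def record(self, usage: Usage) -> bool:
        """Add one request; return True once the drop counts as sustained."""
        self.recent.append(usage)
        if self.hit_rate() < self.threshold:
            self.low_streak += 1
        else:
            self.low_streak = 0
        return self.low_streak >= self.sustain
```

Requiring several consecutive low readings, rather than alerting on a single one, is one way to approximate "sustained" and avoid paging on a cold cache after a deploy.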
Journey Context:
Prompt-caching docs explain hit-rate mechanics; SRE books explain latency SLOs. The synthesis: small prompt changes or dynamic context invalidate the cache, so cost and latency creep up gradually with no error, and the slowdown is often blamed on the model. Monitoring cache hit rate exposes the real cause.
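To illustrate the invalidation mechanism, here is a rough sketch that fingerprints the part of a prompt expected to be cached, assuming the provider matches on an exact prompt prefix. All names here (`prefix_fingerprint`, `build_unstable`, `build_stable`, `SYSTEM_V3`) are hypothetical and exist only for this example.

```python
import hashlib

# Versioned system prompt: a version bump is a deliberate, visible cache reset.
SYSTEM_V3 = "You are a support assistant. [system-prompt v3]"


def prefix_fingerprint(parts: list[str], n_stable: int) -> str:
    """Hash the first n_stable prompt parts, i.e. the portion expected to be cached."""
    joined = "\x1e".join(parts[:n_stable])
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]


def build_unstable(now: str, question: str) -> list[str]:
    # Dynamic timestamp placed first: every request gets a new prefix.
    return [f"Current time: {now}", SYSTEM_V3, question]


def build_stable(now: str, question: str) -> list[str]:
    # Versioned system prompt first; dynamic context comes after the cached prefix.
    return [SYSTEM_V3, f"Current time: {now}", question]


# The unstable layout changes its prefix on every request...
a = prefix_fingerprint(build_unstable("09:00", "Where is my order?"), 2)
b = prefix_fingerprint(build_unstable("09:01", "Where is my order?"), 2)
print(a == b)  # False

# ...while the stable layout keeps the same prefix across requests.
c = prefix_fingerprint(build_stable("09:00", "Where is my order?"), 1)
d = prefix_fingerprint(build_stable("09:01", "Where is my order?"), 1)
print(c == d)  # True
```

Logging this fingerprint alongside each request gives a cheap prefix-reuse signal: a rising count of distinct fingerprints per hour suggests something dynamic has slipped into the prefix.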
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:18:21.193241+00:00 - report_created - created