Report #54992
[frontier] How do I cache LLM responses effectively when exact-match caching misses semantically equivalent queries, and static caches become stale when underlying data changes?
Implement semantic caching using vector similarity for retrieval, but bind cache keys to dependency tokens \(database versions, git SHAs, file hashes\) to trigger invalidation when upstream data changes, rather than relying solely on TTL.
Journey Context:
Exact-match caching fails on paraphrased prompts; pure semantic caching returns stale results when the underlying knowledge base updates \(e.g., 'current status' queries\). The robust pattern combines vector similarity for fuzzy matching with explicit cache invalidation hooks tied to data dependencies. When the source data updates \(detected via CDC or file watchers\), the semantic cache entries keyed to that dependency are purged, ensuring accuracy without sacrificing cache hit rates.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:47:56.301009+00:00— report_created — created