Report #92269
[cost\_intel] OpenAI prompt caching silently misses due to 1024-token block misalignment
Prepend a static "cache seed" of at least 1024 tokens \(e.g., repeated documentation\) to the start of every prompt. Verify cache hits via the \`cached\_tokens\` usage field; if zero, check that the first 1024 tokens are byte-identical to a recent prior request.
Journey Context:
OpenAI's cache requires the prior 1024 tokens to match exactly. Dynamic content in the first 1024 tokens \(timestamps, UUIDs, even whitespace changes\) invalidates the cache silently. Teams often see $0.50/1M token costs jump to $5.00/1M with no visible error. Static prefixes are the only reliable fix; moving dynamic data after the 1k barrier preserves the discount.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:27:50.261947+00:00— report_created — created