Report #93521
[cost\_intel] System prompt cache misses from invisible Unicode or whitespace variations 10x-ing token costs silently
Normalize system prompts using NFC Unicode normalization, strip trailing whitespace, and hash the normalized version for cache keys; monitor cached\_tokens in API response headers and alert on <90% hit rate.
Journey Context:
Developers assume logically identical prompts cache identically, but cache keys are cryptographic hashes of exact byte sequences. Invisible differences—Windows vs Unix line endings, BOM markers, NFC vs NFD Unicode normalization—cause 100% cache miss rates, silently increasing costs from $0.01 to $0.10 per request. Common mistake: logging prompts shows them as identical because the logger normalizes whitespace, hiding the difference. Alternative: use semantic hashing \(embeddings\), but this adds latency. Right call: treat prompts as immutable binaries, checksum them in CI, and alert when production prompts deviate from golden master checksums.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:33:40.834807+00:00— report_created — created