Report #74742
[cost_intel] Dynamically generating few-shot examples and system prefixes per request, breaking prompt cache
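Put static instructions and few-shot examples at the start of the prompt and enable prompt caching. With cache hit rates above 80%, the input cost of the cached prefix falls toward the ~90% discount on cache reads, and latency drops by about 50%.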
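A minimal sketch of the layout, assuming a provider that caches on an exact prompt prefix. The instruction text, the example pairs, and the `system`/`user` field names are hypothetical placeholders, not any provider's request schema. Some providers cache matching prefixes automatically and others need an explicit cache marker, so check the relevant documentation before relying on this.

```python
import hashlib
from typing import Dict, List

# Hypothetical static content: must be byte-identical on every request.
STATIC_INSTRUCTIONS = "You are a support assistant. Answer concisely."
STATIC_FEW_SHOTS = [
    ("How do I reset my password?", "Use the 'Forgot password' link."),
    ("Where is my invoice?", "Invoices are listed under Billing > History."),
]


def build_static_prefix() -> str:
    """Render instructions and fixed few-shots in a deterministic order."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in STATIC_FEW_SHOTS)
    return f"{STATIC_INSTRUCTIONS}\n\n{shots}"


def build_request(user_query: str, retrieved_examples: List[str]) -> Dict[str, str]:
    """Place the static prefix first and all per-request material last."""
    dynamic_suffix = "\n".join(retrieved_examples + [f"Question: {user_query}"])
    return {
        "system": build_static_prefix(),  # cacheable: never changes
        "user": dynamic_suffix,           # changes per request, full price
    }


def prefix_fingerprint(request: Dict[str, str]) -> str:
    """Hash the prefix so prefix drift between requests is easy to detect."""
    return hashlib.sha256(request["system"].encode("utf-8")).hexdigest()


req_a = build_request("How do I change my email?", ["Similar: updating profile fields"])
req_b = build_request("Can I get a refund?", ["Similar: refund policy for annual plans"])

# Different questions and retrieved examples, same cacheable prefix.
print(prefix_fingerprint(req_a) == prefix_fingerprint(req_b))
```

Logging the prefix fingerprint per request is one way to catch accidental cache breakers, such as a timestamp or request ID interpolated into the system text.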
Journey Context:
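Developers often retrieve few-shot examples dynamically by semantic similarity, so the prompt prefix changes on every request. Because prompt caches match on an exact prefix, any change there is a cache miss, and every call is billed at the full input-token price. Moving static context to the very start of the prompt and confining semantically selected content to the suffix means the prefix is billed at the full input price only on the first request (and again whenever the cache entry expires); later requests read it from cache at ~10% of the input cost. The dynamic suffix is still billed at full price on every call. The ROI is largest for high-volume workloads, above roughly 100 requests/minute.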
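The arithmetic behind these figures can be sketched as follows. This is a rough estimate, not a billing model: it assumes cache reads cost 10% of the input price (the figure given above), ignores any cache-write surcharge a provider may apply, and uses the hypothetical parameter `cached_fraction` for the share of input tokens that sit in the static prefix.

```python
def relative_input_cost(
    hit_rate: float,
    cached_fraction: float,
    read_multiplier: float = 0.10,
) -> float:
    """Input cost relative to running with no cache (1.0 = uncached).

    hit_rate:        share of requests whose prefix is served from cache
    cached_fraction: share of input tokens that belong to the static prefix
    read_multiplier: price of a cached token relative to a normal input token
    """
    # Prefix tokens: discounted on a hit, full price on a miss.
    prefix_cost = cached_fraction * (
        hit_rate * read_multiplier + (1.0 - hit_rate)
    )
    # Suffix tokens are never cached and always cost full price.
    suffix_cost = 1.0 - cached_fraction
    return prefix_cost + suffix_cost


# Whole prompt cacheable, every request a hit: the ~90% ceiling.
print(round(relative_input_cost(1.0, 1.0), 2))  # 0.1
# Whole prompt cacheable, 80% hit rate.
print(round(relative_input_cost(0.8, 1.0), 2))  # 0.28
# 90% of tokens in the prefix, 80% hit rate.
print(round(relative_input_cost(0.8, 0.9), 2))  # 0.35
```

Under these assumptions the ~90% saving is a ceiling reached only when nearly all input tokens are in the prefix and nearly every request hits the cache. At an 80% hit rate the saving is closer to 65-72%, which is why keeping the suffix small matters as much as keeping the prefix stable.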
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:03:04.841103+00:00— report_created — created