Report #77917
[cost_intel] High input token costs from repeating large system prompts and few-shot examples per API call
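Put static content (system instructions, few-shot examples) at the start of the prompt and enable prompt caching. On a cache hit, the cached prefix can be billed at up to 90% less than regular input tokens, and latency can drop by up to 80%. Exact figures depend on the provider and model.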
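To make the savings concrete, here is a rough back-of-the-envelope sketch. The function name, the token counts, and the 0.10 price ratio (a 90% discount on cached tokens) are illustrative assumptions, not figures from any specific provider. It also ignores any surcharge a provider may apply when the cache is first written.

```python
def input_cost_units(static_tokens: int, dynamic_tokens: int,
                     cached: bool, cached_price_ratio: float = 0.10) -> float:
    """Input cost of one call, measured in 'uncached input token' units.

    cached_price_ratio is an assumed price of a cached token relative to
    a regular input token (0.10 corresponds to a 90% discount).
    """
    # Only the static prefix can be served from cache; dynamic input is
    # always billed at the full rate.
    static_cost = static_tokens * cached_price_ratio if cached else static_tokens
    return static_cost + dynamic_tokens


# Hypothetical prompt: 4000 static tokens (system prompt + few-shots)
# followed by 200 tokens of user input.
miss_cost = input_cost_units(4000, 200, cached=False)
hit_cost = input_cost_units(4000, 200, cached=True)
overall_savings = 1 - hit_cost / miss_cost
```

Under these assumptions the overall input saving on a hit is a little under 90%, because the dynamic part of the prompt is never cached. The larger the static prefix is relative to the dynamic input, the closer the total gets to the headline discount.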
Journey Context:
A common mistake is interleaving static and dynamic content, which breaks the cache prefix match. Caching works on an exact prefix, so the prompt must be ordered strictly as: [System Prompt] -> [Few-Shot Examples] -> [Dynamic User Input]. If dynamic user input comes before the few-shot examples, the prefix diverges at that point on every call, the few-shot examples are never read from cache, and most of the ROI is lost.
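The effect of ordering can be illustrated without calling any API. This is a minimal sketch in which all names (SYSTEM_PROMPT, FEW_SHOTS, build_prompt, and so on) are hypothetical. It compares characters as a stand-in for tokens, and it only measures how much of the prompt two consecutive calls share as a prefix, which is the part a prefix cache could reuse.

```python
SYSTEM_PROMPT = "You are a support assistant. Answer briefly."
FEW_SHOTS = [
    "Q: How do I reset my password?\nA: Use the 'Forgot password' link.",
    "Q: How do I change my email?\nA: Go to Settings > Account.",
]


def build_prompt(user_input: str) -> str:
    """Cache-friendly order: static content first, dynamic input last."""
    return "\n\n".join([SYSTEM_PROMPT, *FEW_SHOTS, user_input])


def build_prompt_interleaved(user_input: str) -> str:
    """Anti-pattern: dynamic input placed before the few-shot examples."""
    return "\n\n".join([SYSTEM_PROMPT, user_input, *FEW_SHOTS])


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


first, second = "Where is my invoice?", "Cancel my subscription."

# Reusable prefix when static content comes first: system prompt + few-shots.
good_shared = shared_prefix_len(build_prompt(first), build_prompt(second))

# Reusable prefix when user input is interleaved: the system prompt only.
bad_shared = shared_prefix_len(
    build_prompt_interleaved(first), build_prompt_interleaved(second)
)
```

With the static-first layout, the shared prefix covers the system prompt and every few-shot example. With the interleaved layout it stops right after the system prompt, so the few-shot examples are re-billed at the full rate on every call.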
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:22:47.436657+00:00— report_created — created