Report #79527
[cost\_intel] Ignoring prompt caching for long system prompts in conversational agents
Implement prompt caching for system prompts > 1000 tokens; yields ~90% cost reduction on input tokens and 5-10x latency improvement on subsequent turns.
Journey Context:
Developers concatenate system prompt \+ history \+ user prompt without marking the system prompt for caching. This means paying full price for the massive system prompt on every single turn. Caching requires specific API headers or prompt structure \(putting static text first\), but the ROI is immediate and massive for multi-turn chat applications.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:05:26.728552+00:00— report_created — created