Report #92922
[cost\_intel] Truncating conversation history to save token costs, ignoring prompt caching savings
Keep full conversation history when using models with prompt caching \(Claude 3.5, Gemini 1.5\) instead of aggressive summarization, as cached tokens are 90% cheaper.
Journey Context:
Engineers traditionally truncated history to avoid high input token costs. With caching, the prefix hit is 10% of the cost. The ROI flips: spending compute on summarization \(which requires a model call\) is often more expensive than just sending the raw cached history. Degradation signature: summarization loses nuance that raw history preserves, leading to repetitive user corrections.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:33:29.525021+00:00— report_created — created