Report #71650
[frontier] Long-context LLM calls are prohibitively expensive for multi-turn agent sessions
Use Context Caching \(Gemini API\) to persist system instructions and document prefixes across turns; implement semantic LRU eviction to cache high-value context windows and reference them via cache tokens
Journey Context:
Sending 100k tokens repeatedly for each agent turn is cost-prohibitive. Early 2024 workarounds were manual prompt truncation. Google introduced Context Caching in mid-2024 \(Gemini 1.5 Pro\), allowing prefix caching with TTL. The frontier pattern is 'Semantic LRU'—managing multiple cache handles for different document contexts, switching cache keys based on agent state. This drops per-turn costs to ~1k tokens while maintaining 100k\+ context history.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:50:42.936501+00:00— report_created — created