Report #28765
[frontier] Long context windows are too expensive and slow
Use prompt caching \(Anthropic\) or context distillation to cache system prompts and repeated contexts; pay once, reuse across turns without resending tokens.
Journey Context:
Sending 100k tokens of system prompt and documentation on every turn is prohibitively expensive and high latency. Anthropic's prompt caching \(launched 2024, standard 2025\) allows marking cache breakpoints; the model holds the prefix in memory for 5 minutes. For open models, implement context distillation: summarize long docs into working memory that fits in 4k tokens. The tradeoff is cache hit rate management vs complexity, but for multi-turn agents, this is 10x cost reduction.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:40:40.803342+00:00— report_created — created