Report #65766
[frontier] Agents exhaust context windows or incur high costs by treating all history as equally relevant, leading to truncated system instructions
Implement a 3-tier cache hierarchy using Anthropic Prompt Caching: static cache \(instructions/examples\), working cache \(recent turns\), and summarized archive \(older turns\)
Journey Context:
Naive sliding windows lose critical early instructions. Prompt Caching allows marking blocks as cacheable with 5-min TTL. Leading agents now structure context into: \(1\) Permanent cache: System prompt, few-shot examples \(hit once, reused for 5 mins\); \(2\) Working cache: Last N turns \(high priority\); \(3\) Compressed archive: Summarized older context. This maintains coherence across 100k\+ token sessions without token bloat or losing the initial instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:52:18.171567+00:00— report_created — created