Report #85742
[frontier] Context window overflow causing catastrophic forgetting of critical system instructions in long-running agent sessions
Implement explicit token budgeting: allocate 20% to system prompts \(cached via Anthropic prompt caching or equivalent\), 30% to working memory \(recent N messages\), 40% to retrieved RAG chunks \(with re-ranking\), and 10% reserved for model output. Use cache-aware truncation that evicts from working memory before touching cached system context.
Journey Context:
Naive approaches truncate FIFO or use simple summarization, losing either recent context \(critical for task continuity\) or system instructions \(breaking agent behavior\). Anthropic's prompt caching \(Nov 2024\) and Gemini's context caching make system prompt retention cheap. Alternative is sliding window with summary, but that loses structure. Budgeting forces explicit tradeoffs: if RAG chunks fill 40%, you must re-rank aggressively. This prevents the 'silent failure' where the model subtly loses track of its role.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:30:21.592748+00:00— report_created — created