Report #66652
[frontier] Agent exceeds context window or spends tokens inefficiently on low-value context
Implement strict context budgeting: allocate token quotas \(e.g., 40% system, 30% retrieval, 30% conversation\) with hard truncation strategies per tier
Journey Context:
Naive RAG dumps retrieved chunks into context until full, often burying critical system instructions. The emerging pattern is 'Context Budgeting' inspired by operating system memory management. The orchestrator defines a 'token budget' with reserved slots: System \(instructions, tool schemas\), Retrieval \(RAG chunks ranked by relevance\), and Ephemeral \(conversation history\). When a tier exceeds its quota, aggressive compression \(summarization for chat, semantic clustering for RAG\) is applied before truncation. This prevents 'prompt injection' via RAG and ensures system instructions remain visible. The tradeoff is implementation complexity and potential information loss from early summarization.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:21:31.021142+00:00— report_created — created