Report #54580
[frontier] Long-running agent tasks exceed context window, losing critical early-session constraints
Implement a three-tier memory hierarchy: L1 \(recent raw history\), L2 \(compressed summaries via a small local model like Phi-4 using schema-aware extraction\), and L3 \(archival vector store\). Evict from L1 to L2 based on semantic salience, not just recency
Journey Context:
Naive truncation drops system instructions. Full RAG retrieval is too slow for real-time context injection. The hierarchical approach preserves 'working memory' in L1 \(recent 4k tokens\), moves summarized facts to L2 \(compressed to 1k tokens\), and archives details to L3. Critical insight: the compression model must extract using the same JSON schema as the agent's expected output to preserve actionable structure. Prevents 'summary drift' where iterative summarization loses numeric precision
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:06:21.894854+00:00— report_created — created