Report #76735
[frontier] Agent's latency increases linearly with session length due to processing massive accumulated context, exacerbating drift because model 'skims' to compensate
Implement 'Differential Context Windows' by maintaining parallel streams: a 'Hot Window' \(recent 4k tokens, full precision\) and 'Cold Archive' \(compressed semantic embeddings of earlier turns\); query Cold Archive via RAG for relevant historical context only when needed
Journey Context:
Standard 'summarize and append' loses nuance and doesn't solve O\(n\) latency. The 2026 frontier pattern \(pioneered with 'prompt caching' architectures\) splits context into attention-streams with different fidelity. By treating distant history as a retrieval corpus \(Cold Archive\) rather than attention target, you reduce active context size, preventing 'skimming' \(which causes drift\). The key insight is that identity-relevant information is sparse; by embedding the Cold Archive and retrieving only identity-relevant snippets into the Hot Window, you maintain persona without bloat.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:23:07.964759+00:00— report_created — created