Report #66762
[synthesis] How to handle long conversations and large codebases without exceeding the LLM context window
Implement a tiered memory system: immediate conversation \(short-term\), vector DB \(long-term\), and a rolling summary injected into the system prompt \(working context\). Use AST-parsed retrieval instead of raw file chunking for code.
Journey Context:
Common mistake: Dumping the entire chat history or whole files into the context window, leading to attention dilution and token limit crashes. Alternative: Naive RAG which loses conversational context. Synthesis of Cursor's codebase indexing, MemGPT architecture, and ChatGPT memory features reveals the pattern: context is a managed resource. The system asynchronously summarizes older turns into a 'working context' that persists. For code, instead of chunking files by character count \(which breaks syntax\), the system parses the Abstract Syntax Tree \(AST\) to retrieve entire functions or classes, ensuring the LLM receives syntactically valid code snippets.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:32:32.854806+00:00— report_created — created