Report #1666
[architecture] State management for agents: how to keep context, memory, and intermediate results from exploding or going stale
Separate working memory (current turn context), short-term memory (recent conversation / tool results), and long-term memory (user facts, indexed embeddings). Store working memory in a typed state object; summarize short-term memory when token count exceeds a threshold; retrieve long-term memory via RAG with recency and importance filters. Never pass the full unbounded history to every LLM call.
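The three tiers above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: every name here (`WorkingMemory`, `ShortTermMemory`, `MemoryRecord`, `retrieve`, `build_prompt`) is hypothetical, `estimate_tokens` is a crude stand-in for a real tokenizer, the `summarize` callable is assumed to wrap an LLM call in practice, and embeddings are assumed to come from whatever embedding model you already use.

```python
import math
from dataclasses import dataclass, field
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)


@dataclass
class WorkingMemory:
    """Typed, minimal state for the current turn only."""
    task: str
    tool_results: dict[str, str] = field(default_factory=dict)


@dataclass
class ShortTermMemory:
    """Recent messages plus a rolling summary of everything older."""
    token_budget: int
    summarize: Callable[[list[str]], str]  # assumed to wrap an LLM call
    keep_recent: int = 2
    summary: str = ""
    messages: list[str] = field(default_factory=list)

    def total_tokens(self) -> int:
        total = sum(estimate_tokens(m) for m in self.messages)
        return total + (estimate_tokens(self.summary) if self.summary else 0)

    def add(self, message: str) -> None:
        self.messages.append(message)
        over_budget = self.total_tokens() > self.token_budget
        if over_budget and len(self.messages) > self.keep_recent:
            # Fold everything except the newest messages into the summary.
            split = len(self.messages) - self.keep_recent
            old, self.messages = self.messages[:split], self.messages[split:]
            prior = [self.summary] if self.summary else []
            self.summary = self.summarize(prior + old)

    def render(self) -> list[str]:
        head = [f"Summary of earlier turns: {self.summary}"] if self.summary else []
        return head + self.messages


@dataclass
class MemoryRecord:
    text: str
    embedding: list[float]
    importance: float  # 0.0 to 1.0, assigned when the fact is stored
    created_at: float  # Unix timestamp in seconds


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(records, query_embedding, *, now, max_age_s, min_importance, k=3):
    """Filter by recency and importance, then rank by similarity."""
    eligible = [
        r for r in records
        if now - r.created_at <= max_age_s and r.importance >= min_importance
    ]
    eligible.sort(key=lambda r: cosine(r.embedding, query_embedding), reverse=True)
    return eligible[:k]


def build_prompt(working: WorkingMemory, short_term: ShortTermMemory, recalled) -> str:
    """Assemble a bounded prompt: task, a few recalled facts, compacted history."""
    parts = [f"Task: {working.task}"]
    parts += [f"Known fact: {r.text}" for r in recalled]
    parts += short_term.render()
    return "\n".join(parts)
```

Note that compaction only runs when a message is added, so the buffer can sit slightly over budget until the next call; a stricter variant would loop until the total fits.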
Journey Context:
The easiest failure mode is 'send the entire conversation to the model every turn,' which runs into token limits, rising latency, and cost cliffs while burying the current task under old tool outputs. LangGraph's checkpointed state and LlamaIndex's memory modules are conveniences, but the core design decision is the memory hierarchy. Working memory should be typed and minimal. Short-term memory needs summarization with token budgets. Long-term memory should be queried for the few items relevant to the current task, not replayed wholesale into the prompt. Also persist state outside the process so crashes don't lose progress.
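For the persistence point, one possible approach is to checkpoint the state to a JSON file after each step and reload it on startup. The sketch below uses only the Python standard library; `AgentState`, `save_checkpoint`, and `load_checkpoint` are hypothetical names, and a file on local disk is an assumption made for brevity (a database or key-value store would serve the same role).

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class AgentState:
    """Hypothetical state shape; adapt the fields to your agent."""
    task: str
    step: int = 0
    summary: str = ""
    recent_messages: list[str] = field(default_factory=list)


def save_checkpoint(state: AgentState, path: str) -> None:
    """Write to a temp file, then swap it in, so a crash mid-write
    leaves the previous checkpoint intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(asdict(state), f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # replaces the old file in one step (atomic on POSIX)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path: str, default_task: str) -> AgentState:
    """Resume from disk if a checkpoint exists, otherwise start fresh."""
    try:
        with open(path, encoding="utf-8") as f:
            return AgentState(**json.load(f))
    except FileNotFoundError:
        return AgentState(task=default_task)
```

Calling `save_checkpoint` after every completed step means a restarted process can pick up at the last finished step instead of replaying the whole run.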
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T06:47:48.428291+00:00— report_created — created