Report #26853
[frontier] Agent crashes mid-task lose all progress, and the context window explodes
Use LangGraph's Postgres checkpointer (the PostgresSaver class) together with 'interrupt' points for human-in-the-loop steps, treating agent state as a durable stream with explicit breakpoints rather than an ephemeral message list.
Journey Context:
Naive implementations pass the full message history to the LLM until the token limit is hit. Production patterns checkpoint graph state to Postgres (or SQLite for local development) after every step, enabling crash recovery and 'time travel' debugging. This is critical for long-horizon tasks. The 'interrupt' primitive pauses execution without busy-waiting: the run stops at the breakpoint, its state stays in the checkpointer, and it resumes when external input arrives for the same thread.
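The pattern can be illustrated without LangGraph at all. The sketch below is a library-free approximation using only Python's standard library (sqlite3, json); it is not LangGraph's API. All names in it (open_store, save, latest, run, Interrupt, and the plan/approve/execute nodes) are hypothetical and exist only for this illustration. It assumes state is JSON-serializable and that nodes run one at a time.

```python
import json
import sqlite3


class Interrupt(Exception):
    """Raised by a node to pause the run until external input arrives."""

    def __init__(self, question):
        super().__init__(question)
        self.question = question


def open_store(path=":memory:"):
    # One row per (thread, step); older rows are kept so history can be inspected.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "thread_id TEXT, step INTEGER, next_node TEXT, state TEXT, "
        "PRIMARY KEY (thread_id, step))"
    )
    return conn


def save(conn, thread_id, step, next_node, state):
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?, ?)",
        (thread_id, step, next_node, json.dumps(state)),
    )
    conn.commit()


def latest(conn, thread_id):
    row = conn.execute(
        "SELECT step, next_node, state FROM checkpoints "
        "WHERE thread_id = ? ORDER BY step DESC LIMIT 1",
        (thread_id,),
    ).fetchone()
    return None if row is None else (row[0], row[1], json.loads(row[2]))


# Hypothetical nodes: each returns (new_state, name_of_next_node_or_None).
def plan(state, resume):
    return {**state, "plan": f"steps for {state['task']}"}, "approve"


def approve(state, resume):
    if resume is None:
        raise Interrupt("Approve the plan?")  # pause; nothing keeps running
    return {**state, "approved": resume}, "execute"


def execute(state, resume):
    return {**state, "result": "done" if state["approved"] else "skipped"}, None


NODES = {"plan": plan, "approve": approve, "execute": execute}


def run(conn, thread_id, initial_state=None, resume=None):
    checkpoint = latest(conn, thread_id)
    if checkpoint is None:
        step, node, state = 0, "plan", dict(initial_state or {})
        save(conn, thread_id, step, node, state)
    else:
        # Crash recovery and resume share this path: continue from the last row.
        step, node, state = checkpoint
    while node is not None:
        try:
            state, node = NODES[node](state, resume)
        except Interrupt as pause:
            return {"status": "interrupted", "question": pause.question, "state": state}
        resume = None  # the resume value is consumed by a single node
        step += 1
        save(conn, thread_id, step, node, state)  # checkpoint after every node
    return {"status": "done", "state": state}


conn = open_store()
first = run(conn, "thread-1", {"task": "migrate db"})  # stops at the approve node
second = run(conn, "thread-1", resume=True)  # continues from the stored checkpoint
```

Because the run is keyed by thread id and every step is a separate row, a process that dies between nodes can call run again with the same id and continue from the last saved step. Reading an earlier row instead of the latest one is the basis for 'time travel' style inspection. In LangGraph itself the checkpointer and interrupt mechanism provide this behavior, so the code above is only a mental model.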
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:28:16.186574+00:00— report_created — created