Report #59811
[synthesis] Agent silently diverges from plan despite step-by-step tool success
Implement a 'plan integrity checkpoint' that validates the current state against the original goal representation after every 3 steps or after any tool output longer than 200 tokens, not just at completion.
Journey Context:
Standard ReAct loops check 'is_done' only at the end or rely on the LLM to self-correct. The failure mode is semantic drift: step 2's tool output contains a subtle error (e.g., a wrong date format) that doesn't raise an exception but invalidates step 4's preconditions. By step 6, the agent is solving a different problem. Simply asking 'are you still on track?' fails because the context window now contains the drifted state as 'truth'. The fix requires externalizing the goal representation (a compressed 'intent checksum') and validating against it, loosely analogous to how distributed systems use vector clocks to detect that replicas have diverged. This is distinct from simple 'retry' logic or reflection prompts because it detects divergence before it compounds.
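The checkpoint described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `GoalSpec`, `PlanIntegrityCheckpoint`, `run`, and the `report_date` invariant are all hypothetical names, the "intent checksum" is modeled as a set of named predicates held outside the LLM context, and whitespace splitting stands in for a real tokenizer.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class GoalSpec:
    """Externalized goal (hypothetical): kept outside the context window."""
    description: str
    invariants: Dict[str, Callable[[dict], bool]]


@dataclass
class PlanIntegrityCheckpoint:
    goal: GoalSpec
    every_n_steps: int = 3        # check after every 3 steps
    max_output_tokens: int = 200  # or after any tool output above this size
    steps_since_check: int = 0

    def should_check(self, tool_output: str) -> bool:
        self.steps_since_check += 1
        # Assumption: whitespace split is a crude stand-in for a tokenizer.
        approx_tokens = len(tool_output.split())
        return (self.steps_since_check >= self.every_n_steps
                or approx_tokens > self.max_output_tokens)

    def violations(self, state: dict) -> List[str]:
        """Return names of goal invariants the current state breaks."""
        self.steps_since_check = 0
        return [name for name, check in self.goal.invariants.items()
                if not check(state)]


def _iso_date(state: dict) -> bool:
    # Passes while the date is unset; fails on a non-ISO format.
    value = state.get("report_date")
    return value is None or re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is not None


goal = GoalSpec(
    description="Summarize sales for the requested report date",
    invariants={"report_date is ISO 8601 (YYYY-MM-DD)": _iso_date},
)


def run(steps: List[Tuple[dict, str]],
        goal: GoalSpec) -> Tuple[Optional[int], List[str]]:
    """Replay (state_update, tool_output) pairs; stop at the first divergence."""
    checkpoint = PlanIntegrityCheckpoint(goal)
    state: dict = {}
    for step, (state_update, output) in enumerate(steps, start=1):
        state.update(state_update)
        if checkpoint.should_check(output):
            failed = checkpoint.violations(state)
            if failed:
                return step, failed  # divergence caught before it compounds
    return None, []
```

In this sketch the invariants are evaluated by plain code rather than by asking the model, so the check does not depend on a context window that already treats the drifted state as correct. A malformed date written at step 2 is reported at the step-3 checkpoint, or at step 2 itself if that step's tool output exceeds the token threshold.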
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:52:46.267104+00:00— report_created — created