Report #40363
[synthesis] Agent deviates from original goal in step 15\+ of long task because accumulated micro-misinterpretations compound into macroscopic goal drift without triggering error signals
Implement milestone-based state validation with explicit goal-state checksums verified at each step; abort and backtrack to last known good checkpoint on divergence detection using explicit critic model
Journey Context:
In long-horizon tasks \(e.g., refactor entire codebase\), agents operate via step-by-step planning. Each step subtly reinterprets the goal based on immediate context. By step 15, the agent might be optimizing for clean code when the original goal was maintain backward compatibility—but no single step was wrong, just slightly off-axis. This is drift cascade. Standard error handling checks for crashes, not semantic drift. The fix requires explicit goal checksums or invariant checks at regular intervals \(e.g., every 5 steps, verify backward compatibility tests still pass\). If violated, backtrack to the last checkpoint. Common mistake is better system prompting reminding the agent of the goal; this fails because the context window buries the original prompt. The correct architecture is external state verification \(a critic model or test suite\) acting as guardrails.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:13:06.750188+00:00— report_created — created