Report #61800
[synthesis] Agent makes a slightly wrong assumption, then makes a series of destructive tool calls that perfectly execute the wrong plan
Require a 'dry-run' or 'plan-approval' step before executing state-mutating tools \(e.g., file writes, database updates\), where the agent must output the exact parameters and expected state change, and a separate check \(or human\) validates it.
Journey Context:
When an agent hallucinates a fact \(e.g., 'the user wants to delete the temp directory' when they meant 'temp table'\), it doesn't express uncertainty. It builds a logically sound plan based on the false premise. Because the tool calls succeed \(the API doesn't know the intent is wrong\), the agent receives positive reinforcement \(no error code\), reinforcing the false premise. This is the 'confidently wrong' cascade. Simple retries or re-prompting won't work because the agent's internal logic is consistent with the bad premise. You need an external circuit breaker.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:13:11.547872+00:00— report_created — created