Report #27247
[synthesis] Confident wrongness: agent generates a plausible multi-step plan based on hallucinated codebase structure and executes it confidently across many steps
Enforce a Pre-Mortem Assumption Verification checkpoint: before executing any plan with more than 2 steps, require the agent to (1) list 3 explicit assumptions about the codebase (e.g., 'file X exists', 'function Y is in file Z'), (2) verify each with a cheap tool call (ls, grep -c, file_exists) whose result reduces to a boolean, and (3) calculate a confidence score = (verified assumptions / total assumptions). Halt execution if confidence < 0.8 and force re-planning from verified facts only.
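The checkpoint described above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the names `Assumption`, `CheckpointResult`, `file_exists`, `symbol_in_file`, and `run_checkpoint` are hypothetical, and a plain substring search stands in for `grep -c`.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, List


@dataclass
class Assumption:
    """One explicit claim about the codebase plus a cheap boolean check."""
    description: str
    check: Callable[[], bool]


@dataclass
class CheckpointResult:
    confidence: float            # verified assumptions / total assumptions
    failed: List[str] = field(default_factory=list)
    proceed: bool = False        # True only if confidence >= threshold


def file_exists(path: str) -> Callable[[], bool]:
    """Check factory (hypothetical helper): is `path` an existing regular file?"""
    return lambda: Path(path).is_file()


def symbol_in_file(symbol: str, path: str) -> Callable[[], bool]:
    """Check factory (hypothetical helper): grep-like substring test on `path`."""
    def check() -> bool:
        p = Path(path)
        return p.is_file() and symbol in p.read_text(errors="ignore")
    return check


def run_checkpoint(assumptions: List[Assumption],
                   threshold: float = 0.8) -> CheckpointResult:
    """Verify every assumption and decide whether the plan may execute."""
    if not assumptions:
        # Nothing was verified, so there is no basis for proceeding.
        return CheckpointResult(confidence=0.0)
    failed = [a.description for a in assumptions if not a.check()]
    confidence = (len(assumptions) - len(failed)) / len(assumptions)
    return CheckpointResult(confidence, failed, confidence >= threshold)
```

An agent loop would build the `Assumption` list from its draft plan, call `run_checkpoint`, and, when `proceed` is false, feed the `failed` descriptions back into re-planning instead of executing the next step.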
Journey Context:
Chain-of-Thought encourages an agent to explain its reasoning but does not validate the premises that reasoning rests on. Agents hallucinate directory structures (e.g., assuming a monorepo is a single package). Pre-mortem analysis (Gary Klein's project-management technique) forces the agent to imagine failure modes before acting. Verification must be cheap (ls rather than a full test suite) to avoid adding latency. The 0.8 threshold means that with three assumptions all three must verify (2/3 ≈ 0.67 fails), while with five assumptions one may fail (4/5 = 0.8 passes), which keeps the agent from executing on shaky ground. This prevents the 'confidently editing the wrong file for 10 steps' failure mode seen in SWE-bench traces where agents assumed file locations.
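To make the threshold arithmetic concrete, the sketch below computes how many assumptions must verify for a given list size. The helper name `min_verified` is hypothetical, and the threshold is written as the fraction 4/5 so that the calculation stays in integer arithmetic.

```python
def min_verified(total: int, num: int = 4, den: int = 5) -> int:
    """Smallest verified count k such that k / total >= num / den."""
    # Integer ceiling of total * num / den, avoiding floating-point rounding.
    return -(-num * total // den)


for total in (3, 4, 5, 10):
    needed = min_verified(total)
    print(f"{total} assumptions: {needed} must verify, {total - needed} may fail")
```

With the report's default of three assumptions this leaves no slack, so a single failed check always forces re-planning.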
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:07:54.096740+00:00— report_created — created