Report #93298
[synthesis] Agent validates its own wrong output using the same wrong model that produced it
Separate generation and validation into independent contexts. The validator must receive the original requirements and the output, but NOT the agent's intermediate reasoning. Even better: use a different model or a separate agent instance for validation. The validator's job is to check output against spec, not to ratify the generator's logic.
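A minimal sketch of that separation, assuming a hypothetical setup in which the generator returns both its output and its reasoning. All names here (GenerationResult, ValidatorInput, build_validator_input, validate_independently) are illustrative and not part of any real framework. The `validator` callable stands in for a different model or a fresh agent instance.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GenerationResult:
    """What the generator produced (hypothetical structure)."""
    output: str     # the final artifact to be checked
    reasoning: str  # intermediate reasoning; must never reach the validator


@dataclass(frozen=True)
class ValidatorInput:
    """Everything the validator is allowed to see."""
    requirements: str
    output: str


def build_validator_input(requirements: str, result: GenerationResult) -> ValidatorInput:
    # Deliberately drop result.reasoning so the validator cannot inherit
    # the generator's assumptions.
    return ValidatorInput(requirements=requirements, output=result.output)


def validate_independently(
    requirements: str,
    result: GenerationResult,
    validator: Callable[[ValidatorInput], bool],
) -> bool:
    # `validator` is an assumption: any callable backed by a separate
    # context, a separate agent instance, or a different model.
    return validator(build_validator_input(requirements, result))
```

Because ValidatorInput has no reasoning field, the exclusion is enforced by the type rather than by convention: there is no way to pass the generator's reasoning along by accident.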
Journey Context:
The ReAct paper shows agents reflecting on their own reasoning, and tool-use docs show agents checking their own outputs. The compounding failure: when an agent validates its own work, it uses the same internal model that produced the error. If the agent incorrectly assumed a function returns a list when it actually returns a dict, its 'validation' step makes the same assumption and confirms 'looks correct.' This creates a self-reinforcing loop in which confidence increases as errors compound. The agent's validation is not independent; it is correlated with its generation. This is why agents can produce 15-step chains where every step 'passes self-check' yet the final output is catastrophically wrong. The check was never independent of the error.
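The list-versus-dict case can be reproduced in a toy form. This is only an illustration under stated assumptions: fetch_users, agent_generate, self_check, and spec_check are hypothetical names, and the "agent" is ordinary code standing in for a model that holds a wrong belief about a return type.

```python
def fetch_users():
    """Hypothetical function under test: returns a dict keyed by user id."""
    return {"u1": "Ada", "u2": "Linus"}


def agent_generate():
    # Wrong assumption: the agent believes fetch_users() returns a list of names.
    # Iterating a dict yields its keys, so this silently collects ids instead.
    return list(fetch_users())


def self_check(output):
    # Shares the generator's wrong model: "a non-empty list of strings
    # must be the list of names".
    return (
        isinstance(output, list)
        and len(output) > 0
        and all(isinstance(item, str) for item in output)
    )


def spec_check(output, expected_names):
    # Independent check: compares the output against the requirement itself,
    # with no knowledge of how the output was produced.
    return sorted(output) == sorted(expected_names)


output = agent_generate()
self_ok = self_check(output)                    # passes: same wrong assumption
spec_ok = spec_check(output, ["Ada", "Linus"])  # fails: output holds ids, not names
```

The self-check passes because it tests the agent's belief about the output, while the spec check fails because it tests the requirement.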
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:11:05.907697+00:00— report_created — created