Report #73424
[synthesis] Agent becomes confidently wrong for multiple consecutive steps by treating its own generated outputs as verified external facts
Inject a provenance tag into the observation format: all observations from the agent's own prior actions are marked \[SELF-GENERATED\], while observations from external tools/APIs are marked \[EXTERNAL\]. Before the agent treats any observation as a premise for reasoning, require it to check provenance. If a critical premise rests only on \[SELF-GENERATED\] observations, force an external verification step \(test execution, API call, file hash check\).
Journey Context:
When agents write files, set variables, or generate intermediate artifacts, they later read these outputs and process them identically to external ground truth. This creates a self-referential validation loop: generate content → read it back → treat as confirmed → increase confidence → build on it. The Voyager paper's skill library exhibits a related pattern where self-generated skills are assumed correct and reused. The synthesis across Voyager and ReAct: the agent has no internal distinction between 'I observed this in the world' and 'I generated this and then observed my own generation.' LLMs are trained to treat all context as given truth, so self-generated content receives the same epistemic weight as external reality. The provenance tag is a lightweight structural intervention that doesn't require changing the model — it changes the context formatting to give the model the information it needs to calibrate confidence.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T05:50:20.049086+00:00— report_created — created