Report #26632
[synthesis] Agent misattributes content from previous turns (e.g., treating an assistant's prior hallucination as ground truth, or confusing user instructions with tool outputs), leading to reality drift
Implement strict message provenance tagging: maintain separate namespaces or metadata tags for user_prompt, assistant_reasoning, tool_output, and system_state. Before each reasoning step, explicitly inject a preamble that lists the source of each piece of information being considered (e.g., "TOOL_OUTPUT[search]: {...}"). Reject any reasoning that references untagged context.
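The tagging and preamble steps described above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the names `Source`, `Record`, `build_preamble`, and `UntaggedContextError` are hypothetical, and the `origin` field (used for the tool name) is an assumption. The sketch only rejects untagged records while the context is being assembled; checking whether the model's own reasoning cites untagged material would need a separate mechanism.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Source(Enum):
    """The four provenance namespaces named in the report."""
    USER_PROMPT = "USER_PROMPT"
    ASSISTANT_REASONING = "ASSISTANT_REASONING"
    TOOL_OUTPUT = "TOOL_OUTPUT"
    SYSTEM_STATE = "SYSTEM_STATE"


@dataclass(frozen=True)
class Record:
    content: str
    source: Optional[Source] = None  # None models untagged context
    origin: Optional[str] = None     # hypothetical: e.g. the tool name


class UntaggedContextError(ValueError):
    """Raised when a record without a provenance tag reaches the preamble."""


def build_preamble(records: List[Record]) -> str:
    """Render one 'SOURCE[origin]: content' line per record.

    Any record without a source tag is rejected rather than passed through.
    """
    lines = []
    for index, record in enumerate(records):
        if record.source is None:
            raise UntaggedContextError(f"record {index} has no provenance tag")
        label = record.source.value
        if record.origin:
            label += f"[{record.origin}]"
        lines.append(f"{label}: {record.content}")
    return "\n".join(lines)
```

The resulting string would be placed ahead of each reasoning step so that every fact the agent considers carries its source label. Failing loudly on an untagged record is a deliberate choice here: silently dropping it would hide the gap in provenance.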
Journey Context:
Standard chat formats flatten history into alternating user/assistant roles, losing fine-grained provenance. When an agent hallucinates a file's contents in step 3, by step 5 that hallucination sits in the assistant-role history, indistinguishable from verified tool outputs. Simply prompting the model to "be careful" fails because the flattened context window gives it no mechanism to distinguish ground truth from confabulation.
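The loss of provenance can be illustrated with a small sketch. It assumes a two-role chat format in which tool results are folded into the assistant turn, which is how the report describes the flattening; the `flatten` helper, the tag names as dictionary keys, and the sample messages are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed mapping from fine-grained tags to coarse chat roles.
# Folding tool_output into "assistant" models the flattening the report describes.
ROLE_OF: Dict[str, str] = {
    "user_prompt": "user",
    "assistant_reasoning": "assistant",
    "tool_output": "assistant",
    "system_state": "system",
}


def flatten(history: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Convert (tag, text) pairs to role/content messages, discarding the tag."""
    return [{"role": ROLE_OF[tag], "content": text} for tag, text in history]


history = [
    ("user_prompt", "What port does the service use?"),
    ("assistant_reasoning", "config.yaml probably sets port: 8080"),  # unverified guess
    ("tool_output", "port: 9090"),                                    # verified read
]

flat = flatten(history)
# After flattening, the guess and the verified read carry the same role,
# so nothing in the message metadata separates them.
same_role = flat[1]["role"] == flat[2]["role"]
```

Once the tag is gone, only the message text remains, and the text of a confident guess can look just like the text of a real tool result.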
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:06:09.081271+00:00— report_created — created