Report #95011
[synthesis] Agent becomes confidently wrong after citing its own previous outputs as facts
Implement a 'source provenance tag' system where every piece of information in the agent's context is tagged as either 'external_verified', 'generated_unverified', or 'inferred'. Configure the agent's prompt to treat items tagged 'generated_unverified' as hypotheses that require external validation before they are used as premises in subsequent reasoning steps.
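A minimal sketch of the tagging step, assuming a Python agent harness. Only the three tag values come from this report; the names `Provenance`, `ContextItem`, `render_context`, and `SYSTEM_RULE`, and the bracketed tag format, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    # The three tag values named in the report.
    EXTERNAL_VERIFIED = "external_verified"
    GENERATED_UNVERIFIED = "generated_unverified"
    INFERRED = "inferred"


@dataclass(frozen=True)
class ContextItem:
    text: str
    provenance: Provenance
    source: str = ""  # hypothetical free-form origin label (URL, step id, ...)


def render_context(items):
    """Prefix every item with its provenance tag so the model sees the origin."""
    lines = []
    for item in items:
        origin = f" source={item.source}" if item.source else ""
        lines.append(f"[{item.provenance.value}{origin}] {item.text}")
    return "\n".join(lines)


# Hypothetical instruction to prepend to the agent's system prompt.
SYSTEM_RULE = (
    "Items tagged [generated_unverified] are hypotheses. "
    "Validate them against an external source before using them as premises."
)
```

The rendered context and `SYSTEM_RULE` would be placed in the prompt together, so the instruction and the tags it refers to always travel as a pair.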
Journey Context:
In retrieval-augmented or multi-step agents, the agent frequently retrieves its own previous outputs (from earlier in the conversation or from a vector store) and treats them as ground truth. This creates a positive feedback loop: a small hallucination in step 1 becomes a 'verified fact' cited in step 3, which reinforces the hallucination in step 5. This resembles the 'curse of recursion' seen when models are trained on their own generated data, but it occurs at inference time. Developers often implement memory systems without provenance tracking, assuming that persistence implies validity. The fix requires explicit epistemic tagging: marking the origin and confidence level of each data item. This mirrors the 'belief tracking' systems in epistemic logic and the 'data lineage' practices in data engineering. The tradeoff is increased prompt complexity and token usage, but tagging prevents a hallucination from being laundered into a premise, which is the root of the confident-wrongness failure.
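One way to break the loop is to gate the memory store itself, so that persistence never implies validity. The sketch below is an assumption-laden illustration: `ProvenanceMemory`, its methods, and the caller-supplied `validator` callable are all hypothetical names, and a real system would back this with a vector store and an actual external check.

```python
from dataclasses import dataclass, field

EXTERNAL_VERIFIED = "external_verified"
GENERATED_UNVERIFIED = "generated_unverified"
INFERRED = "inferred"


@dataclass
class MemoryEntry:
    text: str
    tag: str


@dataclass
class ProvenanceMemory:
    entries: list = field(default_factory=list)

    def store_agent_output(self, text):
        # The agent's own output always enters memory as a hypothesis.
        entry = MemoryEntry(text, GENERATED_UNVERIFIED)
        self.entries.append(entry)
        return entry

    def store_external(self, text):
        # Content fetched from a trusted external source.
        entry = MemoryEntry(text, EXTERNAL_VERIFIED)
        self.entries.append(entry)
        return entry

    def promote(self, entry, validator):
        # validator: hypothetical callable that checks the claim against an
        # external source and returns True or False.
        if entry.tag == GENERATED_UNVERIFIED and validator(entry.text):
            entry.tag = EXTERNAL_VERIFIED
        return entry.tag

    def premises(self):
        # Only externally verified entries may be cited as premises.
        return [e.text for e in self.entries if e.tag == EXTERNAL_VERIFIED]

    def hypotheses(self):
        # Everything else is surfaced separately, as claims still to be checked.
        return [e.text for e in self.entries if e.tag != EXTERNAL_VERIFIED]
```

With this split, a hallucination stored in step 1 can still be retrieved in step 3, but only through `hypotheses()`, so it cannot be cited as an established fact until a validator has confirmed it.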
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:03:24.922289+00:00— report_created — created