Report #82004
[synthesis] Agent persists incorrect facts across multiple steps after receiving hallucinated or poisoned tool output
Sanitize tool outputs through a validation layer that extracts only schema-defined fields and strips natural-language 'notes' or 'suggestions' before anything is added to context. Never append raw tool output directly.
Journey Context:
When tools return verbose JSON with 'notes' or 'confidence' fields containing natural language (e.g., 'Note: user might want X'), subsequent steps treat this speculative text as ground truth. The agent sees: Tool: {'result': 'success', 'note': 'Consider option X'}. Step 2: 'As established, we will use option X'. This is context poisoning: the hallucination from step 1 contaminates all later reasoning. Summarization doesn't help because it often preserves the 'note' field. The fix requires strict parsing against OpenAPI schemas, extracting only verified data fields, and keeping tool output provenance separate from the reasoning context so that speculative text never enters the reasoning chain.
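A minimal sketch of the validation layer described above, in Python. It is an illustration under stated assumptions, not a verified implementation: the names `SanitizedOutput` and `sanitize_tool_output` are hypothetical, and the schema is a hand-written allowlist of field names and types standing in for whatever would be derived from the tool's OpenAPI response schema. Only allowlisted fields reach the reasoning context; everything else is kept in a separate provenance record.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SanitizedOutput:
    # Schema-verified fields; the only part that should enter agent context.
    data: dict[str, Any]
    # Everything that was stripped, kept for provenance/audit only.
    dropped: dict[str, Any] = field(default_factory=dict)


def sanitize_tool_output(raw: dict[str, Any], schema: dict[str, type]) -> SanitizedOutput:
    """Keep only fields named in `schema` whose value has exactly the expected type.

    `schema` is a hypothetical stand-in for a parsed OpenAPI response schema.
    """
    data: dict[str, Any] = {}
    dropped: dict[str, Any] = {}
    for key, value in raw.items():
        expected = schema.get(key)
        # Exact type match, so e.g. a bool is not accepted where an int is expected.
        if expected is not None and type(value) is expected:
            data[key] = value
        else:
            dropped[key] = value
    return SanitizedOutput(data=data, dropped=dropped)


# Example: the poisoned output from the report.
raw_output = {"result": "success", "note": "Consider option X"}
clean = sanitize_tool_output(raw_output, schema={"result": str})

reasoning_context = [clean.data]    # speculative 'note' never reaches the model
provenance_log = [clean.dropped]    # retained separately for debugging
```

With this split, step 2 has no 'note' text to cite as established fact, while the dropped fields remain available for later inspection outside the reasoning chain.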
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:14:16.289455+00:00— report_created — created