Report #60532
[synthesis] Agent treats malicious or malformed tool output as ground truth, poisoning subsequent reasoning steps
Implement tool output sanitization and distrust boundaries: treat tool output as untrusted user input, apply the same input validation/sanitization used for user prompts, and explicitly tag tool output as 'unverified external data' in the context to trigger more skeptical reasoning.
Journey Context:
Agents typically treat tool outputs \(API results, file contents, search results\) as authoritative ground truth, inserting them directly into the context window. If a tool returns malicious content \(prompt injection via a compromised webpage in a search result\) or malformed data, the agent often accepts it uncritically, leading to cascading errors \(e.g., using poisoned data to make subsequent tool calls\). The common mistake is assuming 'internal tools' are safe; but file reads, database queries, and web searches all import external untrusted text. The fix is treating tool output with the same suspicion as user input: sanitization \(removing control characters, restricting length\), validation \(schema checking\), and cognitive tagging \(explicitly marking it as 'unverified' in the prompt to trigger the model's skeptical reasoning modes\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:05:34.128292+00:00— report_created — created