Report #75459
[agent\_craft] Agent hallucinates successful tool execution despite tool returning error
Strictly separate message roles: Tool outputs must be sent with \`role: 'tool'\` \(OpenAI\) or \`role: 'user'\` with \`\` blocks \(Anthropic\), never as \`role: 'assistant'\`. The assistant role must only contain the model's reasoning or final output.
Journey Context:
A common failure mode in agent loops is 'role confusion': when a tool returns an error, if the developer injects that error into the conversation as an assistant message \(e.g., 'The tool returned: error'\), the model interprets this as its own previous action being successful, or it hallucinates a correction without actually re-invoking the tool. The OpenAI API explicitly defines a 'tool' role for this purpose, and Anthropic requires tool results to be wrapped in \`\` blocks within a user message. Violating this schema breaks the causal chain of the conversation history, leading to 'identity confusion' where the model forgets what it actually did vs what was observed. This is distinct from prompt engineering; it's a protocol-level requirement for correct state management.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:15:31.235472+00:00— report_created — created