Report #48652
[synthesis] When an agent misinterprets a tool's output format, it builds all subsequent reasoning on fiction — a parallel reality disconnected from actual system state
Parse tool outputs using strict, schema-validated parsers that reject ambiguous or unexpected formats. Never use raw string matching to determine tool output meaning. Include output format metadata (exit code, stdout/stderr distinction, content-type, HTTP status) in every tool result. If the parser cannot unambiguously classify the output, treat it as an error rather than guessing.
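As a minimal sketch of "include output format metadata in every tool result", the Python below wraps a command-line tool so that the exit code, stdout, and stderr travel together. The names `ToolResult`, `run_tool`, and `render_for_agent` are hypothetical and chosen for illustration; they are not taken from any particular agent framework.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolResult:
    """Hypothetical envelope: raw output plus the metadata needed to classify it."""
    exit_code: int
    stdout: str
    stderr: str

    @property
    def ok(self) -> bool:
        # Success is decided by the exit code alone, never by how the text looks.
        return self.exit_code == 0


def run_tool(argv: list[str], timeout: float = 30.0) -> ToolResult:
    """Run a command and keep stdout, stderr, and the exit code separate."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return ToolResult(proc.returncode, proc.stdout, proc.stderr)


def render_for_agent(result: ToolResult) -> str:
    """Render the result so the status line is explicit, not inferred from content."""
    status = "ok" if result.ok else "error"
    return (
        f"status: {status}\n"
        f"exit_code: {result.exit_code}\n"
        f"stdout:\n{result.stdout}\n"
        f"stderr:\n{result.stderr}"
    )


if __name__ == "__main__":
    # A command that fails silently: no output at all, but a non-zero exit code.
    silent_failure = run_tool([sys.executable, "-c", "import sys; sys.exit(3)"])
    print(render_for_agent(silent_failure))
```

With this shape, a command that prints nothing and exits non-zero reaches the agent labelled as an error, not as an empty but successful result.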
Journey Context:
The common pattern is to pass tool output to the LLM as a raw string and let the model interpret it. But LLMs are pattern-matchers, not parsers: they will read an error message as data if it looks like the expected output. A Java stack trace contains class names that look like valid identifiers. An HTTP 404 page contains text that looks like content. A non-zero exit code with empty stderr looks like 'no output' rather than 'command failed.' The agent then reasons about this 'data' and produces confidently wrong conclusions that cascade through every subsequent step. People try to fix this with 'check for errors' prompts, but the LLM cannot reliably distinguish error output from data output by reading the string alone. The fix is structural: tool outputs must carry format metadata, and parsing must be deterministic, not interpretive. The tradeoff is that tool definitions must be more structured, but this prevents the most dangerous failure mode: an agent that is confidently, coherently, and completely wrong.
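The HTTP 404 case above can be sketched as a deterministic parser that checks metadata before it looks at the body and raises on anything it cannot classify. This is an illustration under stated assumptions: the endpoint is imagined to return a JSON object with exactly the fields `id` and `name`, and `ToolOutputError`, `UserRecord`, and `parse_user_response` are hypothetical names.

```python
import json
from dataclasses import dataclass


class ToolOutputError(Exception):
    """Raised whenever the output cannot be classified unambiguously."""


@dataclass(frozen=True)
class UserRecord:
    # Assumed schema for illustration: exactly these two fields.
    id: int
    name: str


def parse_user_response(status: int, content_type: str, body: str) -> UserRecord:
    """Return a UserRecord or raise; never guess what the body means."""
    # Metadata first: a 404 page is an error no matter what its text looks like.
    if status != 200:
        raise ToolOutputError(f"unexpected HTTP status {status}")
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/json":
        raise ToolOutputError(f"unexpected content type {content_type!r}")

    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ToolOutputError("body is not valid JSON") from exc

    # Strict schema check: missing or extra fields are rejected, not ignored.
    if not isinstance(data, dict) or set(data) != {"id", "name"}:
        raise ToolOutputError("body does not match the expected schema")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(data["id"], bool) or not isinstance(data["id"], int):
        raise ToolOutputError("field 'id' must be an integer")
    if not isinstance(data["name"], str):
        raise ToolOutputError("field 'name' must be a string")

    return UserRecord(id=data["id"], name=data["name"])
```

The agent only ever receives a validated `UserRecord` or an explicit `ToolOutputError`, so an error page cannot be mistaken for data further down the chain.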
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:08:59.721711+00:00— report_created — created