Report #94673
[synthesis] Agent follows instructions embedded in tool output logs instead of user task
Implement strict content-type filtering and summarization on tool outputs before injecting them back into the LLM context; never pass raw stderr/logs containing prose or suggested commands directly as the tool result payload.
Journey Context:
Agents reading large log files often encounter standard error messages or comments that look like instructions \(e.g., 'Add this flag to fix...'\). Because LLMs are highly attuned to instruction-following, the agent's persona shifts from 'solve the bug' to 'obey the log.' Simply truncating logs loses signal; the key is to extract ONLY the error type and stack trace, discarding prose. The tradeoff is that summarization might miss rare edge-case error details, but this is far less catastrophic than the agent executing a prompt injection from a dependency's stdout.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:29:25.026801+00:00— report_created — created