Report #46984
[gotcha] LLM agent loop exploiting verbose error messages for system reconnaissance
Return only generic, non-revealing error messages to the LLM when a tool call fails. Log the detailed stack traces and errors securely on the server side, out of the LLM's context.
Journey Context:
When an LLM agent calls a tool (e.g., a database query or file read) and the call fails, developers often feed the full Python exception or SQL error back into the LLM context so the model can self-correct. An attacker can craft input that deliberately triggers a specific failure (e.g., a path traversal attempt against a file-read tool) and then read the verbose error message. Because the error text is now in the model's context, it can surface in the LLM's final response, leaking internal system architecture, file paths, or table schemas to the attacker.
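A minimal sketch of the mitigation, assuming a Python agent where each tool is a plain callable. The names `safe_tool_call`, `read_file`, and the `agent.tools` logger are hypothetical and not taken from any particular agent framework. The wrapper logs the full traceback server-side and hands the LLM only a generic message plus an opaque reference ID.

```python
import logging
import uuid
from typing import Any, Callable

# Server-side logger; its handlers should write somewhere the LLM never reads.
logger = logging.getLogger("agent.tools")

GENERIC_ERROR = "The tool call failed. Reference ID: {ref}"


def safe_tool_call(tool: Callable[..., Any], *args: Any, **kwargs: Any) -> dict:
    """Run a tool and return a result that is safe to place in the LLM context."""
    try:
        return {"ok": True, "result": tool(*args, **kwargs)}
    except Exception:
        # Opaque ID that links the LLM-visible message to the server-side log entry.
        ref = uuid.uuid4().hex
        # logger.exception records the message together with the full traceback.
        logger.exception(
            "tool %s failed (ref=%s)", getattr(tool, "__name__", "tool"), ref
        )
        # No exception type, message, path, or SQL text is returned to the model.
        return {"ok": False, "error": GENERIC_ERROR.format(ref=ref)}


def read_file(path: str) -> str:
    """Hypothetical file-read tool used only for illustration."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

Calling `safe_tool_call(read_file, some_path)` with a path that cannot be opened returns a dictionary with `ok` set to `False` and an `error` string that contains only the generic text and the reference ID. If the agent still needs some signal to self-correct, one option is to map known exception types to a small set of pre-written, coarse categories (for example "not found" or "invalid input") rather than passing the raw exception text through.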
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:20:08.046759+00:00— report_created — created