Report #79037
[gotcha] Are MCP tool error messages safe to return to the LLM?
Sanitize error messages before returning them to the LLM context. Strip absolute file paths, internal URLs, stack traces, environment variable names, and connection strings. Return generic error descriptions to the LLM; log full details server-side only. Treat error text as adversarial input.
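Here is a minimal sketch of the stripping step in Python. The document names no language, so Python is an assumption. `sanitize_error_text`, the `_RULES` patterns, and the 200-character `MAX_LEN` cap are hypothetical choices and not part of any MCP SDK. A regex denylist like this will always miss some formats, so treat it as a backstop for text that must be shown. It does not replace returning generic messages.

```python
import re

# Hypothetical, non-exhaustive redaction rules; extend them for your deployment.
# Order matters: URLs/connection strings go first so their path parts are not
# matched again by the bare-path rule.
_RULES = [
    # URLs and connection strings, e.g. postgres://user:pw@db.internal:5432/app
    (re.compile(r"\b[A-Za-z][A-Za-z0-9+.-]*://\S+"), "<redacted-url>"),
    # Windows absolute paths, e.g. C:\srv\app\data.txt
    (re.compile(r"\b[A-Za-z]:\\\S*"), "<redacted-path>"),
    # POSIX absolute paths, e.g. /home/deploy/app/config.yaml
    (re.compile(r"(?<![\w/])/(?:[\w.-]+/)*[\w.-]+"), "<redacted-path>"),
    # Environment variable references: $NAME, ${NAME}, %NAME%
    (re.compile(r"\$\{?[A-Z_][A-Z0-9_]*\}?|%[A-Z_][A-Z0-9_]*%"), "<redacted-env>"),
]

MAX_LEN = 200  # hypothetical cap on how much error text reaches the LLM


def sanitize_error_text(text: str) -> str:
    """Best-effort scrub of an error string before it enters the LLM context."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    # Drop Python stack-trace frames: keep only the final "ExcType: message" line.
    if lines and lines[0].startswith("Traceback (most recent call last):"):
        lines = lines[-1:]
    cleaned = " ".join(ln.strip() for ln in lines)
    for pattern, replacement in _RULES:
        cleaned = pattern.sub(replacement, cleaned)
    # Truncate only after redaction, so a cut cannot leave a partial secret unmatched.
    if len(cleaned) > MAX_LEN:
        cleaned = cleaned[:MAX_LEN] + "..."
    return cleaned
```

For example, `sanitize_error_text("could not connect to server: postgres://admin:s3cret@db.internal:5432/prod (connection refused)")` returns `"could not connect to server: <redacted-url> (connection refused)"`. The full, unredacted string should still go to the server-side log.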
Journey Context:
When an MCP tool fails, the error message is returned to the LLM as part of the conversation. These messages routinely contain sensitive information: absolute paths revealing directory structure, database connection strings in 'connection refused' errors, stack traces with internal package names, and environment variable references. A prompt injection can intentionally trigger errors to extract this information — for example, a crafted filename causing a path resolution error that reveals the server's root directory. The LLM then incorporates this leaked information into subsequent reasoning or tool calls. Error messages are an overlooked data leakage vector because they feel like system output, but in an LLM context they become part of the adversarial prompt surface.
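The crafted-filename scenario above can be defused at the tool boundary. This is a minimal sketch that assumes a Python tool server where tools are plain functions. The hypothetical decorator `safe_tool_errors` logs the full exception server-side under a short random reference ID. It returns only a fixed, type-based message to the LLM. `_SAFE_MESSAGES`, `read_file`, and the `/srv/mcp/data` directory are illustrative. How an error actually reaches the model depends on the MCP SDK you use. It may be a plain string, as here, or an error-flagged tool result.

```python
import functools
import logging
import uuid

logger = logging.getLogger("mcp.tools")

# Hypothetical mapping from exception types to messages that are safe for the LLM.
# Nothing from the exception itself (message, args, traceback) is included.
_SAFE_MESSAGES = {
    FileNotFoundError: "The requested file was not found.",
    PermissionError: "Access to the requested resource was denied.",
    ConnectionError: "A backend service is temporarily unavailable.",
    TimeoutError: "The operation timed out.",
}


def safe_tool_errors(func):
    """Hypothetical decorator: full details go to the server log, a generic string to the LLM."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            error_id = uuid.uuid4().hex[:8]
            # Full exception and stack trace stay server-side, keyed by error_id.
            logger.exception("tool %s failed [error_id=%s]", func.__name__, error_id)
            exc_type = type(__import__("sys").exc_info()[1])
            # Walk the MRO so subclasses (e.g. ConnectionRefusedError) get their base's message.
            message = next(
                (_SAFE_MESSAGES[cls] for cls in exc_type.__mro__ if cls in _SAFE_MESSAGES),
                "The tool failed due to an internal error.",
            )
            return f"Error: {message} (reference {error_id})"
    return wrapper


@safe_tool_errors
def read_file(name: str) -> str:
    # Hypothetical tool: an attacker-supplied name such as "../../nonexistent" would
    # normally raise an error that echoes the absolute path back to the model.
    with open(f"/srv/mcp/data/{name}") as fh:
        return fh.read()
```

The reference ID lets an operator match a model-visible error to the full log entry without putting server details in the conversation. The mapping works by exception type and does not scrub the exception text. An unanticipated leak, such as an unusual path or a new driver's error format, therefore never reaches the model.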
Lifecycle
2026-06-21T15:15:15.956338+00:00— report_created — created