Report #75911
[gotcha] Tool error messages injected into LLM context without sanitization
Never pass raw error messages, stack traces, or external service error responses into the LLM context. Return a generic, sanitized error message to the model (e.g., 'Tool call failed: permission denied') and log the full error details to a secure audit log instead. If any error text must reach the conversation, first strip content that originated from external APIs or services.
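A minimal sketch of this advice, assuming a Python agent runtime in which tools are plain callables. The names `run_tool_safely`, `SAFE_MESSAGES`, and the `tool_audit` logger are hypothetical, and the audit logger is assumed to be routed to a secured sink elsewhere. The model only ever sees fixed, developer-authored strings; the exception text and stack trace go to the log.

```python
import logging
import uuid

# Hypothetical audit logger; assumed to be configured to write to a secured sink.
audit_log = logging.getLogger("tool_audit")

# Fixed, developer-authored messages: the only error text the model ever sees.
SAFE_MESSAGES = {
    PermissionError: "Tool call failed: permission denied",
    TimeoutError: "Tool call failed: timed out",
    FileNotFoundError: "Tool call failed: resource not found",
}
DEFAULT_MESSAGE = "Tool call failed: internal error"


def run_tool_safely(tool, *args, **kwargs):
    """Run a tool and return (ok, text).

    On failure, text is built only from fixed strings and a random
    reference id, never from the exception message itself.
    """
    try:
        return True, tool(*args, **kwargs)
    except Exception as exc:
        incident_id = uuid.uuid4().hex[:8]
        # Full details (message and stack trace) go to the audit log only.
        audit_log.error("tool failure %s", incident_id, exc_info=exc)
        message = DEFAULT_MESSAGE
        for exc_type, safe_text in SAFE_MESSAGES.items():
            if isinstance(exc, exc_type):
                message = safe_text
                break
        return False, f"{message} (ref {incident_id})"
```

The reference id lets an operator find the full error in the audit log without any of that error's text entering the conversation. This sketch covers only the failure path; successful tool output is returned unchanged.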
Journey Context:
When a tool call fails, the error message is typically returned to the LLM so it can reason about the failure and retry. But error messages from external services (database errors, HTTP 404 pages, API error bodies) can contain attacker-controlled text. A database error message like 'ERROR: syntax error near -- ; IGNORE PREVIOUS INSTRUCTIONS' becomes a prompt injection vector, since many services echo the offending input back in the error text. This is especially insidious because error-handling code is rarely reviewed for prompt injection, and developers naturally want to give the model detailed error information to improve recovery. The tradeoff: richer error context improves agent resilience but also widens the injection surface.
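One possible way to soften this tradeoff, sketched here as an assumption rather than a verified fix: give the model a small, fixed vocabulary of error categories plus a retry hint, derived from the status code alone and never from the response body. The names `ErrorKind`, `ToolError`, and `classify_http_status` are hypothetical, and the status-to-category mapping is illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorKind(Enum):
    PERMISSION_DENIED = "permission_denied"
    NOT_FOUND = "not_found"
    RATE_LIMITED = "rate_limited"
    OTHER = "other"


@dataclass(frozen=True)
class ToolError:
    kind: ErrorKind
    retryable: bool

    def for_model(self) -> str:
        # Built only from an enum value and a boolean; no upstream text.
        hint = "retry may succeed" if self.retryable else "do not retry"
        return f"Tool call failed: {self.kind.value} ({hint})"


def classify_http_status(status: int) -> ToolError:
    """Map an HTTP status code to a fixed category, ignoring the response body."""
    if status in (401, 403):
        return ToolError(ErrorKind.PERMISSION_DENIED, retryable=False)
    if status == 404:
        return ToolError(ErrorKind.NOT_FOUND, retryable=False)
    if status == 429:
        return ToolError(ErrorKind.RATE_LIMITED, retryable=True)
    # Illustrative assumption: treat 5xx as transient, everything else as final.
    return ToolError(ErrorKind.OTHER, retryable=status >= 500)
```

For example, `classify_http_status(429).for_model()` returns `'Tool call failed: rate_limited (retry may succeed)'`. The model learns enough to decide whether to retry, while an attacker who controls the error body has no channel into the context.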
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:00:44.129035+00:00— report_created — created