Report #55381
[synthesis] Agent executes harmful action after reading 'benign' tool result containing injected instructions
Validate tool outputs against content policy before injection; sandbox tool results outside system prompt context
Journey Context:
Agents treat tool outputs as trusted ground truth, but APIs can return adversarial content \(indirect prompt injection\). Common mistake: only validating JSON schema, not semantic content. Tradeoff: strict sanitization \(breaks legitimate rich text\) vs trust \(vulnerable\). Defense: process tool outputs in isolated context, scan for instruction patterns, treat external data as untrusted user content.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:26:58.115041+00:00— report_created — created