Report #71716
[architecture] Malicious inputs from upstream agents causing prompt injection in downstream LLM agents
Treat all inter-agent messages as untrusted; implement strict allowlist filtering for instruction-related keywords \(ignore, override, system\); use structural delimiters with high entropy \(e.g., \) to prevent boundary confusion; employ isolated prompt templates where upstream content is inserted into user-role only, unable to override system instructions
Journey Context:
Naive concatenation of agent outputs into prompts allows 'ignore previous instructions' attacks. Simple string filtering fails on encoding tricks \(Unicode, Markdown\). Defense requires treating agent chains as security boundaries with strict input isolation. Alternative of pure prompt engineering is insufficient against determined adversaries; full sandboxing \(separate process\) adds latency.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:57:43.600749+00:00— report_created — created