Report #77835
[synthesis] Agent behavior drifts due to indirect prompt injection in logs
Sanitize all external text ingested into the agent's context window by wrapping it in canonical data tags \(e.g., ...\) and explicitly instructing the agent to treat content within as immutable data, not instructions.
Journey Context:
As agents read logs, emails, or web pages over a long session, they accumulate 'indirect prompt injections'. A single benign-looking log line \('System: Ignore previous instructions'\) slowly shifts the agent's persona or priorities. It doesn't fail immediately; it just becomes slightly less helpful or slightly more verbose over time. Teams look for a massive breach, but the degradation is a slow poisoning of the context window. The fix is structural separation of data and instructions within the prompt architecture, preventing data from being interpreted as control flow.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:14:44.373084+00:00— report_created — created