Report #56918
[synthesis] Agent persona or instructions shift after ingesting unsanitized tool outputs
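Wrap all tool output in XML tags or other clear delimiters so the model can tell it apart from instructions, and add a periodic system prompt checksum check: at intervals, the agent must recall its core instructions so they can be compared against the original.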
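A minimal sketch of the wrapping step, assuming a Python agent loop. The function name `wrap_tool_output`, the `tool_output` tag, and the `trust` attribute are hypothetical choices for illustration, not part of any particular framework. Escaping the body keeps text inside the tool output from closing the wrapper early.

```python
import html


def wrap_tool_output(tool_name: str, output: str) -> str:
    """Wrap raw tool output in a delimited block before appending it to context.

    Hypothetical helper: the tag and attribute names are illustrative only.
    """
    # Escape <, > and & so tags embedded in the output cannot close the wrapper.
    safe_body = html.escape(output, quote=False)
    # The tool name goes into an attribute, so quotes are escaped as well.
    safe_name = html.escape(tool_name, quote=True)
    return (
        f'<tool_output tool="{safe_name}" trust="untrusted">\n'
        f"{safe_body}\n"
        "</tool_output>"
    )
```

Delimiters alone do not guarantee that the model ignores instruction-like text. They only make the boundary explicit, so the system prompt can refer to it (for example, by stating that nothing inside `tool_output` is an instruction).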
Journey Context:
Agents execute tools and append the output directly to the context. If a tool returns an error message or other text that reads like instructions, the LLM can absorb it as if it were part of its own instructions. The agent doesn't crash; instead, its behavior shifts subtly over subsequent turns. Standard prompt injection monitoring looks for immediate hijacks, so slow persona drift from accumulated tool outputs goes undetected. Taken together, context window accumulation and indirect injection vectors mean that tool outputs can act as a slow-acting context poison.
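One way the periodic check could look, as a sketch under stated assumptions: the report does not specify how recall is compared, so this example uses a SHA-256 hash to confirm the stored system prompt is unchanged and a simple phrase match to spot rules missing from the agent's recalled instructions. All function names, the `required_phrases` list, and the five-turn interval are hypothetical.

```python
import hashlib


def prompt_checksum(system_prompt: str) -> str:
    """Return a SHA-256 hex digest of the system prompt text."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()


def prompt_unchanged(system_prompt: str, expected_checksum: str) -> bool:
    """Check that the prompt currently in context matches the original digest."""
    return prompt_checksum(system_prompt) == expected_checksum


def missing_rules(recalled: str, required_phrases: list[str]) -> list[str]:
    """Return required phrases absent from the agent's recalled instructions.

    Assumption: case-insensitive substring matching is a crude stand-in for
    whatever comparison a real deployment would use.
    """
    text = recalled.casefold()
    return [p for p in required_phrases if p.casefold() not in text]


def should_check(turn: int, every_n_turns: int = 5) -> bool:
    """Run the recall check every N turns (the interval is an arbitrary choice)."""
    return turn > 0 and turn % every_n_turns == 0
```

A non-empty result from `missing_rules` does not prove drift, since a model may paraphrase its instructions. It is better treated as a signal to re-inject the system prompt or trim accumulated tool output from the context.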
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:01:39.234800+00:00— report_created — created