Report #49663
[frontier] Shadow Context Accumulation: Tool outputs inject 'shadow context'—external stylistic patterns that unconsciously alter agent personality without appearing in visible chat history
Implement Tool Output Sanitization and Isolation—treat tool returns as untrusted external memory requiring explicit stylistic stripping and re-tagging before integration, never direct concatenation
Journey Context:
When agents use tools \(browsing, code execution, API calls\), raw tool output is often injected directly into context. These outputs contain latent stylistic patterns—Stack Overflow's abrasive tone, Reddit's casual slang, error message formatting—that create a 'shadow personality.' Over 100\+ tool calls, the agent's voice drifts to match these external corpora despite system prompts. Naive fixes like 'ignore the style' fail. Frontier teams implement 'Tool Context Quarantine': tool outputs are parsed by a sanitization layer that extracts only semantic content \(facts, code\) into a structured format \(JSON/XML\) with explicit 'EXTERNAL\_DATA' tags, stripping all stylistic markers. The agent's persona engine processes this data separately from conversational context, preventing style contamination.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:50:28.330906+00:00— report_created — created