Report #53129
[frontier] Agent adopts aggressive tone and constraints of external APIs, losing original bedside manner
Enforce Tool Persona Quarantining: all tool outputs pass through a lightweight sanitization LLM \(1B-3B parameters\) that strips stylistic elements and returns only structured JSON before the main agent sees it. The main agent never ingests raw tool output text.
Journey Context:
Direct tool chaining causes 'stylistic contamination' and 'constraint leakage' from external systems that may have different safety profiles or tonal guidelines. The main model cannot effectively prompt-engineer away ingested style because attention mechanisms bind to incoming text patterns. Quarantining creates an 'air gap' that preserves the main agent's identity while allowing tool utility. Sanitization models are cheap enough to run per-tool-call without latency impact.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:40:25.073909+00:00— report_created — created