Report #41599
[frontier] Agent in extended thinking mode (o1, Claude 3.7) adopts a formal reasoning tone that overrides the friendly system persona
Isolate reasoning traces using XML containment with persona-shield headers: reasoning chains are wrapped in tags that the persona-handling agent loop is configured to ignore, then summarized before the main agent loop sees them
Journey Context:
With reasoning models (o1, o3, Claude 3.7 extended thinking), the model generates long internal chain-of-thought traces. Teams observed that these traces have a distinct 'voice' (formal, logical, step-by-step) that contaminates the subsequent 'response generation' phase. Even when the system prompt says 'be casual and friendly', agent outputs become robotic and structured after extended thinking. The report attributes this to the position of the reasoning traces: they occupy the context immediately before the response is generated, so the model treats them as the 'recent past' and mirrors their style. Simply asking the model to 'be friendly' in the response instruction is insufficient, because the contamination comes from the text the model is conditioning on rather than from a missing instruction. The fix is 'persona shielding': architectural isolation of reasoning traces. When the model enters reasoning mode, its outputs are wrapped in dedicated XML tags. The main agent loop (which handles the persona) is configured to treat content inside these tags as 'invisible' when generating the persona-facing response, or, better, the reasoning is passed through a 'summarization shield' in which a separate process condenses the reasoning into neutral facts before the main agent sees it. This breaks the direct link between the formal reasoning style and output generation.
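A minimal sketch of the shielding step, assuming the traces arrive as text wrapped in a tag named `reasoning`. The tag name is hypothetical, since the report does not preserve the actual one, and the helpers `split_reasoning`, `naive_summarize`, and `shield` are illustrative names rather than part of any library. The placeholder summarizer only keeps the last few lines of a trace; in practice the 'summarization shield' would be a separate model call or process that restates the reasoning as neutral facts.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical tag name: the report does not say which tag is used.
REASONING_TAG = "reasoning"
_REASONING_RE = re.compile(
    rf"<{REASONING_TAG}>(.*?)</{REASONING_TAG}>", re.DOTALL
)


def split_reasoning(raw: str) -> Tuple[List[str], str]:
    """Return (reasoning traces, remaining text with the traces removed)."""
    traces = [t.strip() for t in _REASONING_RE.findall(raw)]
    visible = _REASONING_RE.sub("", raw).strip()
    return traces, visible


def naive_summarize(trace: str, max_facts: int = 3) -> str:
    """Placeholder shield: keep the last few non-empty lines as bullets.

    Assumption: a real deployment would replace this with a separate
    summarization process that rewrites the trace in a neutral voice.
    """
    lines = [ln.strip() for ln in trace.splitlines() if ln.strip()]
    return "\n".join(f"- {ln}" for ln in lines[-max_facts:])


def shield(raw: str, summarize: Callable[[str], str] = naive_summarize) -> str:
    """Build the context the persona-handling loop is allowed to see."""
    traces, visible = split_reasoning(raw)
    facts = "\n".join(summarize(t) for t in traces)
    parts = []
    if facts:
        parts.append("Facts:\n" + facts)
    if visible:
        parts.append(visible)
    return "\n\n".join(parts)


if __name__ == "__main__":
    raw_output = (
        "<reasoning>\nStep 1: parse the order.\n"
        "Step 2: the total is 42.\n</reasoning>\nDraft answer"
    )
    print(shield(raw_output))
```

The persona-handling loop would then receive only the return value of `shield`, never the raw trace, so the formal step-by-step text does not sit directly in front of the response it generates.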
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:17:45.338912+00:00— report_created — created