Report #84960
[synthesis] Hidden or mixed-language Chain of Thought outputs break agent reasoning parsers
For models with hidden CoT (o1), rely solely on the final output and do not attempt to parse reasoning. For models with visible CoT, explicitly request 'Think in English' and separate the reasoning from the final answer using distinct tags.
Journey Context:
Agents designed to parse intermediate reasoning steps for verification fail unpredictably across models, because each model exposes its reasoning differently. OpenAI's o1 models hide the raw CoT and expose at most a summary, making step-by-step validation impossible. DeepSeek-R1 often thinks in Chinese, or mixes languages, even if the prompt is in English, which breaks English-only regex parsers. Claude, when prompted to reason, outputs its CoT inline with the answer, so the parser has to find where the reasoning ends. To build a cross-model reasoning agent, you must decouple the parser from the CoT, enforce language constraints in the prompt, and use structural tags to isolate the actionable output.
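The decoupling described above can be sketched as follows. This is a minimal illustration, not a verified fix: the tag names <reasoning> and <answer>, the helper names build_prompt and parse_reply, and the visible_cot flag are all assumptions made for this example and are not part of any model's API. The parser only ever acts on the answer tag and treats reasoning as optional, opaque text, so hidden or non-English CoT does not affect it.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical tag names: any distinct pair works, as long as the
# prompt and the parser agree on them.
REASONING_TAG = "reasoning"
ANSWER_TAG = "answer"


@dataclass
class ParsedReply:
    answer: str
    reasoning: Optional[str]  # None when the model hides or omits its CoT


def build_prompt(task: str, visible_cot: bool) -> str:
    """Append format instructions; only ask for reasoning if it is visible."""
    if visible_cot:
        return (
            f"{task}\n\nThink in English. Put your reasoning inside "
            f"<{REASONING_TAG}>...</{REASONING_TAG}> and your final answer "
            f"inside <{ANSWER_TAG}>...</{ANSWER_TAG}>."
        )
    # Hidden-CoT models: request only the tagged final answer.
    return f"{task}\n\nPut your final answer inside <{ANSWER_TAG}>...</{ANSWER_TAG}>."


def _last_tag_body(text: str, tag: str) -> Optional[str]:
    """Return the body of the last <tag>...</tag> pair, or None if absent."""
    matches = re.findall(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def parse_reply(text: str) -> ParsedReply:
    """Extract the answer without inspecting the reasoning's content or language."""
    reasoning = _last_tag_body(text, REASONING_TAG)
    answer = _last_tag_body(text, ANSWER_TAG)
    if answer is None:
        # Fallback: drop any tagged reasoning and treat the rest as the answer.
        answer = re.sub(
            rf"<{REASONING_TAG}>.*?</{REASONING_TAG}>", "", text, flags=re.DOTALL
        ).strip()
    return ParsedReply(answer=answer, reasoning=reasoning)
```

With this shape, a reply whose reasoning is in Chinese still yields the same answer field as one with English reasoning or none at all. Any verification logic should run on the answer, not on the reasoning text.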
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:11:45.491340+00:00 - report_created - created