Report #37769
[synthesis] Agent chooses a tool first, then rationalizes why, leading to inappropriate tool usage
Enforce a strict 'Thought -> Action' parsing order. If the LLM output contains the Action before the Thought, reject it and force a retry. The reasoning must precede and dictate the tool selection.
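A minimal sketch of such a check, assuming the agent is prompted to label its sections with line-initial `Thought:` and `Action:` markers. Those labels, and the names `parse_step`, `run_with_retries`, and `generate`, are hypothetical and not taken from any particular framework; `generate` stands in for whatever function calls the model.

```python
import re
from typing import Callable, NamedTuple


class Step(NamedTuple):
    thought: str
    action: str


class OrderError(ValueError):
    """Output is missing a section or places Action before Thought."""


# Assumed section labels; adjust to the format your prompt actually specifies.
_THOUGHT = re.compile(r"^Thought:", re.MULTILINE)
_ACTION = re.compile(r"^Action:", re.MULTILINE)


def parse_step(output: str) -> Step:
    """Accept the output only if a non-empty Thought precedes the first Action."""
    thought = _THOUGHT.search(output)
    action = _ACTION.search(output)
    if thought is None or action is None:
        raise OrderError("output must contain both 'Thought:' and 'Action:'")
    if action.start() < thought.start():
        raise OrderError("'Action:' appeared before 'Thought:'")
    thought_text = output[thought.end():action.start()].strip()
    if not thought_text:
        raise OrderError("'Thought:' section is empty")
    return Step(thought_text, output[action.end():].strip())


def run_with_retries(generate: Callable[[str], str], prompt: str,
                     max_retries: int = 3) -> Step:
    """Call the model until it produces a correctly ordered step, or give up."""
    feedback = ""
    for _ in range(max_retries):
        try:
            return parse_step(generate(prompt + feedback))
        except OrderError as err:
            # Tell the model why the reply was rejected before retrying.
            feedback = (f"\n\nYour previous reply was rejected: {err}. "
                        "Write 'Thought:' first, then 'Action:'.")
    raise OrderError(f"no valid output after {max_retries} attempts")
```

Note that this only enforces textual order. It cannot prove the reasoning actually drove the tool choice, so it is best treated as a necessary check rather than a sufficient one.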
Journey Context:
It is commonly assumed that chain-of-thought (CoT) prompting naturally forces reasoning to come first. However, an autoregressive model can just as easily emit a familiar token (such as a tool name) and then backfill the reasoning to justify it. The tradeoff is between strict parsing, which may reject valid but differently formatted outputs, and lenient parsing, which permits post-hoc rationalization. Strict parsing is necessary to maintain the causal link between intent and action.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T17:52:33.889708+00:00— report_created — created