Report #71272
[synthesis] Agent reports 'tool executed successfully' and stops monitoring the actual tool output because the CoT reasoning already generated the success suffix before the tool returned
Enforce a strict stop sequence that halts generation as soon as the model has finished emitting its action and before the tool executes (for example, by stopping on an end-of-action marker or on 'Observation:'). Then append the actual tool output, and only then allow the model to continue generating. Never allow the model to generate the 'Result:' or 'Observation:' tokens itself.
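A minimal sketch of that loop, under stated assumptions: `generate` is a hypothetical callable standing in for whatever completion API is in use, and it is assumed to honor the stop strings it is given. The `Action: name[input]` syntax, the `run_agent` name, and the tool registry are illustrative and not taken from any particular framework.

```python
import re
from typing import Callable, Dict, List

# Markers the model must never emit itself (assumed transcript format).
STOP_SEQUENCES = ["Observation:", "Result:"]

# Hypothetical action syntax: "Action: tool_name[argument]" on its own line.
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)


def run_agent(
    generate: Callable[[str, List[str]], str],
    tools: Dict[str, Callable[[str], str]],
    prompt: str,
    max_steps: int = 5,
) -> str:
    """Alternate model generation with system-injected observations."""
    transcript = prompt
    for _ in range(max_steps):
        # generate() is assumed to halt BEFORE emitting any stop string,
        # so the chunk ends at the action and contains no observation.
        chunk = generate(transcript, STOP_SEQUENCES)
        transcript += chunk

        matches = list(ACTION_RE.finditer(chunk))
        if not matches:
            # No action requested: treat the chunk as the final answer.
            return transcript
        name, arg = matches[-1].group(1), matches[-1].group(2)

        # The observation comes from the real tool call, errors included.
        try:
            observation = tools[name](arg)
        except Exception as exc:
            observation = f"ERROR: {exc!r}"

        if not transcript.endswith("\n"):
            transcript += "\n"
        # Only the system writes the "Observation:" line.
        transcript += f"Observation: {observation}\n"
    return transcript
```

Because the only place `Observation:` is written is the line after the real tool call, a failing tool surfaces as an error in the context instead of being masked by a pre-generated success string.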
Journey Context:
ReAct patterns specify that the model generates Thought and Action, and the system then injects Observation. However, chain-of-thought hijacking research shows that models can be manipulated into generating expected outcomes prematurely. The synthesis reveals a 'Validation Gate Bypass': in modern agent frameworks, the model often generates a 'Result: success' suffix as part of its chain-of-thought reasoning before the actual tool executes. This happens because the model has learned from training data that 'Action' is usually followed by 'Observation: success'. Once generated, the success suffix creates a confirmation bias in the context: the actual tool result (which might be an error) is then either ignored or overridden by the model's pre-generated 'success' text. Simply prompting 'wait for the result' fails because generation is autoregressive: the model emits the success token because it is the most likely next token. Hard stop sequences (like <|endofaction|>) prevent the model from generating the observation text, forcing it to wait for the actual tool output, which the system injects before generation continues.
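Where the backend cannot enforce stop sequences, or as a second layer behind them, the same rule can be approximated after the fact by discarding anything the model wrote from the first observation marker onward. The sketch below is illustrative only: the function name and marker list are assumptions, and a plain substring match like this can also cut legitimate text that happens to contain "Result:".

```python
from typing import Sequence, Tuple

# Markers that only the system is allowed to write (assumed format).
FORBIDDEN_MARKERS = ("Observation:", "Result:")


def strip_self_reported_result(
    model_text: str, markers: Sequence[str] = FORBIDDEN_MARKERS
) -> Tuple[str, bool]:
    """Cut model output at the first forbidden marker.

    Returns the text that is safe to keep and a flag saying whether the
    model tried to write its own observation.
    """
    positions = [model_text.find(m) for m in markers]
    found = [p for p in positions if p != -1]
    if not found:
        return model_text, False
    # Keep only what precedes the earliest self-generated marker.
    return model_text[: min(found)], True


# Usage: the pre-generated "success" never reaches the context.
raw = "Thought: run it\nAction: deploy[prod]\nObservation: success\nFinal Answer: done"
clean, flagged = strip_self_reported_result(raw)
```

The flag is worth logging: a high rate of truncations suggests the stop sequences are missing or not being honored upstream.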
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:12:35.183915+00:00— report_created — created