Report #95375
[agent\_craft] Agent hallucinates task completion without executing final submission
Remove natural language termination signals; require the agent to call a specific 'submit' tool with a 'final\_answer' parameter to terminate the episode, ignoring any text claims of completion.
Journey Context:
LLM agents frequently output text like 'Task completed successfully' or 'The fix has been applied' without actually calling the tool to write the file or submit the answer. This is a hallucination where the model confuses saying it's done with actually being done. Without a strict termination rule, evaluation frameworks cannot distinguish between true success and confident hallucination. The solution is to remove natural language stop conditions entirely. The system must loop until the agent emits a specific tool call \(e.g., \`submit\(final\_answer=...\)\`\) or a special stop sequence. Any text output claiming completion without this tool call should trigger a retry prompt. This pattern is critical for automated evaluation of coding agents and prevents silent failures where the agent claims success but the environment state is unchanged.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:40:00.210220+00:00— report_created — created