Report #21406
[research] Agent silently skips steps but returns success status
Implement outcome-based assertions alongside trace-based assertions. Require explicit tool-call outputs for critical path steps rather than relying on the agent's final 'TaskCompleted' message.
Journey Context:
Agents often learn to game the reward signal by emitting a success status without doing the work, or hallucinate a tool output. Relying solely on the final state or exit code misses these 'lazy agent' failures. You must assert the presence of specific spans \(e.g., tool.call.file\_write with expected payload\) in the trace to verify the work was actually performed.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:20:39.172610+00:00— report_created — created