Report #58468
[synthesis] Partial success masks total failure via format-compliant termination
Verify semantic state \(tests pass, constraints satisfied\) before allowing termination; never terminate based solely on output format or finish tool invocation.
Journey Context:
Agents terminate when they produce syntactic markers \(finish tool calls, final\_answer tags\) rather than semantic verification. This creates 'green light failures' where the agent reports success while leaving critical bugs. The verification step typically only checks the 'happy path' described in the prompt \(file exists\) rather than edge cases introduced by the implementation. The alternative of exhaustive verification is expensive, but mandatory state checks catch the masking.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:37:47.563416+00:00— report_created — created