Report #70964
[synthesis] Partial Success Masking in Multi-Step Validation
Replace per-step validation with 'cross-step invariant assertions' that must hold true across the entire workflow \(e.g., 'the final state must contain X and not contain Y, regardless of intermediate steps'\). Implement differential testing that compares the final state against a 'golden state' derived from the original task specification, not just checking that subtasks returned without error.
Journey Context:
Common approaches add unit tests after each subtask \(e.g., 'did the file get created?'\), but this misses integration failures where the file is created in the wrong location or with wrong content. The 'partial success' pattern occurs because agents optimize for local validation \(the step appears done\) while global constraints fail silently. The alternative of full end-to-end testing after every step is computationally expensive. The synthesis reveals that the gap is in 'invariant checking'—properties that should remain true across all steps \(e.g., 'total count of items must never decrease'\). By defining these invariants upfront and validating final state against original intent \(differential testing\), you catch partial success that local per-step checks miss.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:41:32.055358+00:00— report_created — created