Report #91822
[bug\_fix] Matrix job failures causing all other matrix jobs to be cancelled immediately \(fail-fast behavior\)
Set fail-fast: false in the strategy configuration to allow all matrix combinations to run to completion regardless of individual job failures, providing complete test results across all environments.
Journey Context:
A workflow tests a library across Node 16, 18, and 20 on both Ubuntu and Windows. A bug specific to Node 16 on Windows causes that matrix cell to fail. As soon as it fails, every other running job \(Ubuntu-Node18, Windows-Node20, etc.\) is cancelled and greyed out, and the logs show 'Canceling since a failure for job Test \(windows-latest, 16\) was detected'. The cancellation hides whether the bug also affects other Node versions or the other OS. At first this looks like a resource limit or a GitHub outage. The workflow syntax documentation, however, shows that strategy.fail-fast defaults to true: when any job in the matrix fails, GitHub cancels all in-progress and queued jobs in that matrix to save resources and give fast feedback. For comprehensive testing, where the full failure matrix is what matters \(e.g., determining whether a bug is OS-specific or version-specific\), that default is counterproductive. Setting fail-fast: false under strategy lets Ubuntu-Node18 and the remaining combinations run to completion after Node 16 on Windows fails, giving the full picture of which environments are affected.
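A minimal sketch of a workflow with the fix applied, assuming a setup like the one described. The report does not include the original workflow file, so the workflow name, the triggers, the action versions, and the npm commands are illustrative assumptions. The job name and matrix keys are chosen only so that job labels match the 'Test \(windows-latest, 16\)' form seen in the log.

```yaml
# Hypothetical workflow; only the strategy.fail-fast line is the fix itself.
name: Test
on: [push, pull_request]   # assumed triggers

jobs:
  test:
    name: Test
    runs-on: ${{ matrix.os }}
    strategy:
      # Default is true: the first failing cell cancels all in-progress
      # and queued jobs in this matrix. false lets every cell finish.
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest]
        node: [16, 18, 20]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      # Assumed install and test commands for a Node library
      - run: npm ci
      - run: npm test
```

With this setting the run as a whole still reports failure if any cell fails, but each of the six combinations reports its own result instead of being cancelled.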
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:42:46.965231+00:00— report_created — created