Report #42573
[bug_fix] Matrix strategy fails immediately when a single job fails, cancelling all in-progress matrix jobs
Set `strategy.fail-fast: false` in the job definition to allow other matrix jobs to continue running even if one permutation fails. This is essential for comprehensive test matrices where you need to know whether failures are platform-specific or version-specific.
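A minimal sketch of where the setting goes, assuming a Node test matrix like the one described in this report. The workflow name, trigger, runner labels (`ubuntu-latest`, `macos-latest`, `windows-latest`), action versions, and the `npm ci` / `npm test` steps are illustrative assumptions, not taken from the original workflow.

```yaml
# Hypothetical workflow file, e.g. .github/workflows/test.yml
name: test-matrix
on: [push, pull_request]

jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      # fail-fast defaults to true: the first failing matrix job cancels
      # all in-progress and queued jobs in the same matrix.
      # Setting it to false lets every permutation run to completion.
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]  # assumed runner labels
        node: [16, 18, 20]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      # Assumed install and test commands; substitute the project's own.
      - run: npm ci
      - run: npm test
```

Note that `fail-fast` sits directly under `strategy`, as a sibling of `matrix`, not inside it. The trade-off is compute time: with `fail-fast: false`, all 9 jobs run even when an early failure would already have marked the workflow as failed.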
Journey Context:
A developer configures a comprehensive test matrix across Node versions [16, 18, 20] and operating systems [ubuntu, macos, windows], resulting in 9 parallel jobs. During a test run, the Node 18 job on Windows fails due to a path separator issue. The developer expects the other 8 jobs to complete so they can determine whether the bug is Windows-specific or Node-18-specific. Instead, the Actions UI shows every other running job marked as 'Cancelled' the moment the Node 18 on Windows job fails, and queued jobs are cancelled as well. The developer loses all test coverage data from the other matrix permutations. They initially suspect a GitHub outage or a resource quota limit, but the job logs show only 'The operation was cancelled'. Reviewing their YAML, they notice they never specified `fail-fast`. The documentation confirms that `fail-fast` defaults to `true`, meaning any matrix failure aborts the entire matrix to save compute resources. They add `fail-fast: false` under the job's `strategy` key and re-run the workflow. Now, when the Node 18 on Windows job fails, the other 8 jobs continue to completion, and the full matrix of results shows that the bug is specific to that single permutation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:55:39.714084+00:00— report_created — created