Report #99651
[bug_fix] A single failing matrix job cancels all other matrix jobs before they finish
Set strategy.fail-fast to false on the matrix job. GitHub Actions matrix strategies default to fail-fast: true. With that default, as soon as any matrix job fails, GitHub cancels every in-progress and queued job in the same matrix. Setting fail-fast: false lets every matrix job run to completion, so you see the full set of failures instead of only the first one. If failures in specific experimental matrix entries should be informational rather than failing the workflow, also set continue-on-error on those entries.
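A minimal sketch of the fix, assuming a Node.js project tested with npm. The job name test, the npm steps, and the extra Node 22 entry marked experimental are illustrative assumptions, not taken from the original report. continue-on-error reads a per-entry experimental flag, so only the entry that sets it to true is allowed to fail without failing the run.

```yaml
jobs:
  test:                      # hypothetical job name
    runs-on: ${{ matrix.os }}
    # Per-entry switch: if an entry with experimental: true fails,
    # the workflow run is not marked as failed.
    continue-on-error: ${{ matrix.experimental }}
    strategy:
      # Let every matrix job run to completion even if one of them fails.
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        node: [18, 20]
        experimental: [false]
        include:
          # Hypothetical experimental entry; its failures are informational only.
          - os: ubuntu-latest
            node: 22
            experimental: true
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci          # assumed install step
      - run: npm test        # assumed test command
```

Because the include entry sets a node value that none of the original combinations have, it is added as a seventh combination rather than merged into an existing one.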
Journey Context:
A team expands their test job to run on ubuntu-latest, windows-latest, and macos-latest with Node 18 and 20, for six matrix combinations. On the first run the Windows job fails, but before they can see whether macOS or the other Node version is also broken, the Actions UI marks the remaining jobs as cancelled with 'The operation was canceled.' They initially suspect a syntax error in the workflow that causes a global abort. When they re-run the job, the macOS job fails first and cancels Windows before it reaches the same error. Checking the docs, they discover that fail-fast defaults to true for matrix strategies. After adding strategy: fail-fast: false, they immediately see that only one Windows combination fails; the other five pass. They keep fail-fast: true on the main branch to save CI minutes, but disable it temporarily while debugging cross-platform issues.
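One possible way to express "fail fast on main, run everything elsewhere" without editing the workflow each time is to compute fail-fast from the branch. This is a sketch under assumptions: it presumes fail-fast accepts an expression and that the github context is available inside strategy (both are documented for GitHub Actions, but verify against your runner setup). The branch rule and the npm steps are illustrative, not the team's actual configuration.

```yaml
jobs:
  test:                      # hypothetical job name
    runs-on: ${{ matrix.os }}
    strategy:
      # Assumed policy: fail fast only on refs/heads/main to save CI minutes;
      # on every other ref (feature branches, pull requests) let all
      # combinations finish so cross-platform failures are fully visible.
      fail-fast: ${{ github.ref == 'refs/heads/main' }}
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        node: [18, 20]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci          # assumed install step
      - run: npm test        # assumed test command
```

For pull requests, github.ref has the form refs/pull/<number>/merge, so the expression evaluates to false and the full matrix runs.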
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T04:49:52.774613+00:00— report_created — created