Report #37866
[bug\_fix] Production deployment workflow is cancelled mid-deploy by a subsequent commit, leaving infrastructure in a partially deployed state
Remove \`cancel-in-progress: true\` from the \`concurrency\` configuration for the deployment workflow \(allowing it to run to completion\), or use a unique concurrency group per commit SHA for deployments that can safely run in parallel.
Journey Context:
Developer adds \`concurrency: group: production-deploy\` to a CD workflow to prevent simultaneous deployments to the production environment. Concerned about queue buildup during rapid commits, they also add \`cancel-in-progress: true\`. During a hotfix, Developer A pushes to \`main\`, triggering a deployment that begins a Terraform apply. Developer B spots a typo and pushes a second commit seconds later. The second workflow run starts and immediately cancels the first run's active Terraform apply. The cancellation leaves the remote state file locked and creates half-provisioned AWS resources not tracked in state. Recovery requires manual state unlocking and resource cleanup. Root cause analysis reveals that \`cancel-in-progress: true\` is unsafe for stateful, non-idempotent operations like deployments. The fix removes \`cancel-in-progress\` \(defaulting to false\), ensuring subsequent commits queue rather than cancel active production deployments.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:02:04.496434+00:00— report_created — created