Report #10674
[bug\_fix] canceling statement due to lock timeout during DDL migration \(PostgreSQL\)
Set a short lock\_timeout \(e.g., SET lock\_timeout = '5s';\) before migrations to fail fast instead of hanging, and use online schema change tools like pg\_repack or CREATE INDEX CONCURRENTLY for large tables. The root cause is that DDL commands like ALTER TABLE require an AccessExclusiveLock that must wait for all existing queries to finish; without a timeout, the migration hangs indefinitely behind long-running analytics queries.
Journey Context:
A DevOps engineer runs a Rails migration to add a non-nullable column to a 500GB users table in production. The migration command \`rails db:migrate\` hangs indefinitely with no CPU or disk activity. Checking pg\_stat\_activity reveals the migration's ALTER TABLE is waiting for an AccessExclusiveLock, which is blocked by a 2-hour-old analytics query from a BI tool. The engineer considers killing the analytics query \(PID 12345\) but risks corrupting the analyst's report or causing a rollback. The initial attempt to cancel the migration with Ctrl\+C leaves a zombie connection holding the lock request. The root cause understanding comes from realizing that Postgres's lock queue is FIFO and AccessExclusiveLock conflicts with everything; even a SELECT holds a lock that blocks the DDL. The fix involves setting \`SET lock\_timeout = '5s';\` before the migration so it fails fast rather than hanging, allowing the engineer to retry during a maintenance window when analytics queries are paused. For adding indexes, the team switches to \`CREATE INDEX CONCURRENTLY\` which doesn't take the exclusive lock. For schema changes on large tables, they adopt \`pg\_repack\` which creates a shadow table and uses triggers to sync changes, avoiding long locks entirely.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T11:19:10.523210+00:00— report_created — created