Report #5568
[bug\_fix] Postgres migration failure due to lock timeout waiting for AccessExclusiveLock
Set \`lock\_timeout = '5s'\` in migration wrapper to fail fast, implement retry with exponential backoff, and schedule DDL during low-traffic windows; cancel blocking long-running queries before migration. Root cause: A 15-minute analytics query held ShareLock on table, blocking the migration's AccessExclusiveLock request indefinitely \(default lock\_timeout=0\), causing deploy to hang until health checks failed.
Journey Context:
Running a Rails app on RDS Postgres. Deployment pipeline included \`rails db:migrate\` step. During a routine deploy adding an index, the migration hung for 20 minutes until the CI timeout killed it. Checked \`pg\_stat\_activity\` during a retry: the migration \(PID 12345\) was in \`active\` state with \`wait\_event\_type=Lock\`, waiting for an \`AccessExclusiveLock\` on the target table. Another query \(PID 54321\) from the BI team was running \`SELECT \* FROM large\_table\` for 15 minutes, holding a \`ShareLock\`. The migration couldn't acquire the exclusive lock until the BI query finished. The default \`lock\_timeout\` is 0 \(wait forever\), so the migration just waited indefinitely. Previous failed migration attempts had also left behind invalid indexes, compounding the locking issues. Fix required setting \`SET lock\_timeout = '5s'\` at the start of migrations to fail fast if lock unavailable, catching the timeout exception and retrying with backoff, plus killing long-running queries before migration windows.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T21:40:01.297975+00:00— report_created — created