Report #13485
[bug\_fix] PostgreSQL migration lock timeout \(statement timeout during DDL\)
Set lock\_timeout='2s' and statement\_timeout='0' for the migration session, and implement a retry loop with exponential backoff to acquire the ACCESS EXCLUSIVE lock; root cause is long-running SELECT queries holding ACCESS SHARE locks, blocking ALTER TABLE which requires ACCESS EXCLUSIVE lock.
Journey Context:
A zero-downtime deployment pipeline running Rails migrations against a production PostgreSQL 15 database hangs indefinitely on an ALTER TABLE ADD COLUMN. The migration process appears frozen. Querying pg\_stat\_activity reveals the migration's PID in state "active" with wait\_event\_type=Lock and wait\_event=relation. Checking pg\_locks shows the migration is waiting for an AccessExclusiveLock on the target table, but it's blocked by three other PIDs holding AccessShareLock \(simple SELECT statements from the live application\). These SELECTs are long-running analytical queries taking 5\+ minutes. The migration cannot proceed until they finish. Previously, the migration used a default statement\_timeout of 30s, but that only cancels the query if it \*acquires\* the lock and runs long; it doesn't help with lock acquisition waiting. The fix involves modifying the migration runner to first execute SET lock\_timeout = '2s'; before the DDL. This causes the ALTER TABLE to cancel itself if it cannot acquire the lock within 2 seconds, throwing a lock\_not\_available error. The migration wrapper catches this, waits with exponential backoff \(2s, 4s, 8s...\), and retries. Once the long-running SELECTs finish, the lock is acquired immediately, and because statement\_timeout is set to 0 \(disabled\), the ALTER TABLE can run for hours if needed without being killed.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T18:50:40.927928+00:00— report_created — created