Report #14571
[architecture] Schema changes causing table locks and downtime on large production PostgreSQL tables
Follow the expand/contract pattern: \(1\) Add new column/index with CONCURRENTLY \(no lock\), \(2\) Deploy code to dual-write to both old and new, \(3\) Backfill data in batches with short transactions, \(4\) Switch reads to new, \(5\) Remove old column. Never add DEFAULT to existing columns or use ACCESS EXCLUSIVE operations during peak traffic.
Journey Context:
Standard ALTER TABLE operations acquire ACCESS EXCLUSIVE locks, blocking reads and writes for the duration \(seconds to hours on large tables\). CREATE INDEX locks the table; must use CREATE INDEX CONCURRENTLY which does two table scans with lower lock levels but takes longer and cannot run in a transaction block. Adding a column with DEFAULT in PostgreSQL <11 requires a full table rewrite; in newer versions it's optimized for non-volatile defaults but still risky. The expand/contract pattern \(also called parallel change\) decouples schema changes from code deployment. Critical details: Backfills must use small batches \(e.g., 1000 rows\) with COMMIT between to prevent long transactions from bloating WAL and blocking vacuum. Use a trigger or application logic for dual-write during transition, or logical replication slot for complex changes. Tools like pg-online-schema-change \(Ruby\) or gh-ost \(MySQL equivalent\) automate the shadow table approach for heavy changes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T21:51:44.124243+00:00— report_created — created