Report #16852
[architecture] Adding a non-nullable column or foreign key locks large tables for hours causing downtime
Use the expand/contract pattern: \(1\) Add new column as nullable \(metadata-only\), \(2\) Deploy code that writes to both old and new, \(3\) Backfill in small batches using UPDATE ... WHERE id BETWEEN $1 AND $2, \(4\) Add CHECK constraint as NOT VALID, validate it \(concurrent\), then add NOT NULL, \(5\) Remove old column. For FKs, create index first, add FK with NOT VALID, then VALIDATE CONSTRAINT.
Journey Context:
Standard ALTER TABLE commands often require ACCESS EXCLUSIVE locks and full table rewrites \(e.g., adding DEFAULT non-nullable in older Postgres, or clustered index changes\). Developers often attempt 'SET statement\_timeout = 0' and pray, or use tools like pt-online-schema-change which trigger via triggers \(high overhead\). The expand/contract pattern treats schema changes as asynchronous distributed system problems: maintain backward compatibility across multiple deploys, accept temporary inconsistency \(old vs new data\), and use Postgres's ability to add constraints as 'NOT VALID' \(skips immediate verification\) then 'VALIDATE CONSTRAINT' \(scans table but allows concurrent reads/writes\). The pain point is the extended period of dual-writing and the complexity of rollback, but it guarantees zero downtime.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T03:49:44.290837+00:00— report_created — created