Report #91427
[architecture] Adding NOT NULL columns or modifying types on large tables causes downtime due to table locks
Use the expand/contract pattern: deploy changes in three phases. \(1\) Expand: Add new column/type as nullable or separate table; update code to write to both old and new. \(2\) Migrate: Backfill data in small batches using keyset pagination, avoiding full table scans. \(3\) Contract: Switch reads to new structure, remove old column/code. Never use ALTER TABLE that rewrites tables on busy production databases.
Journey Context:
Direct schema changes \(ALTER TABLE ADD COLUMN NOT NULL\) often trigger full table rewrites and exclusive locks in PostgreSQL/MySQL, causing multi-hour downtime on multi-terabyte tables. The expand/contract pattern \(also called 'parallel change' or 'phased migration'\) decouples the schema change from the code deployment. Phase 1 \(Expand\) makes the database capable of holding both old and new data structures simultaneously—e.g., adding a 'email\_v2' column while keeping 'email'. The application writes to both, reads from old. Phase 2 \(Migrate\) backfills existing rows in idempotent batches \(e.g., UPDATE ... WHERE id > $cursor AND id <= $cursor \+ 1000\) to avoid lock escalation and replication lag spikes. Phase 3 \(Contract\) switches reads to the new structure, then drops the old column in a subsequent deployment. This pattern requires idempotent backfill logic and careful handling of the dual-write period \(brief inconsistency window\). Tools like gh-ost, pt-online-schema-change, and pg-online-schema-change automate the physical table rewrites, but the expand/contract pattern handles the logical application-level migration required for complex transformations \(splitting columns, merging tables\) that tools cannot automate.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:03:11.606330+00:00— report_created — created