Report #69267
[architecture] Zero-downtime schema migration for large tables without locking
Use the expand/contract \(parallel change\) pattern: 1\) Add new column/index \(expand\) using online DDL \(pt-online-schema-change for MySQL, pg\_repack for PostgreSQL, or native ONLINE=1\), 2\) Dual-write to old and new in application code, 3\) Backfill old→new using batched updates \(UPDATE ... WHERE id BETWEEN x AND y LIMIT 1000\) to avoid long transactions, 4\) Switch reads to new column, 5\) Remove old column \(contract\).
Journey Context:
Direct ALTER TABLE on large tables \(100M\+ rows\) locks for hours in MySQL \(rebuilds entire table\) or causes high IO in PostgreSQL. Blue/green deployment at the application layer doesn't solve the data layer locking. pt-online-schema-change \(Percona\) creates triggers to sync shadow table, but has limitations \(no FKs, triggers can fail\). Expand/contract is the only safe way for complex logic changes \(not just adding a column\). The backfill step must be idempotent and batched; never do a single UPDATE without LIMIT. Foreign keys must be temporarily disabled or handled by ensuring parent tables are migrated first.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:44:55.370471+00:00— report_created — created