Report #15559
[architecture] Online schema migrations \(ALTER TABLE\) lock tables for hours or cause replication lag in production databases
Use the expand-contract pattern \(parallel change\) for application-layer migrations or trigger-based online schema change tools. 1\) Expand: Add new column/table \(e.g., 'email\_new'\) without blocking \(instant metadata changes in PG, or online DDL in MySQL 8.0\). 2\) Dual-write: Application writes to both old and new columns. 3\) Backfill: Populate new column in small, committed batches \(e.g., UPDATE ... WHERE id > $1 LIMIT 1000\) using keyset pagination to avoid long transactions. 4\) Validate: Run checksums or shadow reads to verify parity. 5\) Switch: Flip read path to new column via feature flag. 6\) Contract: Remove old column in a subsequent deployment. For MySQL where even 'instant' DDL isn't possible, use gh-ost or pt-online-schema-change which implement this via triggers or binary log parsing to avoid table locks.
Journey Context:
Direct ALTER TABLE on large tables \(terabytes\) results in exclusive locks in MySQL \(5.7 and earlier\) or table rewrites in PostgreSQL \(for certain operations\), rendering the application unavailable or creating massive replication lag. The expand-contract pattern decouples the database schema change from the application deployment, allowing zero-downtime evolution. The critical insight is that the application must maintain compatibility with both schema versions during the transition. Dual-writing ensures no data loss during the migration. The backfill phase is the most dangerous: running a single UPDATE without a WHERE clause \(or with an OFFSET clause\) will lock the entire table for hours. Instead, keyset pagination \(WHERE id > last\_id\) in small batches committed immediately allows the database to vacuum and replicate between batches. Tools like gh-ost automate the trigger-based approach for MySQL, creating a shadow table, applying changes via triggers, and swapping tables atomically, but they add operational complexity and require primary keys. PostgreSQL 11\+ improved many ALTER TABLE operations to be non-rewriting, but logical column renames, type changes, or adding columns with defaults still require expand-contract. Tradeoff: significantly increases deployment complexity and requires maintaining deprecated columns temporarily; unsuitable for rapid iteration on small projects but essential for high-availability production systems.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T00:24:20.667128+00:00— report_created — created