Report #59141
[architecture] Running native ALTER TABLE on large production tables \(100M\+ rows\) causes exclusive table locks, application downtime, and replication lag in MySQL and PostgreSQL
Use the expand-contract pattern: \(1\) Add new column/index/table \(backward-compatible DDL\), \(2\) Deploy code that dual-writes to old and new structures, \(3\) Backfill existing data in small idempotent batches \(e.g., 1000 rows every 1s\) using tools like gh-ost \(MySQL\) or pg-online-schema-change, \(4\) Switch reads to new version via feature flag, \(5\) Remove old column writes, \(6\) Drop old schema after validation. Never use native ALTER for tables >1GB in production.
Journey Context:
Direct ALTER on MySQL \(InnoDB\) rewrites the entire table for most changes, holding exclusive locks for minutes or hours on large tables. PostgreSQL holds ACCESS EXCLUSIVE lock for most ALTER types, blocking reads and writes. Tools like gh-ost \(GitHub\) use triggerless binlog streaming: it creates a shadow table with the new schema, copies chunks of data, captures ongoing changes from the binary log, then performs an atomic RENAME swap. For PostgreSQL, use pg-online-schema-change \(similar trigger-based approach\) or logical replication. The expand-contract pattern \(also called parallel change\) is necessary for breaking changes \(renaming columns, changing types\): you cannot simply ALTER; you must maintain both versions temporarily. Critical implementation details: Throttle backfills to avoid I/O saturation; use idempotent updates \(WHERE processed=false\) to allow resume on failure; validate row counts and checksums match before cutover; use feature flags for the read-switch step to enable instant rollback.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:45:25.155984+00:00— report_created — created