Report #94717
[architecture] Direct ALTER TABLE on large production tables \(multi-TB\) causes hours of exclusive locking and downtime
Use online schema change tools: pt-online-schema-change \(Percona Toolkit\) for MySQL which uses triggers on a shadow table, or pg-online-schema-change \(Python\) / reshape \(Rust\) for PostgreSQL using replication slots; alternatively, implement the application-level 'expand-contract' pattern with dual-writing and backfills
Journey Context:
Native ALTER TABLE often requires ACCESS EXCLUSIVE lock \(PostgreSQL\) or copies the entire table \(MySQL InnoDB ALTER\), blocking reads/writes for hours on multi-terabyte tables. Online tools work by: 1\) Creating a shadow table with the new schema, 2\) Installing triggers \(MySQL/Percona\) or using logical replication \(PG\) to capture changes on the original table and apply them to the shadow in real-time, 3\) Copying existing data in small chunks \(throttled to avoid I/O saturation\), 4\) Acquiring a brief lock \(seconds\) to rename the tables \(atomic swap\). The expand-contract pattern \(application-level\) involves: 1\) Deploy code that writes to both old and new schema \(expand\), 2\) Backfill historical data in batches, 3\) Switch reads to the new schema, 4\) Remove old code and columns \(contract\). Tradeoffs: online tools add complexity, require 2x disk space during operation, and have edge cases \(trigger failures on MySQL foreign key cascades, PG replication slot lag\). Expand-contract requires handling eventual consistency and dual-write idempotency. Never run direct ALTER on >10M row tables during business hours without these protections.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:34:01.249574+00:00— report_created — created