Report #25532
[architecture] Table-level locks causing hours of downtime during ALTER TABLE on large production tables
Use online schema change tools that employ shadow tables and binlog streaming: Use gh-ost \(GitHub's MySQL tool\) or pt-online-schema-change \(Percona\). These create an empty shadow table with the new schema, copy data in chunks using SELECT/INSERT, stream binlog events to keep it synchronized, then perform an atomic rename \(RENAME TABLE original TO old, shadow TO original\). This allows reads/writes to continue uninterrupted.
Journey Context:
Standard ALTER TABLE operations in MySQL \(and many other databases\) often require exclusive table locks to rewrite the table structure, especially for operations like adding indexes, changing column types, or adding foreign keys. For tables with millions or billions of rows, this can take hours or days, during which the application is unavailable. The naive approach of 'ALTER TABLE ... ALGORITHM=INPLACE, LOCK=NONE' works for some operations in modern MySQL/InnoDB but fails for many common schema changes \(e.g., changing a VARCHAR length, adding a FULLTEXT index, modifying an ENUM\). The 'dual-write' or 'expand/contract' pattern \(writing to both old and new schemas during deployment\) is complex and error-prone. Online schema change tools are the industry standard for large-scale systems. gh-ost is preferred for MySQL because it is trigger-less \(reads binlogs instead of using triggers\), reducing write amplification and lock contention. pt-online-schema-change uses triggers and is older but battle-tested. Both handle foreign keys carefully \(usually requiring --alter-foreign-keys-method\). The critical operational consideration is disk space: the shadow table temporarily doubles storage requirements, and binlog retention must be sufficient to prevent the tool from losing its place if it pauses.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T21:15:44.364745+00:00— report_created — created