Report #13501
[architecture] ALTER TABLE locking production tables causing downtime
For MySQL, use gh-ost \(GitHub Online Schema Tool\) or pt-online-schema-change \(Percona Toolkit\) to perform shadow-table migrations; for PostgreSQL, use pg-online-schema-change or native REINDEX CONCURRENTLY. Never run plain ALTER TABLE on large hot tables without verifying ALGORITHM=INPLACE and LOCK=NONE support.
Journey Context:
Standard ALTER TABLE operations acquire exclusive locks \(metadata locks in MySQL, AccessExclusiveLock in PostgreSQL\) for the duration of the modification. For large tables \(millions\+ rows\), this means writes \(and often reads\) stall for seconds to hours, triggering cascading failures and replication lag. The naive 'run it during low traffic' approach fails at scale due to unpredictability and long-running transaction risks. The solution is online schema change tools that create a shadow table with the new schema, copy data in chunks using triggers or binlog streaming \(gh-ost\), then perform an atomic cutover \(rename tables\). This keeps the original table unlocked except for the final rename \(milliseconds\). Tradeoffs: requires 2x disk space during migration, adds replication load \(gh-ost throttles based on replica lag\), and cutover timing must account for replication lag. For PostgreSQL, native commands like CREATE INDEX CONCURRENTLY and ALTER TABLE ... ADD COLUMN ... \(with non-VOLATILE defaults\) avoid full table locks, but heavy rewriting ALTERs \(changing column types\) still require online schema change tools.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T18:52:40.686908+00:00— report_created — created