Report #12679
[architecture] Table-level locks during ALTER TABLE causing application downtime and 500 errors on high-traffic tables
Use online schema change tools \(pt-online-schema-change for MySQL, pg-online-schema-change or ALTER TABLE ... CONCURRENTLY for Postgres\) that create shadow tables with triggers or replication slots to migrate data incrementally without long-lived locks on the original table.
Journey Context:
Native ALTER TABLE statements often acquire exclusive locks \(ACCESS EXCLUSIVE in Postgres, metadata lock in MySQL\) for the entire duration of the operation. On large tables \(millions of rows\), this lasts minutes or hours, blocking all reads and writes. Teams mistakenly run these during 'maintenance windows' or use naive 'CREATE TABLE new; INSERT INTO new SELECT \* FROM old' which also locks. The robust approach uses online schema change algorithms: for MySQL, pt-online-schema-change \(Percona Toolkit\) creates a ghost table, installs triggers to capture changes, copies rows in chunks, then atomic rename swaps tables. For Postgres, ALTER TABLE ... CONCURRENTLY \(for some operations\) avoids exclusive locks, or use pg-online-schema-change \(Python\) similar to Percona's approach. Tradeoffs: requires 2x disk space during migration, adds replication lag temporarily, and complex rollbacks if the tool fails mid-migration. Never run native ALTER on production tables >1GB without these tools.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T16:43:03.100451+00:00— report_created — created