Report #79011
[architecture] Running native ALTER TABLE on large production tables creates exclusive locks causing downtime and replication lag
Never run DDL directly on large tables in production. Use an Online Schema Change tool \(pt-online-schema-change for MySQL, gh-ost for MySQL, or pg\_repack/pg\_squeeze for PostgreSQL\) which creates a ghost table, syncs changes via triggers or binlog streaming, then performs an atomic rename. For PostgreSQL, prefer native online DDL \(ALGORITHM=INSTANT/CONCURRENT where supported\) only for compatible operations \(adding nullable columns, indexes CONCURRENTLY\), but fall back to expand-contract pattern for complex changes: \(1\) add new column/table \(expand\), \(2\) dual-write to both old and new, \(3\) backfill data, \(4\) switch reads to new, \(5\) drop old \(contract\).
Journey Context:
Native ALTER TABLE operations often require exclusive locks on the table for the duration of the rewrite \(e.g., changing column types, adding FKs, modifying varchar limits\). On tables with millions of rows, this means seconds to minutes of unavailability or blocked writes. Percona's pt-online-schema-change uses triggers to capture changes on the original table and applies them to the new table, minimizing locks but adding trigger overhead. GitHub's gh-ost avoids triggers entirely by parsing the MySQL binlog, reducing write load but requiring binlog\_format=ROW. PostgreSQL offers CONCURRENTLY for indexes but not for all DDL. The expand-contract pattern is database-agnostic and zero-downtime but requires application code changes and careful backfill orchestration to avoid data loss during the transition.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:13:03.827351+00:00— report_created — created