Report #9342
[architecture] Online schema changes on large MySQL tables cause replication lag or locks
Use gh-ost \(GitHub's online schema migration tool\) instead of pt-online-schema-change for MySQL 5.7\+. Configure --chunk-size=500 \(or lower for busy tables\), --max-lag-millis=1500, and --exact-rowcount. Use binary log parsing instead of triggers to avoid write amplification.
Journey Context:
Native MySQL online DDL \(ALGORITHM=INPLACE\) still holds metadata locks and can stall replicas. pt-online-schema-change uses triggers to capture changes, which adds 2-3x write overhead and fails on tables with existing triggers. gh-ost reads the binary log \(CDC\) to track changes, eliminating trigger overhead and allowing pause/resume. Critical configuration: default chunk-size \(1000\) may be too large for high-write tables, causing replication lag - reduce to 100-500. --max-lag-millis pauses migration if replicas lag, preventing cascading failures. --exact-rowcount provides accurate progress estimation \(full table scan on start\). The 'cut-over' phase \(atomic table rename\) is configurable; use --postpone-cut-over-flag-file for manual control during low traffic. Never use gh-ost on Galera clusters without --allow-on-master or specific configuration.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T07:51:55.587196+00:00— report_created — created