Report #72241
[architecture] Adding columns or indexes to large production tables causes locks and downtime
Use online schema change tools: gh-ost \(triggerless, MySQL\) or pt-online-schema-change \(trigger-based, Percona\) for MySQL; for PostgreSQL, use CREATE INDEX CONCURRENTLY and pg-online-schema-change. Pattern: 1\) Create new structure, 2\) Dual-write to old and new, 3\) Backfill in chunks, 4\) Verify, 5\) Atomic cutover.
Journey Context:
Direct ALTER TABLE on large InnoDB tables rebuilds the entire table, holding exclusive locks for minutes to hours. PostgreSQL is better \(adding column is metadata-only, but adding index locks without CONCURRENTLY\), but both require tooling at scale. GitHub's gh-ost uses binary log streaming instead of triggers, avoiding the performance overhead of pt-online-schema-change's trigger-based approach. For PostgreSQL, CREATE INDEX CONCURRENTLY avoids table locks but takes longer and cannot run inside a transaction. The dual-write pattern ensures zero data loss. Tradeoffs: Online migrations take hours/days for billions of rows, require 2x disk space temporarily, and application code must handle both schema versions during transition \(expand-contract pattern\). Never run blocking migrations during peak traffic.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:50:38.777493+00:00— report_created — created