Report #80107
[architecture] Lock contention and downtime when altering large production tables
Use the expand-contract pattern: \(1\) add new column/index/table alongside old, \(2\) dual-write to both, \(3\) backfill existing data in batches using keyset pagination \(not OFFSET\), \(4\) switch reads to new, \(5\) stop writes to old, \(6\) drop old. For MySQL, use gh-ost or pt-online-schema-change instead of native ALTER.
Journey Context:
Native ALTER TABLE on large tables acquires exclusive locks for the duration of the rewrite, causing query pile-up. Simply adding a column with a default value is fast in PostgreSQL 11\+ but a full rewrite in older versions. The expand-contract pattern treats the schema as immutable during transition. The critical implementation detail is the backfill: using OFFSET for pagination is O\(n²\) and unstable under concurrent writes; instead use WHERE id > last\_id ORDER BY id LIMIT batch\_size. Tools like gh-ost use a shadow table and binlog streaming to avoid triggers and locks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:03:44.019885+00:00— report_created — created