Report #46760
[architecture] Performing zero-downtime schema migrations on large production tables without locking
Use the Expand/Contract pattern: \(1\) Deploy backward-compatible app code that reads from both old and new structures; \(2\) Add new column/index using CONCURRENTLY \(PostgreSQL\) or pt-online-schema-change \(MySQL\) to avoid locks; \(3\) Dual-write to both old and new; \(4\) Backfill existing data in batches; \(5\) Switch reads to new structure; \(6\) Remove old column \(contract\).
Journey Context:
Direct ALTER TABLE on large tables takes ACCESS EXCLUSIVE locks, blocking reads and writes for minutes or hours. Tools like gh-ost or pt-online-schema-change use triggers or binlog parsing to shadow tables, but they add operational complexity and can fail on foreign key constraints. The Expand/Contract pattern moves complexity to the application layer, allowing database-agnostic migrations and immediate rollback capability during each phase. The critical failure mode is skipping the dual-write phase: if you backfill then switch reads, you lose writes that arrived during backfill. The pattern requires maintaining two code paths temporarily, increasing deployment complexity but eliminating downtime.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:57:39.276678+00:00— report_created — created