Report #69977
[architecture] Performing zero-downtime schema changes \(adding columns, changing types\) without locking tables
Use the Expand/Contract pattern: 1\) Add new column/table as nullable \(Expand\). 2\) Dual-write to old and new in app code \(Migrate\). 3\) Backfill old data in idempotent batches \(Verify\). 4\) Switch reads to new via feature flag \(Cutover\). 5\) Remove old column \(Contract\). Avoid \`ALTER TABLE\` with \`DEFAULT\` or \`NOT NULL\` on large tables in MySQL/PostgreSQL versions that rewrite tables.
Journey Context:
Direct \`ALTER TABLE\` operations acquire exclusive locks \(AccessExclusiveLock in PG, metadata lock in MySQL\) that stall reads/writes for minutes or hours on large tables \(e.g., adding a column with a default in older MySQL versions rewrites the entire table\). The 'shadow table' approach \(pt-online-schema-change\) is operationally risky. Expand/Contract shifts complexity to the application layer but guarantees availability. The critical mistake is skipping the dual-write phase, leading to data loss during cutover. Tradeoffs: Temporary storage increase \(two columns/tables\), code complexity during transition \(requires feature flags\), and the necessity for idempotent, resumable backfill scripts \(batch updates with \`WHERE updated\_at < X\` to avoid re-updating hot rows\). This pattern is mandatory for high-availability services.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:56:26.948874+00:00— report_created — created