Report #90006
[architecture] Downtime or data corruption during zero-downtime schema migrations
Use the expand-contract (parallel change) pattern: 1) Expand: add the new column or table and write to both the old and new structures; 2) Migrate: backfill existing data asynchronously; 3) Contract: switch reads to the new schema, stop writing to the old one, and finally remove it.
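The three phases above can be sketched for a column rename using Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not a production migration tool: the table `users`, the old column `fullname`, the new column `full_name`, and the helpers `expand`, `write_user`, `backfill`, `read_name`, and `drop_old_column` are all hypothetical. The `phase` string stands in for whatever flag or toggle the application would actually use. SQLite is chosen only so the example is self-contained; locking behavior on server databases differs.

```python
import sqlite3
from typing import Optional

# Hypothetical rename: users.fullname (old) -> users.full_name (new).


def expand(conn: sqlite3.Connection) -> None:
    # 1) Expand: add the new column next to the old one. It is nullable with
    #    no default, so existing rows are left untouched.
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


def write_user(conn: sqlite3.Connection, user_id: int, name: str, phase: str) -> None:
    if phase == "contract":
        # 3) Contract: only the new column is written from now on.
        conn.execute(
            "INSERT INTO users (id, full_name) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET full_name = excluded.full_name",
            (user_id, name),
        )
    else:
        # Expand/migrate: dual-write both columns. The upsert is idempotent,
        # so a retried write updates the same row instead of duplicating it.
        conn.execute(
            "INSERT INTO users (id, fullname, full_name) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET fullname = excluded.fullname, "
            "full_name = excluded.full_name",
            (user_id, name, name),
        )


def backfill(conn: sqlite3.Connection) -> int:
    # 2) Migrate: copy old values into rows the dual writes have not touched.
    #    The IS NULL filter makes a re-run a no-op.
    cur = conn.execute(
        "UPDATE users SET full_name = fullname "
        "WHERE full_name IS NULL AND fullname IS NOT NULL"
    )
    return cur.rowcount


def read_name(conn: sqlite3.Connection, user_id: int, phase: str) -> Optional[str]:
    # Reads move to the new column only in the contract phase.
    # The column name comes from a fixed choice, never from user input.
    column = "full_name" if phase == "contract" else "fullname"
    row = conn.execute(f"SELECT {column} FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None


def drop_old_column(conn: sqlite3.Connection) -> bool:
    # Final contract step, once nothing reads or writes the old column.
    # DROP COLUMN requires SQLite >= 3.35; older builds need a table rebuild.
    if sqlite3.sqlite_version_info < (3, 35, 0):
        return False
    conn.execute("ALTER TABLE users DROP COLUMN fullname")
    return True
```

In a real rollout, reads are often switched while dual writes continue, so that the read switch itself can be rolled back before old writes are stopped in a later deploy; the single `phase` value here collapses those steps for brevity.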
Journey Context:
Renaming a column or changing its type in a single migration can cause downtime (an exclusive lock held while a large table is rewritten) or data loss (the old column is dropped before its data has been copied). The expand-contract pattern treats schema evolution like a feature toggle rolled out in stages. First, the 'expand' phase adds the new structure alongside the old one, and the application writes to both locations (alternatively, database triggers keep the new structure in sync while a backfill job copies existing rows). Once the data has converged, the 'contract' phase shifts reads to the new structure and eventually removes the old one. Because the old structure stays intact until the final step, each stage can be rolled back, and readers never see a half-migrated state. Common pitfalls include dual writes that are not idempotent, so retries create duplicate rows, and a backfill job whose load on the production database was not accounted for (for example, long transactions holding locks on busy rows).
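The trigger-based variant and the two pitfalls above can be illustrated with a minimal Python/sqlite3 sketch for the type-change case. It assumes a hypothetical `orders` table whose TEXT column `amount` (values like "19.99") is being replaced by an INTEGER column `amount_cents`; the helpers `cents`, `expand`, and `backfill_in_batches` and their parameters are illustrative and not from the report. Triggers stand in for application dual writes, and the backfill walks the primary key in small batches, commits each batch as its own short transaction, and can pause between batches to limit its impact on production traffic. Trigger and locking semantics differ on server databases, so treat this as a sketch rather than a recipe.

```python
import sqlite3
import time

# Hypothetical type change: orders.amount (TEXT, e.g. "19.99")
# -> orders.amount_cents (INTEGER, e.g. 1999).


def cents(expr: str) -> str:
    # SQL expression that converts a decimal string to integer cents.
    return f"CAST(ROUND(CAST({expr} AS REAL) * 100) AS INTEGER)"


def expand(conn: sqlite3.Connection) -> None:
    # Expand: add the new column, then install triggers so that writes which
    # still only touch the old column also keep the new column in sync.
    conn.execute("ALTER TABLE orders ADD COLUMN amount_cents INTEGER")
    conn.execute(
        "CREATE TRIGGER orders_sync_insert AFTER INSERT ON orders BEGIN "
        f"UPDATE orders SET amount_cents = {cents('NEW.amount')} WHERE id = NEW.id; END"
    )
    conn.execute(
        "CREATE TRIGGER orders_sync_update AFTER UPDATE OF amount ON orders BEGIN "
        f"UPDATE orders SET amount_cents = {cents('NEW.amount')} WHERE id = NEW.id; END"
    )
    conn.commit()


def backfill_in_batches(
    conn: sqlite3.Connection, batch_size: int = 1000, pause_s: float = 0.0
) -> int:
    # Migrate: copy historical rows one primary-key range at a time.
    # Assumes positive integer ids, so the walk starts after id 0.
    last_id, total = 0, 0
    while True:
        # Upper bound of the next batch (keyset pagination, no OFFSET scans).
        (upper,) = conn.execute(
            "SELECT MAX(id) FROM (SELECT id FROM orders WHERE id > ? ORDER BY id LIMIT ?)",
            (last_id, batch_size),
        ).fetchone()
        if upper is None:
            break  # no rows left past last_id
        # Skip rows already filled by a trigger or an earlier run, so the job
        # is idempotent and never overwrites newer trigger-written values.
        cur = conn.execute(
            f"UPDATE orders SET amount_cents = {cents('amount')} "
            "WHERE id > ? AND id <= ? AND amount_cents IS NULL AND amount IS NOT NULL",
            (last_id, upper),
        )
        total += cur.rowcount
        conn.commit()  # short transaction: locks are held for one batch only
        last_id = upper
        if pause_s > 0:
            time.sleep(pause_s)  # throttle to leave headroom for live traffic
    return total
```

Re-running `backfill_in_batches` after it finishes returns 0, so the job can be restarted safely after a crash. The `batch_size` and `pause_s` defaults are placeholders to tune against your own load.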
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:40:13.263166+00:00— report_created — created