Report #54379
[architecture] Deploying schema migration requires downtime or breaks running code
Use the Expand/Contract pattern (Parallel Change). Phase 1 (Expand): deploy backward-compatible changes only. Add the new column or table, and have new code write to both the old and new structures. Phase 2 (Migrate): backfill historical data asynchronously and verify that old and new structures agree. Phase 3 (Contract): switch reads to the new schema, then remove the dual-write logic, then drop the old column. Never drop the old column until every running code instance is confirmed to be on a version that no longer reads or writes it.
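The Expand phase and dual-write can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not a production migration; the table `users`, the columns `full_name` (old) and `display_name` (new), and all function names are hypothetical.

```python
import sqlite3

# Hypothetical starting schema: only the old column exists.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, full_name) VALUES (1, 'Ada Lovelace')")


def expand(conn):
    # Phase 1 (Expand): additive change only. The new column is nullable,
    # so old code that never sets it keeps working.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def write_old(conn, user_id, name):
    # An old code instance that knows nothing about display_name.
    conn.execute("INSERT INTO users (id, full_name) VALUES (?, ?)", (user_id, name))


def write_dual(conn, user_id, name):
    # A new code instance: writes both the old and the new column.
    conn.execute(
        "INSERT INTO users (id, full_name, display_name) VALUES (?, ?, ?)",
        (user_id, name, name),
    )


def read_name(conn, user_id):
    # During Expand, reads still come from the old column.
    row = conn.execute("SELECT full_name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None


expand(conn)
write_old(conn, 2, "Grace Hopper")   # old instance, still running after the deploy
write_dual(conn, 3, "Alan Turing")   # new instance
```

Rows written by old instances (and rows that predate the change) are left with a NULL `display_name`, which is what the Migrate phase later backfills.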
Journey Context:
Destructive DDL changes (ALTER TABLE DROP COLUMN, RENAME, adding a NOT NULL constraint) applied during a deployment create race conditions: old code instances that are still running fail with 'column does not exist' or type errors. Maintenance windows are not viable for a 24/7 SaaS. The Expand/Contract pattern treats schema changes like API evolution, where backward compatibility is preserved at every step. During 'Expand', the schema supports old and new code at the same time: new code writes to both structures (dual-write) while still reading from the old one, so reads stay consistent and the deployment needs no service stop. The 'Migrate' phase backfills historical data, usually in small batches during low traffic to avoid long-held locks. The old column and the dual-write logic are removed (the Contract phase) only after the new code path is fully deployed, stable, and verified. Critical mistakes include: dropping the old column in the same deployment that adds the new one (old instances break, which means downtime); forgetting NULL handling during the transition (give the new column a default or keep it nullable until the backfill is complete); and not indexing the new column before switching reads (which causes full table scans).
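The Migrate phase described above (batched backfill, consistency check, index before switching reads) might look like the sketch below. It assumes the same hypothetical `users` table with an old `full_name` column and a new nullable `display_name` column, uses Python's built-in sqlite3 module, and picks an arbitrarily small batch size for illustration.

```python
import sqlite3

# Hypothetical state after Expand: the new column exists but is empty for old rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL, display_name TEXT)"
)
conn.executemany(
    "INSERT INTO users (id, full_name) VALUES (?, ?)",
    [(i, f"user-{i}") for i in range(1, 6)],
)
conn.commit()


def backfill_in_batches(conn, batch_size=2):
    """Copy full_name into display_name a few rows at a time; return rows updated."""
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET display_name = full_name "
            "WHERE id IN (SELECT id FROM users WHERE display_name IS NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # commit per batch so each transaction stays short
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


def count_mismatches(conn):
    # Consistency check: every row must carry the same value in both columns.
    return conn.execute(
        "SELECT COUNT(*) FROM users "
        "WHERE display_name IS NULL OR display_name <> full_name"
    ).fetchone()[0]


updated = backfill_in_batches(conn, batch_size=2)
mismatches = count_mismatches(conn)

# Index the new column before any read traffic moves to it.
conn.execute("CREATE INDEX IF NOT EXISTS idx_users_display_name ON users (display_name)")
```

Reads would be switched to `display_name` only once `count_mismatches` reports zero and the index exists. On a production database the batch size, pacing between batches, and the index-creation method (for example, a non-blocking build where the engine offers one) would need to be chosen for that engine.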
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:46:12.579521+00:00— report_created — created