Report #98736
[architecture] Direct destructive schema changes cause downtime in production
Run migrations through an expand-contract cycle: add the new column/table, dual-write, idempotently backfill, switch reads, then drop the old structure in a later deploy.
Journey Context:
Depending on the database engine and version, ALTER TABLE ADD COLUMN or DROP COLUMN can take locks that block reads and writes, and can break running code versions that still reference the old shape. The safe pattern expands first (the new structure exists alongside the old) and contracts later (the old structure is removed only after no code references it). Each step is a separate deploy: (1) add the new structure, (2) write to both, (3) backfill in chunks with an idempotent job, (4) read from the new structure, (5) stop writing to the old one, (6) drop the old one. This takes more deploys but allows zero-downtime changes even on large tables. The mistake is treating schema changes like feature commits: each incompatible change must be split across releases so that running code can handle both states.
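The first four steps can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not a production migration tool. The table `users`, the columns `full_name` (old) and `display_name` (new), and the function names are all hypothetical; in practice each function would ship in its own deploy, and the locking behavior of the real database engine still needs to be checked.

```python
import sqlite3


def expand(conn: sqlite3.Connection) -> None:
    """Step 1: add the new nullable column next to the old one (safe to rerun)."""
    cols = {row[1] for row in conn.execute("PRAGMA table_info(users)")}
    if "display_name" not in cols:
        conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
    conn.commit()


def rename_user(conn: sqlite3.Connection, user_id: int, name: str) -> None:
    """Step 2: dual-write, so old and new readers both see the change."""
    conn.execute(
        "UPDATE users SET full_name = ?, display_name = ? WHERE id = ?",
        (name, name, user_id),
    )
    conn.commit()


def backfill(conn: sqlite3.Connection, chunk_size: int = 2) -> int:
    """Step 3: copy old values in small chunks.

    Only rows whose new column is still NULL are touched, so rerunning the
    job after a crash or a completed run changes nothing (idempotent).
    """
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET display_name = full_name "
            "WHERE id IN (SELECT id FROM users "
            "             WHERE display_name IS NULL AND full_name IS NOT NULL "
            "             LIMIT ?)",
            (chunk_size,),
        )
        conn.commit()  # commit per chunk to keep each transaction short
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


def read_name(conn: sqlite3.Connection, user_id: int) -> str:
    """Step 4: read from the new column once the backfill has finished."""
    row = conn.execute(
        "SELECT display_name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]


# Steps 5 and 6 (stop writing full_name, then drop it) belong to later
# deploys, after no running code references the old column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.executemany(
    "INSERT INTO users (id, full_name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 6)],
)
conn.commit()

expand(conn)
rename_user(conn, 1, "alice")   # written to both columns
print(backfill(conn))           # rows copied on the first run
print(backfill(conn))           # a rerun finds nothing left to copy
print(read_name(conn, 3))
```

Committing after every chunk keeps each transaction short, which is the point of chunking on a large table. The `IS NULL` filter is what makes the backfill idempotent here; a schema where NULL is a legitimate value in the new column would need a different progress marker.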
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T04:41:53.473117+00:00— report_created — created