Report #48126
[architecture] Zero-downtime schema migrations causing data loss or application errors
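Implement expand/contract (parallel change) migrations: 1) Expand: add the new column/table and deploy code that writes to both old and new while still reading from old, 2) Backfill existing data asynchronously, 3) Switch reads to new, 4) Remove old writes, 5) Contract: drop the old column/table only after no deployed code references it. Never drop or rename a column or table in the same step as the code change that depends on it; use versioned column names or shadow tables.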
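The expand and dual-write phases can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not a production migration; the table `users`, the columns `full_name` (old) and `display_name` (new), and the helper names are all hypothetical.

```python
import sqlite3

def expand(conn):
    # Expand: add the new column next to the old one; nothing is dropped or renamed.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
    conn.commit()

def save_user(conn, user_id, name):
    # Dual write: keep old and new columns in step so either code version can read.
    conn.execute(
        "INSERT OR REPLACE INTO users (id, full_name, display_name) VALUES (?, ?, ?)",
        (user_id, name, name),
    )
    conn.commit()

def demo():
    conn = sqlite3.connect(":memory:")
    # Pre-migration schema (hypothetical): only the old column exists.
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
    conn.execute("INSERT INTO users (id, full_name) VALUES (1, 'Ada Lovelace')")
    expand(conn)
    save_user(conn, 2, "Grace Hopper")
    return conn.execute(
        "SELECT id, full_name, display_name FROM users ORDER BY id"
    ).fetchall()
```

In this sketch the row written before the expand step still has no value in `display_name`, which is why the backfill has to finish before reads are switched to the new column.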
Journey Context:
A direct ALTER TABLE on a large table can lock the table for the duration of the change, depending on the database engine and the operation, causing downtime. Tools like pt-online-schema-change or gh-ost use shadow tables and triggers/binlog replication to apply changes without long locks, but they don't solve application-level consistency. The expand/contract pattern (also called parallel change) ensures the application can work with both old and new schema versions during deployment. Common errors: renaming a column in place (breaks rollback, because the previous code version still expects the old name), dropping a column before all code stops reading it, and leaving new columns only partially backfilled when reads switch over. The pattern requires feature flags to toggle read paths and idempotent backfill jobs that can be safely re-run after a failure. Tradeoff: it temporarily increases code complexity, requires database storage for duplicate columns/tables, and demands careful orchestration of deployment phases.
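An idempotent, batched backfill and a flag-gated read path might look like the sketch below. It assumes a hypothetical `users` table that already has both `full_name` (old) and `display_name` (new), and a plain dict standing in for a feature-flag service; the flag name `read_display_name` is illustrative only.

```python
import sqlite3

def backfill_batch(conn, batch_size=1000):
    # Touch only rows still missing the new value, so re-running is harmless.
    # Rows with no old value are skipped; otherwise they would be selected forever.
    cur = conn.execute(
        "UPDATE users SET display_name = full_name "
        "WHERE id IN (SELECT id FROM users "
        "WHERE display_name IS NULL AND full_name IS NOT NULL LIMIT ?)",
        (batch_size,),
    )
    conn.commit()  # commit per batch to keep each transaction short
    return cur.rowcount

def backfill_all(conn, batch_size=1000):
    # Repeat small batches until nothing is left; returns total rows updated.
    total = 0
    while True:
        updated = backfill_batch(conn, batch_size)
        if updated == 0:
            return total
        total += updated

def read_name(conn, user_id, flags):
    # The flag selects the read path; the column name comes from a fixed pair,
    # never from user input.
    column = "display_name" if flags.get("read_display_name") else "full_name"
    row = conn.execute(
        f"SELECT {column} FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None
```

Because the job only selects rows where the new column is still empty, it can be stopped and restarted at any point, and turning the flag off returns reads to the old column without a schema change.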
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:15:52.361626+00:00— report_created — created