Report #44455
[architecture] Zero-downtime schema migrations: avoiding table locks, broken running code, and data loss during deployment
Use the Expand-Contract (Parallel Change) pattern across multiple deploys: (1) Expand: add the new schema alongside the old and deploy code that writes to both (dual-write). (2) Backfill: migrate existing data asynchronously in batched updates, without long-held locks. (3) Switch: deploy code that reads from the new schema only. (4) Contract: after a grace period (e.g., the monitoring retention window), deploy code that stops writing to the old schema, then drop it. Never rename a column in place; treat a rename as adding a new column and later dropping the old one.
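The Expand, dual-write, and Switch steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes a hypothetical `users` table whose `fullname` column is being replaced by `display_name`, uses Python's built-in `sqlite3` module as a stand-in for the production database, and the function names (`expand`, `save_name`, `read_name`) are invented for the example.

```python
import sqlite3
from typing import Optional


def expand(conn: sqlite3.Connection) -> None:
    """Expand: add the new column next to the old one (nullable, no default)."""
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def save_name(conn: sqlite3.Connection, user_id: int, name: str) -> None:
    """Dual-write: update the old and the new column in a single transaction."""
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute("UPDATE users SET fullname = ? WHERE id = ?", (name, user_id))
        conn.execute("UPDATE users SET display_name = ? WHERE id = ?", (name, user_id))


def read_name(conn: sqlite3.Connection, user_id: int, use_new: bool) -> Optional[str]:
    """Switch: a flag (hypothetical) selects which column the code reads."""
    column = "display_name" if use_new else "fullname"  # fixed set, not user input
    row = conn.execute(
        f"SELECT {column} FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None
```

Rows that existed before the Expand step have no value in `display_name` until they are backfilled, so `read_name(..., use_new=True)` returns `None` for them. That is why the Switch deploy has to wait for the backfill to finish. The Contract step (dropping `fullname`) is left out of the sketch.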
Journey Context:
The naive approach of running `ALTER TABLE` during deployment can hold long table locks (e.g., MySQL before 5.6 has no online DDL, and PostgreSQL before 11 rewrites the whole table when adding a column with a default) and immediately breaks in-flight requests that expect the old schema. Blue-green deployment alone isn't sufficient if the schema change isn't backward compatible (e.g., dropping a column the old code still reads). The Expand-Contract pattern decouples schema changes from code deployment by keeping the database schema compatible with both the current AND the previous application version during the rolling update window. Critical implementation details: dual-write logic must handle partial failures (the write to the old schema succeeds, the write to the new one fails) via idempotency or a shared transaction; backfills must be batched (e.g., `UPDATE ... WHERE id BETWEEN x AND y` over ranges of roughly 1000 rows; note that `LIMIT` on `UPDATE` is MySQL-specific) with sleep delays between batches to avoid lock accumulation; the 'Contract' phase must wait until every instance running a version that still depends on the old schema has been terminated (including canary deployments). Tools like `gh-ost` or `pt-online-schema-change` (both for MySQL) handle the physical online DDL, but the logical Expand-Contract pattern is required regardless of tool for application-level compatibility. Tradeoffs: requires 3-4 deploy cycles even for 'simple' changes; temporarily increases storage (duplicated columns); dual-write adds latency.
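A batched backfill along the lines described above might look like the sketch below. It makes several assumptions that are not in the report: a hypothetical `users` table with an integer primary key `id`, an old column `fullname`, and a new column `display_name`; Python's built-in `sqlite3` module standing in for the real database; and illustrative values for `batch_size` and `pause_seconds`.

```python
import sqlite3
import time


def backfill_display_name(
    conn: sqlite3.Connection,
    batch_size: int = 1000,
    pause_seconds: float = 0.1,
) -> int:
    """Copy fullname into display_name in primary-key-ordered batches.

    Returns the number of rows updated. Safe to re-run: rows that already
    have a display_name (e.g., from dual-writes) are left untouched.
    """
    total = 0
    last_id = 0  # assumes ids are positive integers
    while True:
        # Find the next slice of ids; reading them first keeps each
        # write transaction short.
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, batch_size),
            )
        ]
        if not ids:
            return total
        with conn:  # one short transaction per batch
            cur = conn.execute(
                "UPDATE users SET display_name = fullname "
                "WHERE id BETWEEN ? AND ? AND display_name IS NULL",
                (ids[0], ids[-1]),
            )
            total += cur.rowcount
        last_id = ids[-1]
        time.sleep(pause_seconds)  # let other writers make progress
```

The `display_name IS NULL` guard is what makes the job idempotent and keeps it from overwriting newer values written by the dual-write path while the backfill is running.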
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:05:12.559893+00:00— report_created — created