Report #49049
[architecture] Breaking changes and downtime during database schema migrations in production
Implement the Expand/Contract (Parallel Change) pattern: (1) Expand - add the new schema element (column/table) alongside the old, (2) Migrate - dual-write to both schemas and backfill existing data, (3) Switch - update application code to read from the new schema, (4) Contract - remove the old schema element only after the new path is proven stable.
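The four phases above can be sketched as follows. This is a minimal illustration, not a production migration: it uses Python's built-in `sqlite3` module, and the `users` table, the `fullname` and `display_name` columns, and all function names are hypothetical stand-ins for a column rename.

```python
import sqlite3

def expand(conn):
    # Expand: add the new column as nullable so old rows and old writers stay valid.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

def dual_write(conn, user_id, name):
    # Migrate (part 1): new writes populate both the old and the new column.
    conn.execute(
        "INSERT INTO users (id, fullname, display_name) VALUES (?, ?, ?)",
        (user_id, name, name),
    )

def backfill(conn):
    # Migrate (part 2): copy existing data; the IS NULL filter makes re-runs harmless.
    cur = conn.execute(
        "UPDATE users SET display_name = fullname WHERE display_name IS NULL"
    )
    return cur.rowcount

def read_name(conn, user_id, use_new):
    # Switch: the caller chooses which column to read from.
    column = "display_name" if use_new else "fullname"
    row = conn.execute(
        f"SELECT {column} FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None

def contract(conn):
    # Contract: drop the old column. SQLite supports DROP COLUMN from 3.35.0 on,
    # so this sketch skips the step on older versions.
    if sqlite3.sqlite_version_info >= (3, 35, 0):
        conn.execute("ALTER TABLE users DROP COLUMN fullname")
        return True
    return False
```

In a real deployment each phase would ship as a separate release, with `contract` run only after no deployed code reads or writes `fullname`.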
Journey Context:
Directly executing destructive schema changes like `ALTER TABLE DROP COLUMN`, `RENAME COLUMN`, or `ALTER COLUMN ... TYPE` immediately breaks running application code and often requires exclusive locks that cause downtime. The naive approach of 'stop the world' migrations is unacceptable for continuously deployed systems. The Expand/Contract pattern treats schema changes like API versioning: backward compatibility is maintained throughout the transition. Critical implementation details: The 'Expand' phase must add new columns as nullable, or with a default only where the database can apply it without rewriting the table (for example, constant defaults in PostgreSQL 11 and later), to avoid long locks; 'Migrate' requires careful dual-writing (writing to both old and new) with idempotency so that partial failures can be retried safely; the 'Switch' phase should be feature-flagged to allow instant rollback; 'Contract' is often deferred for weeks after the switch to ensure safety. Common failures: forgetting to backfill data before switching reads, attempting to add a NOT NULL column without a default in a single step, or removing the old column before all code paths are updated.
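Two of the details above, an idempotent backfill and a feature-flagged switch, can be sketched like this. It is an illustration under assumptions, using Python's built-in `sqlite3`; the `users` table, its columns, the `read_display_name` flag, and the function names are all hypothetical.

```python
import sqlite3

def backfill_in_batches(conn, batch_size=1000):
    """Copy fullname into display_name in small batches; safe to re-run."""
    total = 0
    while True:
        # Only touch rows that still need copying, so a retry after a
        # partial failure resumes where it left off instead of redoing work.
        cur = conn.execute(
            "UPDATE users SET display_name = fullname "
            "WHERE id IN (SELECT id FROM users "
            "             WHERE display_name IS NULL AND fullname IS NOT NULL "
            "             LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # short transactions keep lock times small
        if cur.rowcount == 0:
            return total
        total += cur.rowcount

def safe_to_switch(conn):
    # Guard against the "forgot to backfill before switching reads" failure.
    missing = conn.execute(
        "SELECT COUNT(*) FROM users "
        "WHERE display_name IS NULL AND fullname IS NOT NULL"
    ).fetchone()[0]
    return missing == 0

def read_name_flagged(conn, user_id, flags):
    # Switch behind a flag: turning the flag off rolls reads back instantly,
    # which only works while the old column still exists (before Contract).
    column = "display_name" if flags.get("read_display_name") else "fullname"
    row = conn.execute(
        f"SELECT {column} FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None
```

Checking `safe_to_switch` before enabling the flag, and keeping the old column until the flag has been on without incident, covers the first and last of the common failures listed above.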
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:49:03.319577+00:00— report_created — created