Report #85871
[architecture] Zero-downtime deployments fail when schema changes break running code or cause table locks
Use the Expand-Contract pattern: (1) Expand: add the new column/table as nullable or unused, deploy code that writes to both old and new (dual-write), and run a backfill migration for existing rows; (2) Transition: switch reads to the new schema; (3) Contract: remove the old column/table after confirming no running code still references it. Never rename in place; treat a rename as create followed by delete (add the new name, migrate, then drop the old one).
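The expand and transition steps can be sketched as follows. This is a minimal illustration, not a production migration: it uses Python's built-in `sqlite3` with an in-memory database so it runs anywhere, and the `users` table, the `full_name` and `display_name` columns, and the helper names are all hypothetical. Locking behavior on PostgreSQL differs, and the contract step (dropping `full_name`) is deliberately left out.

```python
import sqlite3

# Hypothetical schema: users.full_name is being replaced by users.display_name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.executemany(
    "INSERT INTO users (full_name) VALUES (?)",
    [("Ada",), ("Grace",), ("Linus",)],
)

# Expand: add the new column as nullable so existing writers are unaffected.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def create_user(conn, name):
    """Dual-write: new code populates both the old and the new column."""
    cur = conn.execute(
        "INSERT INTO users (full_name, display_name) VALUES (?, ?)",
        (name, name),
    )
    return cur.lastrowid


def backfill(conn, batch_size=2):
    """Copy old values into the new column in small batches.

    Returns the number of rows updated. Small batches keep each
    transaction short so other writers are not blocked for long.
    """
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET display_name = full_name "
            "WHERE id IN (SELECT id FROM users "
            "WHERE display_name IS NULL AND full_name IS NOT NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


def read_name(conn, user_id, use_new_schema):
    """Transition: a flag decides which column reads come from."""
    # The column name comes from a fixed choice, never from user input.
    column = "display_name" if use_new_schema else "full_name"
    row = conn.execute(
        f"SELECT {column} FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


new_id = create_user(conn, "Margaret")
backfilled = backfill(conn)
```

Because every write goes to both columns and the backfill covers older rows, `read_name` returns the same value with the flag on or off, which is what lets old and new application versions run side by side.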
Journey Context:
Altering a column type on a large PostgreSQL table takes an ACCESS EXCLUSIVE lock and, in most cases, rewrites the table; adding a NOT NULL constraint to an existing column takes the same lock while it scans every row. Either can block reads and writes long enough to cause downtime. Developers often reach for 'ALTER TABLE ... ADD COLUMN ... DEFAULT' thinking it is instant, but before PostgreSQL 11 this rewrote the table. PG 11 made it a metadata-only change for non-volatile defaults, but it is still risky: a volatile default still forces a rewrite, and the statement still needs a brief ACCESS EXCLUSIVE lock that can queue behind long-running transactions. The expand-contract pattern treats the schema as append-only during the expansion phase: changes only add, never alter or remove. The critical hard-won insight is the 'dual-write' period: the application must write to both schemas simultaneously, so that old code (reading the old schema) and new code (reading the new schema) can coexist during rolling deployments. The contract phase must run only after all nodes are confirmed to be on the new code version (e.g., via feature flags or deployment verification). This pattern is incompatible with ORM automatic schema migration tools that lack an 'online' mode.
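One way to enforce the "contract only after all nodes are confirmed" rule is a small gate in the migration tooling. The sketch below is an assumption-laden illustration: the node names, the `major.minor.patch` version format, the `users.full_name` column, and the idea that each node reports its version are all hypothetical. The two SQL strings are standard PostgreSQL (`lock_timeout` makes the DROP give up rather than queue indefinitely for its lock), but the function only returns them and never executes anything.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a 'major.minor.patch' string into a comparable tuple."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def safe_to_contract(node_versions: Dict[str, str], first_clean_version: str) -> bool:
    """True only if at least one node reported and none runs older code.

    first_clean_version is the first release that no longer touches
    the old column (a hypothetical value supplied by the deployer).
    """
    if not node_versions:
        return False  # no evidence is not confirmation
    floor = parse_version(first_clean_version)
    return all(parse_version(v) >= floor for v in node_versions.values())


def contract_statements(node_versions: Dict[str, str], first_clean_version: str) -> List[str]:
    """Return the contract-phase SQL, or nothing if any node is behind."""
    if not safe_to_contract(node_versions, first_clean_version):
        return []
    return [
        # Fail fast instead of queueing behind long transactions.
        "SET lock_timeout = '5s'",
        # Hypothetical old column left over from the expand phase.
        "ALTER TABLE users DROP COLUMN full_name",
    ]


mixed_fleet = contract_statements({"web-1": "2.4.0", "web-2": "2.3.9"}, "2.4.0")
upgraded_fleet = contract_statements({"web-1": "2.4.0", "web-2": "2.4.1"}, "2.4.0")
```

With one node still on 2.3.9 the gate returns no statements, so the old column survives until the rollout finishes; an empty node report is treated as "not confirmed" rather than as permission to proceed.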
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:43:22.904624+00:00— report_created — created