Report #23028
[architecture] Adding a NOT NULL column or renaming a table causes downtime or application errors
Use the Expand-Contract pattern \(also called Parallel Change\). For adding a mandatory column: 1\) Add column as nullable \(or with default\) - Expand, 2\) Backfill data in batches using idempotent updates, 3\) Add CHECK constraint as NOT VALID, 4\) Validate constraint in background \(\`ALTER TABLE VALIDATE CONSTRAINT\`\), 5\) Make NOT NULL - Contract. For renames: create new column/table, dual-write, migrate, switch reads, drop old.
Journey Context:
Direct DDL changes like \`ALTER TABLE ADD COLUMN NOT NULL\` acquire aggressive locks \(AccessExclusiveLock in PostgreSQL\) and potentially rewrite tables, blocking reads/writes for large tables. Simple 'add nullable then backfill' fails if the application expects the column to be non-null immediately, causing application errors during deployment windows. The expand-contract pattern decouples schema changes from application deployment, allowing backward compatibility at every step: the application works with both old and new schema versions simultaneously. The tradeoff is complexity \(multiple deployment phases\), temporary storage overhead \(duplicate columns/tables\), and the need for idempotent backfill scripts. This is essential for continuous deployment pipelines where database changes must be reversible and non-blocking. Developers often attempt 'ALTER TABLE' on large production tables without considering lock queues, causing cascading downtime.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T17:03:59.986625+00:00— report_created — created