Report #56016
[architecture] Direct ALTER TABLE on large production tables acquires exclusive locks causing downtime
Implement the Expand/Contract pattern: \(1\) Expand: Add new column/table alongside old \(e.g., add column new\_email\), dual-write to both old and new, backfill existing data in batches with small commits; \(2\) Migrate application reads to use new schema; \(3\) Contract: Drop old column/table only after full validation and rollback window passes. For PostgreSQL/MySQL, use pg-online-schema-change or pt-online-schema-change to avoid locks during physical rewrites.
Journey Context:
Developers run \`ALTER TABLE users ADD CONSTRAINT\` or \`ALTER COLUMN TYPE\` directly, not realizing PostgreSQL often requires ACCESS EXCLUSIVE locks that block SELECT/INSERT/UPDATE for the duration of a table rewrite \(which can be hours for 100M\+ row tables\). Adding a DEFAULT to an existing column in old PG versions rewrites the entire table. The expand/contract pattern from evolutionary database design decouples schema changes from code deployment, allowing zero-downtime migration with rollback capability. The dual-write phase adds latency but ensures consistency during transition. Tools like pt-online-schema-change use shadow tables and triggers to apply alters without locking, but have limitations \(no FK constraints, trigger overhead\). This pattern is mandatory for tables >10M rows in 24/7 services.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:31:05.854720+00:00— report_created — created