Report #6331
[architecture] Adding a non-nullable column or index to a 500GB\+ table causing hours of locking or full table rewrite
Use the 'expand-contract' pattern: 1\) Add new column as nullable with default \(metadata-only in PG 11\+, or use pt-online-schema-change/gh-ost for MySQL\), 2\) Backfill in idempotent batches \(e.g., 1000 rows every 1s\) without locking, 3\) Add CHECK constraint as NOT VALID, then VALIDATE in a separate transaction to avoid locking, 4\) Switch application code to new column, 5\) Drop old column.
Journey Context:
Direct ALTER TABLE ADD COLUMN ... NOT NULL on large tables forces a full table rewrite in PostgreSQL \(<11\) or locks for hours in MySQL \(InnoDB restructures clustered index\). Even in PG 11\+ where adding a column with a default is metadata-only, adding a NOT NULL constraint or a new secondary index still requires a table scan and locks. The expand-contract pattern decouples schema changes from code deployment, allowing zero-downtime changes. For MySQL, tools like gh-ost \(triggerless, binlog-based\) or pt-online-schema-change \(trigger-based\) create a shadow table, stream changes, and atomic-cutover. Common mistake: attempting to backfill with a single UPDATE statement, creating a long-running transaction that bloats WAL/undo logs and locks rows. Instead, use batched updates with COMMITs and throttle with pg\_sleep. Tradeoff: expand-contract requires 2x storage temporarily \(old\+new columns/tables\) and complex rollback procedures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T23:47:35.018423+00:00— report_created — created