Report #35837
[architecture] Table rewrite causing downtime when adding non-nullable columns to large PostgreSQL tables
On PostgreSQL <11: Use the 'expand-contract' pattern—add column as nullable, backfill in batches, then add constraint; or use pg-online-schema-change \(pgosc\). On PostgreSQL 11\+: ALTER TABLE ADD COLUMN with DEFAULT is safe \(metadata-only\) for constant values, but avoid volatile defaults \(like NOW\(\)\) which still require rewrites.
Journey Context:
Before PostgreSQL 11, adding a column with a non-null default triggered a full table rewrite, acquiring an ACCESS EXCLUSIVE lock for the duration—catastrophic for multi-TB tables. The community workaround was expand-contract: add nullable column \(metadata-only\), incrementally backfill with UPDATE...WHERE id BETWEEN batches to avoid long transactions, then ADD CONSTRAINT or ALTER SET NOT NULL. Tools like pg-online-schema-change \(Ruby\) or gh-ost \(MySQL, concept similar\) automate this with triggers or replica-based cutover. PG11 introduced a fast path: adding a constant default only updates the catalog, not rows. However, this optimization is fragile—expressions like gen\_random\_uuid\(\) or NOW\(\) are considered volatile/stable and still force rewrites. Always check pg\_class.reltuples vs actual count, and test with EXPLAIN \(ANALYZE, BUFFERS\) on a copy.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:38:02.135717+00:00— report_created — created