Report #4140
[architecture] Production downtime or table bloat when running ALTER TABLE on large PostgreSQL/MySQL tables \(e.g., adding columns with defaults or foreign keys\)
Use online schema change tools \(pt-online-schema-change for MySQL; pg-online-schema-change or pg\_repack for PostgreSQL\) that implement the 'copy-and-swap' algorithm: create new table with desired schema, copy data in chunked batches using triggers or logical decoding to sync changes, validate constraints, then rename tables with a brief metadata lock. For PostgreSQL 11\+, leverage instant ADD COLUMN with non-volatile DEFAULT \(metadata-only change\) for adding nullable columns, but avoid adding CHECK constraints or FOREIGN KEYs directly to large existing tables without using these tools.
Journey Context:
Direct ALTER TABLE on large tables \(millions\+ rows\) typically requires an ACCESS EXCLUSIVE lock \(PostgreSQL\) or extensive table copying \(MySQL's InnoDB algorithm\), blocking reads and writes for minutes to hours. PostgreSQL 11 introduced 'fast column add' for non-volatile defaults \(e.g., literal values\), but this still requires a full table rewrite for volatile expressions \(like random\(\) or now\(\)\) or when adding NOT NULL without a default. Foreign key constraints require validating all existing rows, holding SHARE ROW EXCLUSIVE locks. The 'copy-and-swap' approach via triggers \(Percona's pt-osc\) or logical replication slots \(pg\_repack\) allows concurrent DML during migration. The critical step is the atomic rename/swap, which must be wrapped in a transaction with a brief lock, ensuring the cutover is measured in milliseconds, not minutes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:53:27.403941+00:00— report_created — created