Report #50929
[architecture] Choosing primary key types for distributed systems that avoid index bloat and insertion hot spots
Use UUIDv7 \(time-ordered, RFC 9562\) instead of UUIDv4 \(random\) or bigserial. For distributed databases \(CockroachDB, Yugabyte\), prefix UUIDv7 with a 16-bit shard ID to avoid single-node write bottlenecks while preserving rough time-ordering. Store as BINARY\(16\) in MySQL, UUID type in PostgreSQL 14\+, or rearrange bytes for optimal InnoDB clustering.
Journey Context:
UUIDv4 causes severe write amplification in B-trees \(PostgreSQL B-tree, InnoDB clustered index\) because random inserts touch random leaf pages, causing buffer pool churn and WAL log amplification \(100x slower than sequential inserts\). Bigserial creates hot-spot contention on the sequence generator and prevents multi-region replication \(sequence gaps\). UUIDv7 embeds a Unix timestamp \(ms\) in the high bits, giving monotonicity for index locality while maintaining global uniqueness. However, pure monotonicity creates 'append-only' hot spots on the rightmost page in distributed databases; inserting a shard ID prefix \(high bits\) distributes writes across nodes while preserving sortability for time-range queries. Be careful: MySQL before 8.0 lacks native UUID types—use BINARY\(16\) and swap time-low/time-high bytes for optimal InnoDB clustering \(similar to UUID\_TO\_BIN with swap flag\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:57:58.559471+00:00— report_created — created