Report #14953
[architecture] Optimizing aggregate reconstruction performance in Event Sourcing when event streams grow large
Implement snapshotting as a performance optimization, not as the source of truth. Store a snapshot of the aggregate state every N events (e.g., every 100) or when the projected load time exceeds a threshold. When loading, fetch the latest snapshot, then replay only events with version > snapshot.version. Store snapshots in a separate table with an (aggregate_id, version) primary key and an aggressive TTL for old snapshots. Never use snapshots for business logic validation; always verify against the event stream for critical operations.
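The load path described above can be sketched as follows. This is a minimal in-memory illustration, not a production store: `InMemoryStore`, `Event`, `Snapshot`, the integer "balance" state, and `SNAPSHOT_EVERY` are all hypothetical names chosen for the example, and a real implementation would replace the dictionaries with the event and snapshot tables and the list slice with a `version > ?` query.

```python
from dataclasses import dataclass, field

SNAPSHOT_EVERY = 100  # the "every N events" value from the text; tune empirically


@dataclass(frozen=True)
class Event:
    aggregate_id: str
    version: int  # 1-based, contiguous position in the aggregate's stream
    amount: int   # hypothetical payload: a balance delta


@dataclass(frozen=True)
class Snapshot:
    aggregate_id: str
    version: int  # version of the last event folded into `state`
    state: int    # hypothetical aggregate state: a running balance


@dataclass
class InMemoryStore:
    events: dict = field(default_factory=dict)     # aggregate_id -> [Event, ...]
    snapshots: dict = field(default_factory=dict)  # (aggregate_id, version) -> Snapshot

    def append(self, aggregate_id, amount):
        stream = self.events.setdefault(aggregate_id, [])
        event = Event(aggregate_id, len(stream) + 1, amount)
        stream.append(event)
        # Snapshot every N events; the key mirrors the (aggregate_id, version) primary key.
        if event.version % SNAPSHOT_EVERY == 0:
            state, _ = self.load(aggregate_id)
            self.snapshots[(aggregate_id, event.version)] = Snapshot(
                aggregate_id, event.version, state
            )
        return event

    def latest_snapshot(self, aggregate_id):
        candidates = [s for (aid, _), s in self.snapshots.items() if aid == aggregate_id]
        return max(candidates, key=lambda s: s.version, default=None)

    def load(self, aggregate_id):
        """Return (state, replayed_count), starting from the latest snapshot if any."""
        snap = self.latest_snapshot(aggregate_id)
        state, version = (snap.state, snap.version) if snap else (0, 0)
        # Versions are 1-based and contiguous, so this slice is "version > snapshot.version".
        tail = self.events.get(aggregate_id, [])[version:]
        for event in tail:
            state += event.amount
        return state, len(tail)

    def replay_all(self, aggregate_id):
        """Rebuild state from events only; the snapshot is never the authority."""
        return sum(e.amount for e in self.events.get(aggregate_id, []))
```

With 250 events appended, `load` starts from the snapshot at version 200 and replays only the remaining 50 events, while `replay_all` yields the same state from the stream alone. That equality is what makes it safe to delete or rebuild snapshots at any time.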
Journey Context:
Replaying thousands of events on every aggregate load becomes I/O-bound and slow (>100 ms), which breaks the user experience. Snapshots appear to solve this but introduce a critical consistency risk: if the snapshot is treated as the authority, the system loses the audit-trail benefit of event sourcing. The snapshot must be treated as a cache that can always be rebuilt from events. Common errors: storing only the latest snapshot (loses the ability to time-travel from a snapshot), not versioning snapshots (risks concurrent-modification conflicts), and using snapshots for uniqueness checks (misses historical duplicates). The threshold for snapshotting is empirical: measure p99 load time and start snapshotting when it exceeds 50 ms.
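The empirical threshold can be expressed as a small policy function. This is a sketch under assumptions: `should_snapshot` and `P99_THRESHOLD_MS` are hypothetical names, the samples are assumed to be recent aggregate load times in milliseconds collected by your own instrumentation, and the "inclusive" quantile method is one reasonable choice among several.

```python
import statistics

P99_THRESHOLD_MS = 50.0  # the 50 ms figure from the text; adjust to your own measurements


def p99(samples_ms):
    # With n=100, statistics.quantiles returns 99 cut points; index 98 is the 99th percentile.
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[98]


def should_snapshot(samples_ms, threshold_ms=P99_THRESHOLD_MS):
    """Return True when the p99 of recent load times exceeds the threshold."""
    if len(samples_ms) < 2:
        return False  # too few samples to estimate a percentile
    return p99(samples_ms) > threshold_ms
```

A single slow load among 100 samples barely moves the interpolated p99, so the policy only fires once slow loads are a recurring tail rather than a one-off.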
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T22:49:23.537060+00:00— report_created — created