Report #8908
[architecture] Event sourcing replay is too slow with millions of events
Implement event-count-based snapshotting every N events (e.g., 100-1000) with aggregate versioning, avoiding time-based snapshots
Journey Context:
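Event sourcing guarantees auditability by storing every state transition, but replaying an aggregate from event 1 is O(n) in the stream length, and load latency becomes unacceptable past roughly 10k events. The naive solution of snapshotting on a timer (e.g., every 5 minutes) fits the write pattern poorly: during burst traffic many events pile up between snapshots, so replay is unbounded, and during idle periods the timer rewrites snapshots of unchanged aggregates, which is needless write amplification. The hard-won pattern is to snapshot deterministically every N events (e.g., every 100 events) and store the snapshot in a separate snapshot table together with the aggregate version it was taken at. Loading then means reading the latest snapshot and replaying only the events with a higher version, which bounds replay to at most N events as long as snapshot writes keep up. Crucially, snapshots are merely a cache; they must be rebuildable from the event stream, so their durability can be relaxed (e.g., async writes). Optimistic concurrency control still applies to the event stream itself: on append, the writer must check that the stream's current version equals the version the aggregate was loaded at (snapshot version plus replayed events) to prevent lost updates.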
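The pattern above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production store: `Account`, `InMemoryStore`, `ConcurrencyError`, and `SNAPSHOT_EVERY` are hypothetical names made up for this example, and the snapshot is written synchronously for simplicity, where a real system might write it asynchronously.

```python
from __future__ import annotations

from dataclasses import dataclass, field

SNAPSHOT_EVERY = 100  # N; assumed value, tune per aggregate


class ConcurrencyError(Exception):
    """Raised when the stream has moved on since the aggregate was loaded."""


@dataclass
class Account:
    """Toy aggregate (hypothetical): state is a running balance."""
    balance: int = 0
    version: int = 0  # number of events applied so far

    def apply(self, event: dict) -> None:
        self.balance += event["amount"]
        self.version += 1


@dataclass
class InMemoryStore:
    events: dict = field(default_factory=dict)     # aggregate_id -> list of events
    snapshots: dict = field(default_factory=dict)  # aggregate_id -> (version, balance)

    def load(self, aggregate_id: str) -> tuple[Account, int]:
        """Rebuild the aggregate; also return how many events were replayed."""
        # Start from the latest snapshot, or from an empty aggregate if none exists.
        version, balance = self.snapshots.get(aggregate_id, (0, 0))
        account = Account(balance=balance, version=version)
        # Replay only the events recorded after the snapshot version.
        tail = self.events.get(aggregate_id, [])[version:]
        for event in tail:
            account.apply(event)
        return account, len(tail)

    def append(self, aggregate_id: str, expected_version: int, event: dict) -> None:
        stream = self.events.setdefault(aggregate_id, [])
        # Optimistic concurrency: reject the write if another writer got in first.
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, stream is at {len(stream)}"
            )
        stream.append(event)
        # Deterministic, count-based snapshot: every N-th event triggers one.
        if len(stream) % SNAPSHOT_EVERY == 0:
            account, _ = self.load(aggregate_id)
            self.snapshots[aggregate_id] = (account.version, account.balance)
```

After 250 appended events, this sketch holds a snapshot at version 200, so `load` replays only the last 50 events. Deleting the snapshot entry changes nothing about the loaded state; it only makes `load` replay the full stream, which is what lets the snapshot table be treated as a disposable cache.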
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T06:46:15.194689+00:00— report_created — created