Report #82959
[architecture] Event replay performance degradation with long event streams
Implement snapshotting that persists aggregate state every N events (where N is set by the replay time budget, typically 50-100 ms worth of events) rather than after every event. Store snapshots in a separate table with version numbers.
Journey Context:
Pure event sourcing requires replaying all events to reconstruct aggregate state. Startup/rehydration time therefore grows linearly with event count and eventually violates the SLA. Common mistakes: (1) no snapshotting at all (hits a wall at ~10k+ events), (2) snapshotting after every event (doubles writes, kills throughput, invites race conditions), (3) using an ORM to snapshot (serialization and versioning issues). Correct pattern: snapshot every N events, where N = acceptable_replay_time / time_per_event. Store aggregate_id, version, state_payload, and timestamp. When hydrating, load the latest snapshot, then replay only the events after snapshot.version. Tradeoffs: an eventual consistency window (snapshot lag), storage cost, and migration complexity when the schema changes (requires snapshot versioning or a rebuild).
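The pattern above can be sketched roughly as follows. This is a minimal in-memory illustration, not a production implementation: the `SnapshotStore` class, the `apply_event` function, and the "balance" aggregate are hypothetical names chosen for the example, and the dictionaries stand in for the separate events and snapshots tables. It keeps only the latest snapshot per aggregate, whereas a real snapshots table would likely retain several versions.

```python
import math
import time
from dataclasses import dataclass


@dataclass
class Snapshot:
    # Fields named in the report: aggregate_id, version, state_payload, timestamp.
    aggregate_id: str
    version: int
    state_payload: dict
    timestamp: float


def snapshot_interval(acceptable_replay_time_ms: float, time_per_event_ms: float) -> int:
    """N = acceptable_replay_time / time_per_event, rounded down, at least 1."""
    return max(1, math.floor(acceptable_replay_time_ms / time_per_event_ms))


def apply_event(state: dict, event: dict) -> dict:
    # Hypothetical aggregate logic: each event adds an amount to a balance.
    return {"balance": state["balance"] + event["amount"]}


class SnapshotStore:
    """In-memory stand-in for an events table plus a separate snapshots table."""

    def __init__(self, interval: int):
        self.interval = interval
        self.events = {}     # aggregate_id -> list of events; version = index + 1
        self.snapshots = {}  # aggregate_id -> latest Snapshot only

    def append(self, aggregate_id: str, event: dict) -> int:
        stream = self.events.setdefault(aggregate_id, [])
        stream.append(event)
        version = len(stream)
        # Snapshot every N events, never after every single event.
        if version % self.interval == 0:
            state, _, _ = self.hydrate(aggregate_id)
            self.snapshots[aggregate_id] = Snapshot(
                aggregate_id, version, state, time.time()
            )
        return version

    def hydrate(self, aggregate_id: str):
        """Load the latest snapshot, then replay only events after snapshot.version."""
        snap = self.snapshots.get(aggregate_id)
        state = dict(snap.state_payload) if snap else {"balance": 0}
        start = snap.version if snap else 0
        tail = self.events.get(aggregate_id, [])[start:]
        for event in tail:
            state = apply_event(state, event)
        # Returns (state, current version, number of events replayed).
        return state, start + len(tail), len(tail)
```

With a 100 ms replay budget and an assumed 1 ms per event, `snapshot_interval(100, 1.0)` gives N = 100. After appending 250 events to one aggregate, `hydrate` starts from the snapshot at version 200 and replays only the remaining 50 events instead of all 250.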
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:50:20.337358+00:00— report_created — created