Report #6632
[architecture] Performance degradation when replaying large event streams in event sourcing
Implement snapshotting for aggregates with more than roughly 50-100 events: persist the serialized aggregate state every N events, or whenever an event-count threshold is reached. Use a separate snapshots table with columns: aggregate_id, aggregate_version, state_payload, created_at. When loading, fetch the latest snapshot first, then replay only the events newer than the snapshot version. For example, if an aggregate has 1000 events and is snapshotted every 100, loading requires 1 snapshot fetch + at most 99 events instead of all 1000.
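A minimal sketch of the load path described above, using Python's built-in sqlite3 module with an in-memory database. The snapshots columns come from this report; the events table layout, the toy balance aggregate, and the names SNAPSHOT_EVERY, open_store, apply_event, load and append are assumptions made for illustration only.

```python
import json
import sqlite3
from datetime import datetime, timezone

SNAPSHOT_EVERY = 100  # hypothetical threshold N; tune per aggregate


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with an events table and a snapshots table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (
            aggregate_id TEXT NOT NULL,
            version      INTEGER NOT NULL,
            payload      TEXT NOT NULL,
            PRIMARY KEY (aggregate_id, version)
        );
        CREATE TABLE snapshots (
            aggregate_id      TEXT NOT NULL,
            aggregate_version INTEGER NOT NULL,
            state_payload     TEXT NOT NULL,
            created_at        TEXT NOT NULL,
            PRIMARY KEY (aggregate_id, aggregate_version)
        );
        """
    )
    return conn


def apply_event(state: dict, event: dict) -> dict:
    # Toy aggregate (assumption): a balance changed by signed amounts.
    return {"balance": state["balance"] + event["amount"]}


def load(conn: sqlite3.Connection, aggregate_id: str) -> tuple[dict, int, int]:
    """Return (state, version, events_replayed) for one aggregate."""
    # Step 1: fetch the latest snapshot, if any.
    row = conn.execute(
        "SELECT aggregate_version, state_payload FROM snapshots "
        "WHERE aggregate_id = ? ORDER BY aggregate_version DESC LIMIT 1",
        (aggregate_id,),
    ).fetchone()
    version, state = (row[0], json.loads(row[1])) if row else (0, {"balance": 0})

    # Step 2: replay only the events newer than the snapshot version.
    newer = conn.execute(
        "SELECT version, payload FROM events "
        "WHERE aggregate_id = ? AND version > ? ORDER BY version",
        (aggregate_id, version),
    ).fetchall()
    for event_version, payload in newer:
        state = apply_event(state, json.loads(payload))
        version = event_version
    return state, version, len(newer)


def append(conn: sqlite3.Connection, aggregate_id: str, amount: int) -> int:
    """Append one event; write a snapshot every SNAPSHOT_EVERY events."""
    state, version, _ = load(conn, aggregate_id)
    new_version = version + 1
    event = {"amount": amount}
    new_state = apply_event(state, event)
    with conn:  # one transaction: event first, then snapshot
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?)",
            (aggregate_id, new_version, json.dumps(event)),
        )
        if new_version % SNAPSHOT_EVERY == 0:
            conn.execute(
                "INSERT INTO snapshots VALUES (?, ?, ?, ?)",
                (aggregate_id, new_version, json.dumps(new_state),
                 datetime.now(timezone.utc).isoformat()),
            )
    return new_version
```

In this sketch, appending 250 events to one aggregate leaves snapshots at versions 100 and 200, so a subsequent load replays only the 50 events after version 200. Writing the snapshot in the same transaction as the event is one possible choice; a background snapshotter would replace the conditional insert.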
Journey Context:
Replaying thousands of events per aggregate on every read makes load time O(n) in the event count, causing timeouts for long-lived entities (e.g., bank accounts with 10 years of transactions). Snapshots trade write complexity for read performance. However, snapshots introduce consistency risks: if the snapshot write succeeds but the event append fails, the snapshot describes state the event stream never recorded; if the append succeeds but the snapshot write fails, the snapshot is merely stale and replay catches up from it. Mitigation: snapshot asynchronously (background process) or use the same transaction with careful ordering (append event first, then snapshot). Another pitfall is serializing large aggregate graphs; keep snapshots small and DTO-like. Event store implementations like EventStoreDB and Axon Framework have built-in snapshotting with configurable thresholds; custom implementations must handle serialization versioning (migrating snapshot schemas when aggregate logic changes).
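One way a custom implementation might handle serialization versioning is to store a schema version inside each snapshot payload and upgrade old payloads on read. The sketch below is an assumption-laden illustration, not a prescribed design: the envelope fields schema_version and state, the constant CURRENT_SCHEMA, the example migration from a balance field to balance_cents, and the rule that an unversioned payload counts as version 1 are all hypothetical.

```python
import json
from typing import Callable, Optional

CURRENT_SCHEMA = 2  # hypothetical: version written by the current code


def _v1_to_v2(state: dict) -> dict:
    # Hypothetical change: v1 stored "balance" in currency units,
    # v2 stores integer "balance_cents".
    return {"balance_cents": round(state["balance"] * 100)}


# Maps a schema version to the function that upgrades it by one step.
MIGRATIONS: dict[int, Callable[[dict], dict]] = {1: _v1_to_v2}


def encode_snapshot(state: dict) -> str:
    """Wrap the state in a small envelope that records the schema version."""
    return json.dumps({"schema_version": CURRENT_SCHEMA, "state": state})


def decode_snapshot(payload: str) -> Optional[dict]:
    """Return the state upgraded to CURRENT_SCHEMA, or None if unusable.

    A None result means the caller should ignore this snapshot and fall
    back to replaying the full event stream.
    """
    doc = json.loads(payload)
    version = doc.get("schema_version", 1)  # assumption: unversioned = v1
    state = doc["state"]
    while version < CURRENT_SCHEMA:
        migrate = MIGRATIONS.get(version)
        if migrate is None:
            return None  # no upgrade path from this version
        state = migrate(state)
        version += 1
    if version != CURRENT_SCHEMA:
        return None  # written by newer code than this reader understands
    return state
```

Because events remain the source of truth, discarding a snapshot that cannot be upgraded costs only one slow load; the next snapshot is written in the current schema.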
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T00:37:42.271569+00:00— report_created — created