Report #70067
[architecture] Severe latency or timeouts when loading event-sourced aggregates with thousands of events
Implement snapshotting: periodically persist the denormalized aggregate state (every N events or at a version threshold), then hydrate by loading the latest snapshot and replaying only the events recorded after the snapshot version.
Journey Context:
In pure event sourcing, reconstructing an aggregate requires replaying its entire event stream from the first event. For long-lived aggregates (e.g., a bank account with 10 years of transactions), this becomes an O(n) operation in the number of events and can violate latency SLAs. Some teams try to cache the aggregate in Redis, but this introduces consistency issues between the event store and the cache. The robust pattern is snapshotting: the aggregate periodically writes its complete state to a 'snapshots' table along with a version number. When loading, the system fetches the snapshot with the highest version <= the target version, then queries events WHERE version > snapshot.version. The replay cost is then O(snapshot interval) instead of O(total events). Tradeoffs: snapshots add write overhead and storage; snapshot schema changes require a migration strategy (event upcasting vs. snapshot versioning); a corrupted snapshot produces wrong aggregate state, but because events remain the source of truth, the bad snapshot can be discarded and the state rebuilt by replay.
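The load path described above can be sketched as follows. This is a minimal in-memory illustration, not a production implementation: `InMemoryStore`, `Account`, `SNAPSHOT_INTERVAL`, and `last_replay_count` are hypothetical names introduced here, the interval of 100 is an arbitrary assumption, and a real store would replace the list scans with indexed queries against the events and snapshots tables.

```python
from dataclasses import dataclass
from typing import List, Optional

SNAPSHOT_INTERVAL = 100  # assumed value for N; tune per workload


@dataclass
class Event:
    version: int  # 1-based position in the aggregate's stream
    amount: int   # positive = deposit, negative = withdrawal


@dataclass
class Snapshot:
    version: int  # version of the last event folded into this state
    balance: int


@dataclass
class Account:
    balance: int = 0
    version: int = 0

    def apply(self, event: Event) -> None:
        self.balance += event.amount
        self.version = event.version


class InMemoryStore:
    """Hypothetical stand-in for an events table plus a snapshots table."""

    def __init__(self) -> None:
        self.events: List[Event] = []
        self.snapshots: List[Snapshot] = []
        self.last_replay_count = 0  # events replayed by the most recent load()

    def append(self, amount: int) -> None:
        version = len(self.events) + 1
        self.events.append(Event(version, amount))
        # Write a snapshot every SNAPSHOT_INTERVAL events.
        if version % SNAPSHOT_INTERVAL == 0:
            state = self.load(version)
            self.snapshots.append(Snapshot(state.version, state.balance))

    def latest_snapshot(self, target: int) -> Optional[Snapshot]:
        # Equivalent to: snapshot with max version <= target.
        candidates = [s for s in self.snapshots if s.version <= target]
        return max(candidates, key=lambda s: s.version, default=None)

    def load(self, target: Optional[int] = None) -> Account:
        if target is None:
            target = len(self.events)
        snap = self.latest_snapshot(target)
        account = Account(snap.balance, snap.version) if snap else Account()
        # Equivalent to: WHERE version > snapshot.version AND version <= target.
        # (A list scan here; a real store would use an indexed range query.)
        tail = [e for e in self.events if account.version < e.version <= target]
        for event in tail:
            account.apply(event)
        self.last_replay_count = len(tail)
        return account


store = InMemoryStore()
for _ in range(250):
    store.append(1)
account = store.load()
print(account.balance, account.version, store.last_replay_count)  # 250 250 50
```

With 250 events and snapshots at versions 100 and 200, the final load replays only the 50 events after the latest snapshot. Because the snapshot is derived purely from events, deleting every entry in `snapshots` would still yield the same `Account`, only more slowly.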
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:11:09.565235+00:00— report_created — created