Report #13506
[architecture] Event sourcing slow aggregate loading due to rebuilding from event 0
Implement immutable snapshots (state projections) stored at regular event-count intervals (e.g., every 100 events), but keep the previous snapshot until the new one is persisted and verified. Never overwrite the only snapshot; keep a short history so you can roll back if a snapshot turns out to be corrupt.
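A minimal sketch of the write side, assuming an in-memory dict as a stand-in for a snapshots table. `SNAPSHOT_INTERVAL`, `KEEP_SNAPSHOTS`, `InMemorySnapshotStore`, `should_snapshot` and `write_snapshot` are hypothetical names, and a SHA-256 checksum is one assumed way to "verify" a snapshot, not a prescribed one. The order of operations is the point: insert a new row keyed by version, read it back and verify it, and only then prune old rows.

```python
import hashlib
import json
from dataclasses import dataclass

SNAPSHOT_INTERVAL = 100  # snapshot every 100 events (example value from the report)
KEEP_SNAPSHOTS = 3       # hypothetical retention: keep a short history for rollback


@dataclass(frozen=True)
class Snapshot:
    aggregate_id: str
    version: int      # version of the last event folded into this state
    state_json: str   # serialized aggregate state
    checksum: str     # SHA-256 of state_json (assumed verification scheme)


class InMemorySnapshotStore:
    """Hypothetical append-only store: rows are inserted, never updated in place."""

    def __init__(self):
        self._rows: dict[tuple[str, int], Snapshot] = {}

    def insert(self, snap: Snapshot) -> None:
        key = (snap.aggregate_id, snap.version)
        if key in self._rows:  # immutability: refuse to overwrite an existing version
            raise ValueError("snapshot for this version already exists")
        self._rows[key] = snap

    def get(self, aggregate_id: str, version: int):
        return self._rows.get((aggregate_id, version))

    def versions(self, aggregate_id: str) -> list[int]:
        return sorted(v for (a, v) in self._rows if a == aggregate_id)

    def delete(self, aggregate_id: str, version: int) -> None:
        del self._rows[(aggregate_id, version)]


def should_snapshot(version: int) -> bool:
    """Event-count policy: snapshot when the version hits a multiple of the interval."""
    return version > 0 and version % SNAPSHOT_INTERVAL == 0


def write_snapshot(store, aggregate_id: str, version: int, state: dict) -> Snapshot:
    """Call only after the events up to `version` are committed, so state is consistent."""
    state_json = json.dumps(state, sort_keys=True)
    snap = Snapshot(aggregate_id, version, state_json,
                    hashlib.sha256(state_json.encode()).hexdigest())
    # 1. Persist the new snapshot next to the existing ones.
    store.insert(snap)
    # 2. Read it back and verify it (a real store would re-query the database).
    stored = store.get(aggregate_id, version)
    if stored is None or hashlib.sha256(stored.state_json.encode()).hexdigest() != snap.checksum:
        raise RuntimeError("snapshot verification failed; older snapshots kept")
    # 3. Only now prune, keeping the newest KEEP_SNAPSHOTS rows for rollback.
    for old in store.versions(aggregate_id)[:-KEEP_SNAPSHOTS]:
        store.delete(aggregate_id, old)
    return snap
```

Usage, simulating 500 events on one aggregate:

```python
store = InMemorySnapshotStore()
for v in range(1, 501):
    if should_snapshot(v):
        write_snapshot(store, "acct-1", v, {"balance": v})
print(store.versions("acct-1"))  # [300, 400, 500]
```

Because pruning runs only after the new row verifies, a crash at any step still leaves at least one older, intact snapshot.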
Journey Context:
Teams implementing event sourcing often start by replaying all events for an aggregate (e.g., a BankAccount) from event 0 on every read. This works for weeks, then load times climb past 30 seconds as the event history grows, and memory usage spikes during replay.

The fix is snapshots: persist the aggregate state periodically. However, naive snapshotting (overwriting a single row in a snapshots table) creates race conditions and corruption risks: if the process crashes mid-write, you are left with a partial snapshot; if you snapshot while an event is being written, you may capture inconsistent state.

The robust pattern is to write immutable snapshots tagged with the event version they represent, and have the reader pick the latest valid snapshot at or before the target event version, then replay only the events after that snapshot. This also enables time-travel queries (reconstructing state at any historical version).

Snapshot frequency is a tradeoff: snapshotting too often adds write amplification and slows writes; snapshotting too rarely leaves long event tails to replay on every read. A common policy is every N events, or daily for inactive aggregates.
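The read path described above can be sketched as follows, assuming an in-memory event list and snapshot list rather than a real event store. `Event`, `Snapshot`, `make_snapshot`, `apply`, `is_valid` and `load` are hypothetical names, the "deposited"/"withdrawn" events are an illustrative BankAccount model, and the checksum test is an assumed notion of "valid". The loader takes the newest valid snapshot at or before the target version, skips corrupt ones, and folds only the remaining tail of events; passing an older `target_version` gives a time-travel query.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    version: int  # 1-based position in the aggregate's event stream
    kind: str     # "deposited" or "withdrawn" (hypothetical event types)
    amount: int


@dataclass(frozen=True)
class Snapshot:
    version: int     # last event version folded into this state
    state_json: str
    checksum: str    # SHA-256 of state_json


def make_snapshot(version: int, state: dict) -> Snapshot:
    body = json.dumps(state, sort_keys=True)
    return Snapshot(version, body, hashlib.sha256(body.encode()).hexdigest())


def apply(state: dict, event: Event) -> dict:
    """Pure fold step for a hypothetical BankAccount aggregate."""
    delta = event.amount if event.kind == "deposited" else -event.amount
    return {"balance": state["balance"] + delta, "version": event.version}


def is_valid(snap: Snapshot) -> bool:
    """Treat a snapshot as corrupt if its body no longer matches its checksum."""
    return hashlib.sha256(snap.state_json.encode()).hexdigest() == snap.checksum


def load(events: list, snapshots: list, target_version=None) -> dict:
    """Rebuild state at target_version (default: latest) from the newest valid snapshot."""
    if target_version is None:
        target_version = events[-1].version if events else 0
    state, start = {"balance": 0, "version": 0}, 0  # fallback: replay from event 0
    # Newest snapshot at or before the target wins; corrupt snapshots are skipped,
    # which is why keeping a history of older snapshots matters.
    for snap in sorted(snapshots, key=lambda s: s.version, reverse=True):
        if snap.version <= target_version and is_valid(snap):
            state, start = json.loads(snap.state_json), snap.version
            break
    # Replay only the tail; a real store would issue a range query (version > start).
    for event in events:
        if start < event.version <= target_version:
            state = apply(state, event)
    return state
```

Usage with 250 deposits of 10 and snapshots at versions 100 and 200:

```python
events = [Event(v, "deposited", 10) for v in range(1, 251)]
snaps = [make_snapshot(100, {"balance": 1000, "version": 100}),
         make_snapshot(200, {"balance": 2000, "version": 200})]
print(load(events, snaps))                      # {'balance': 2500, 'version': 250}
print(load(events, snaps, target_version=150))  # {'balance': 1500, 'version': 150}
```

The first call replays 50 events instead of 250; the second ignores the version-200 snapshot because it lies past the target.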
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T18:52:41.319880+00:00— report_created — created