Report #6341
[architecture] Replaying 10M+ events to rebuild a projection in event sourcing without days of downtime
Snapshot aggregate state explicitly every N events (e.g., every 100 events) or on time-based boundaries, and store the snapshots in a separate table keyed by aggregate_id and version. When rebuilding, load the latest snapshot and replay only the events after that snapshot's version, never the full stream.
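A minimal in-memory sketch of the load path, assuming a toy `Account` aggregate with `deposited`/`withdrawn` events. The names `InMemoryStore`, `SNAPSHOT_EVERY`, and `last_replay_count` are hypothetical and exist only for illustration; a real store would keep events and snapshots in separate persistent tables.

```python
from dataclasses import dataclass

SNAPSHOT_EVERY = 100  # "N" from the report; an assumed value


@dataclass
class Account:
    """Toy aggregate (hypothetical); version counts the events applied."""
    balance: int = 0
    version: int = 0

    def apply(self, event):
        kind, amount = event
        if kind == "deposited":
            self.balance += amount
        elif kind == "withdrawn":
            self.balance -= amount
        self.version += 1


class InMemoryStore:
    """Stand-in for an event table plus a separate snapshot table."""

    def __init__(self):
        self.events = {}     # aggregate_id -> list; index i holds version i + 1
        self.snapshots = {}  # (aggregate_id, version) -> balance
        self.last_replay_count = 0

    def append(self, aggregate_id, event):
        stream = self.events.setdefault(aggregate_id, [])
        stream.append(event)
        version = len(stream)
        if version % SNAPSHOT_EVERY == 0:
            # Take a snapshot on every N-th event.
            state = self.load(aggregate_id)
            self.snapshots[(aggregate_id, version)] = state.balance
        return version

    def load(self, aggregate_id):
        # Start from the latest snapshot if one exists, else from empty state.
        versions = [v for (a, v) in self.snapshots if a == aggregate_id]
        state = Account()
        if versions:
            latest = max(versions)
            state = Account(self.snapshots[(aggregate_id, latest)], latest)
        # Replay only the events recorded after the snapshot version.
        tail = self.events.get(aggregate_id, [])[state.version:]
        for event in tail:
            state.apply(event)
        self.last_replay_count = len(tail)
        return state
```

With 250 single-unit deposits appended to one aggregate, `load` starts from the snapshot at version 200 and replays only the remaining 50 events.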
Journey Context:
Pure event sourcing reaches current state by replaying the entire event history, which is O(n) in the number of events and becomes operationally infeasible for long-lived aggregates (millions of events). It also makes loading a single aggregate for a UI request far too slow. Snapshots are persisted checkpoints of aggregate state: loading one is O(1), followed by O(delta) replay of the events recorded since. Common mistake: thinking 'I'll just cache in Redis'. That breaks when the cache is cold or is invalidated incorrectly; snapshots are the persistent optimization, derived from the event log and rebuildable from it. Tradeoff: snapshots add storage and write latency (two-phase commit, or the risk of a lagging snapshot if written asynchronously). Snapshot writes must be idempotent, because a crash between the event append and the snapshot write leaves a gap that a retry has to fill safely. Alternative: asynchronously 'folding' events into projections is fine for read models, but snapshots serve the aggregate root itself, which must load quickly to enforce invariants.
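One way to make the snapshot write idempotent is to key the table on (aggregate_id, version) and ignore duplicate inserts, so that a retry after a crash is harmless. The sketch below assumes SQLite and JSON-serialized state; the table layout and the function names `open_store`, `save_snapshot`, and `latest_snapshot` are hypothetical.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory database with a snapshot table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snapshots ("
        " aggregate_id TEXT NOT NULL,"
        " version INTEGER NOT NULL,"
        " state TEXT NOT NULL,"
        " PRIMARY KEY (aggregate_id, version))"
    )
    return conn


def save_snapshot(conn, aggregate_id, version, state):
    """Write a snapshot; repeating the same write is a no-op."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO snapshots (aggregate_id, version, state)"
            " VALUES (?, ?, ?)",
            (aggregate_id, version, json.dumps(state)),
        )


def latest_snapshot(conn, aggregate_id):
    """Return (version, state), or (0, None) when no snapshot exists yet."""
    row = conn.execute(
        "SELECT version, state FROM snapshots"
        " WHERE aggregate_id = ? ORDER BY version DESC LIMIT 1",
        (aggregate_id,),
    ).fetchone()
    if row is None:
        return 0, None
    return row[0], json.loads(row[1])
```

Because a missing snapshot yields version 0, a loader built on `latest_snapshot` simply replays from the start of the stream when a crash prevented the snapshot write, which keeps the gap a performance issue and not a correctness one.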
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T23:48:35.178630+00:00— report_created — created