Report #53285
[architecture] Event sourcing aggregate rebuilds take hours due to missing or mis-versioned snapshots, and event schema evolution corrupts state
Store each snapshot with the exact event sequence number it represents, and never mutate snapshots in place. When replaying: load the latest snapshot WHERE sequence <= target, then apply only events with sequence > snapshot.sequence (and <= target). Version your event schema separately using upcasters (functions that transform old event versions into the current one during deserialization), and never migrate historical event store data in place.
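A minimal sketch of the upcaster idea, assuming events are stored as JSON with "type", "version" and "payload" keys. The event type CustomerRegistered, its fields, and the UPCASTERS registry are hypothetical names chosen for illustration, not part of the report. Each upcaster lifts a payload by one version, and the stored JSON is never rewritten:

```python
import json
from typing import Callable, Dict, Tuple

# Hypothetical event type and field names, used only for illustration.
Upcaster = Callable[[dict], dict]


def _customer_registered_v1_to_v2(payload: dict) -> dict:
    # Assumed change: v1 stored a single "name"; v2 splits it in two.
    first, _, last = payload["name"].partition(" ")
    return {"first_name": first, "last_name": last}


def _customer_registered_v2_to_v3(payload: dict) -> dict:
    # Assumed change: v3 adds a field; historical events get a default.
    return {**payload, "marketing_opt_in": False}


# (event_type, from_version) -> function producing the next version's payload.
UPCASTERS: Dict[Tuple[str, int], Upcaster] = {
    ("CustomerRegistered", 1): _customer_registered_v1_to_v2,
    ("CustomerRegistered", 2): _customer_registered_v2_to_v3,
}


def deserialize_event(raw: str) -> dict:
    """Parse a stored event and upcast it, in memory, to the newest version.

    The stored row is left untouched; only the returned object is upgraded.
    """
    event = json.loads(raw)
    event_type = event["type"]
    version = event["version"]
    payload = event["payload"]
    # Chain one-step upcasters until no newer version is registered.
    while (event_type, version) in UPCASTERS:
        payload = UPCASTERS[(event_type, version)](payload)
        version += 1
    return {"type": event_type, "version": version, "payload": payload}
```

Chaining one-step upcasters means a new schema version needs only one new function, and the oldest events still pass through every intermediate step.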
Journey Context:
Naive event sourcing replays the entire event log (potentially millions of events) to compute current state, which makes reads impractically slow. The fix is periodic snapshots (materialized aggregate state). Common failures: (1) Storing a snapshot without its event sequence number means you cannot tell which events to apply after loading it. (2) Changing the snapshot format without versioning causes deserialization errors when old snapshots still exist in cache or storage. (3) Trying to 'migrate' old events to a new schema in place mutates a log that is supposed to be immutable, which destroys the audit trail and breaks temporal queries. Correct approach: a snapshot contains (aggregate_id, sequence_number, state_blob, snapshot_version). When loading: SELECT * FROM snapshots WHERE aggregate_id = X ORDER BY sequence_number DESC LIMIT 1 (add AND sequence_number <= target when rebuilding state as of an earlier point), then SELECT * FROM events WHERE aggregate_id = X AND sequence_number > snapshot.sequence_number ORDER BY sequence_number ASC, and apply those events in order. If no snapshot exists, replay from the first event. Schema evolution uses upcasters: deserialize the event JSON -> apply a transformation if it is an old version -> get the canonical event object. Never delete old snapshots until the new ones are verified, and treat the event store as append-only.
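The load path described above can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions, not the reporter's implementation: the table layout follows the (aggregate_id, sequence_number, state_blob, snapshot_version) tuple from the text, while the toy "balance" aggregate, the apply_event function, the events.payload column, and the choice to skip snapshots whose snapshot_version is not the current one are all assumptions.

```python
import json
import sqlite3
from typing import Optional

CURRENT_SNAPSHOT_VERSION = 1  # assumed: bump when the state_blob format changes

SCHEMA = """
CREATE TABLE events (
    aggregate_id TEXT NOT NULL,
    sequence_number INTEGER NOT NULL,
    payload TEXT NOT NULL,
    PRIMARY KEY (aggregate_id, sequence_number)
);
CREATE TABLE snapshots (
    aggregate_id TEXT NOT NULL,
    sequence_number INTEGER NOT NULL,
    state_blob TEXT NOT NULL,
    snapshot_version INTEGER NOT NULL,
    PRIMARY KEY (aggregate_id, sequence_number, snapshot_version)
);
"""


def apply_event(state: dict, event: dict) -> dict:
    # Toy aggregate (hypothetical): every event adds "amount" to a balance.
    return {"balance": state["balance"] + event["amount"]}


def load_aggregate(conn: sqlite3.Connection, aggregate_id: str,
                   target_sequence: Optional[int] = None) -> dict:
    """Rebuild state as of target_sequence (latest state if None)."""
    if target_sequence is None:
        target_sequence = 2**63 - 1  # largest integer SQLite can store

    # Latest readable snapshot at or before the target sequence.
    snap = conn.execute(
        "SELECT sequence_number, state_blob FROM snapshots "
        "WHERE aggregate_id = ? AND sequence_number <= ? "
        "AND snapshot_version = ? "
        "ORDER BY sequence_number DESC LIMIT 1",
        (aggregate_id, target_sequence, CURRENT_SNAPSHOT_VERSION),
    ).fetchone()

    if snap is None:
        # No usable snapshot: fall back to a full replay from the start.
        base_sequence, state = 0, {"balance": 0}
    else:
        base_sequence, state = snap[0], json.loads(snap[1])

    # Apply only the events after the snapshot, in order, up to the target.
    rows = conn.execute(
        "SELECT payload FROM events "
        "WHERE aggregate_id = ? AND sequence_number > ? "
        "AND sequence_number <= ? "
        "ORDER BY sequence_number ASC",
        (aggregate_id, base_sequence, target_sequence),
    )
    for (payload,) in rows:
        state = apply_event(state, json.loads(payload))
    return state
```

Filtering on snapshot_version in the query means an old-format snapshot is simply ignored, and the worst case is a longer replay instead of a deserialization error. Old snapshots stay in the table until new ones are verified.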
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:56:17.251710+00:00— report_created — created