Report #51041
[architecture] Efficient deep pagination in distributed databases (Cassandra/CQL) without OFFSET
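Use cursor-based (keyset) pagination driven by the driver's `paging_state` token instead of `OFFSET`-style pagination. For token-range scans, persist the last row's partition key (or its `token(partition_key)`) and `clustering_key` values, then resume in two steps: finish the current partition with `WHERE partition_key = ? AND clustering_key > ?`, then continue with `WHERE token(partition_key) > token(?)`. A single `WHERE token(partition_key) > ? AND clustering_key > ?` is not a correct resume condition: it skips the rest of the current partition and, in later partitions, every row whose clustering key is not greater than the saved value. Avoid random UUIDv4 values as cursor columns: their ordering is random, so a cursor built on them cannot express "rows after this one" in any meaningful (for example, time-based) order.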
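A minimal sketch of the `paging_state` approach, assuming the DataStax Python driver, where `Session.execute` accepts a `paging_state` argument and the result set exposes `current_rows` and `paging_state`. The helper name `fetch_page` and the base64 cursor encoding are illustrative choices, not part of the driver, and `statement` is assumed to have been created with `fetch_size` set to the desired page size.

```python
import base64
from typing import Any, List, Optional, Tuple


def fetch_page(
    session: Any, statement: Any, cursor: Optional[str] = None
) -> Tuple[List[Any], Optional[str]]:
    """Fetch one page and return (rows, next_cursor).

    Assumption: `session` behaves like cassandra.cluster.Session and
    `statement` was built with fetch_size=<page size>.
    """
    # Decode the opaque cursor the client sent back, if there is one.
    paging_state = base64.urlsafe_b64decode(cursor) if cursor else None

    result = session.execute(statement, paging_state=paging_state)

    # current_rows holds only the page just fetched; iterating the result
    # set itself would transparently pull in the following pages as well.
    rows = list(result.current_rows)

    # paging_state is None once the last page has been returned.
    state = result.paging_state
    next_cursor = base64.urlsafe_b64encode(state).decode("ascii") if state else None
    return rows, next_cursor
```

The returned cursor should be treated as opaque and reused only with the same query and the same driver and protocol version that produced it.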
Journey Context:
OFFSET-style pagination in distributed systems requires the coordinator to fetch and discard N rows, so latency grows linearly, O(n), with page depth and often triggers timeouts at page 10 and beyond. Cursor pagination instead leverages the storage engine's natural ordering (B-tree or LSM-tree) for O(log n) seeks. A critical mistake is using time-based cursors in Cassandra without restricting the partition key in the filter, which forces a full table scan. Another pitfall is assuming the `paging_state` serialization format is stable; it can change across driver versions, so a stored token may not be usable after an upgrade. With the "token + clustering" approach, you must also handle the boundary where a page crosses from one partition into the next: the clustering-key restriction applies only inside the partition where the previous page ended, and the clustering position starts over at each new token.
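The partition-boundary handling can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the table `events` and columns `pk` and `ck` are hypothetical, `run` is a caller-supplied function that executes a CQL string with bound parameters and returns mapping-like rows in the order the database returns them, and `?` placeholders assume prepared statements.

```python
from typing import Any, Callable, List, Optional, Tuple

TABLE, PK, CK = "events", "pk", "ck"  # hypothetical schema names

Cursor = Optional[Tuple[Any, Any]]  # (last partition key, last clustering key)


def resume_queries(cursor: Cursor) -> List[Tuple[str, tuple]]:
    """Return the (cql, params) statements to run, in order, for the next page."""
    if cursor is None:
        # First page: an unrestricted scan returns partitions in token order.
        return [(f"SELECT * FROM {TABLE}", ())]
    last_pk, last_ck = cursor
    return [
        # Step 1: finish the partition in which the previous page stopped.
        (f"SELECT * FROM {TABLE} WHERE {PK} = ? AND {CK} > ?", (last_pk, last_ck)),
        # Step 2: cross the boundary; no clustering restriction applies here.
        (f"SELECT * FROM {TABLE} WHERE token({PK}) > token(?)", (last_pk,)),
    ]


def next_page(
    run: Callable[[str, tuple], List[dict]], cursor: Cursor, page_size: int
) -> Tuple[List[dict], Cursor]:
    """Collect up to page_size rows; a short page means the scan is finished."""
    rows: List[dict] = []
    for cql, params in resume_queries(cursor):
        remaining = page_size - len(rows)
        if remaining <= 0:
            break
        rows.extend(run(f"{cql} LIMIT {remaining}", params))
    # Keep the old cursor when nothing came back, so callers never restart.
    new_cursor = (rows[-1][PK], rows[-1][CK]) if rows else cursor
    return rows, new_cursor
```

Splitting the resume into two statements keeps each query valid on its own: the first is a single-partition slice, the second a plain token-range scan.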
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:09:11.188037+00:00— report_created — created