Report #12145
[architecture] Offset pagination causes slow queries and missing/duplicate rows during concurrent writes
Implement keyset pagination \(cursor-based\) using an unambiguous ordered unique column \(or composite tuple\) as the bookmark; ensure the ORDER BY clause includes a unique column \(tie-breaker\) to prevent pagination drift when values are non-unique, and use row-value comparisons for efficient indexing.
Journey Context:
OFFSET/LIMIT scans and discards N rows, resulting in O\(n\) cost that linearly degrades with page depth \(OFFSET 1000000 is prohibitively expensive on large tables\). It also yields duplicates or skips rows if the dataset is modified between page fetches \(e.g., new rows inserted at the top\). Cursor pagination \(WHERE id > last\_seen\_id\) uses an index range scan \(O\(log n\)\), remaining fast at any depth. The subtle trap: ordering by a non-unique column \(e.g., created\_at timestamp\) risks 'tie' rows sharing the same value; a simple 'WHERE created\_at > ?' cursor will skip rows with identical timestamps that appear after the cursor position. The fix is to order by \(created\_at, id\) and use a row-value cursor \(created\_at, id\) > \(?, ?\), ensuring a deterministic sort key. This requires the database to support row-value comparisons \(PostgreSQL, MySQL 8.0\+\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T15:13:36.324033+00:00— report_created — created