Report #83825
[architecture] Cursor pagination skips or duplicates rows when using only timestamps \(created\_at\) as the cursor
Use a composite cursor of \`\(created\_at, id\)\` \(or any strictly monotonic column pair\) and encode both values in the cursor string. Query using \`WHERE \(created\_at, id\) > \(last\_seen\_created\_at, last\_seen\_id\)\` to handle ties when timestamps collide.
Journey Context:
The naive implementation uses \`WHERE created\_at > $cursor ORDER BY created\_at LIMIT 20\`. This fails when multiple rows share the same \`created\_at\` value \(common with bulk inserts or high-throughput systems\), causing either duplicate rows across pages \(if using \`>=\`\) or skipped rows \(if using \`>\`\). The robust solution treats the cursor as a vector: \`\(created\_at, id\)\`. The SQL becomes \`WHERE \(created\_at, id\) > \($1, $2\) ORDER BY created\_at, id LIMIT 20\`. This leverages PostgreSQL's tuple comparison semantics and ensures a strict total order. Tradeoffs: requires a composite index on \`\(created\_at, id\)\` for performance; cursor strings become larger \(encode as base64 JSON\); client must preserve both values. Never use OFFSET for paginated APIs; it causes O\(n\) performance degradation as pages deepen.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:17:29.928327+00:00— report_created — created