Report #91376
[bug_fix] PostgreSQL: could not serialize access due to read/write dependencies among transactions (SQLSTATE 40001)
Implement an automatic retry loop in the application for SQLSTATE 40001, with exponential backoff and jitter, that replays the whole transaction. Alternatively, if strict serializability is not required, relax the isolation level to `REPEATABLE READ` and accept that write-skew anomalies become possible. Root cause: under Serializable Snapshot Isolation (SSI), PostgreSQL tracks rw-dependencies between concurrent transactions (e.g., T1 reads row X and writes row Y; T2 reads row Y and writes row X). When it finds a pattern of such dependencies that could form a cycle and so violate serializability, it aborts one of the transactions to prevent the anomaly.
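The T1/T2 example in the root cause can be sketched as a toy dependency graph. This is only an illustration of why the two transactions conflict, not PostgreSQL's actual algorithm: the function names and the dict layout are hypothetical, every transaction is assumed to be concurrent with every other, and real SSI works on predicate locks and can abort on a pattern that merely might become a cycle.

```python
def rw_antidependencies(transactions):
    """Return edges (reader, writer) where `reader` read an item that a
    concurrent `writer` wrote. `transactions` maps a name to a
    (read_set, write_set) pair. Hypothetical toy model."""
    edges = set()
    for reader, (reads, _) in transactions.items():
        for writer, (_, writes) in transactions.items():
            if reader != writer and reads & writes:
                edges.add((reader, writer))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed dependency graph."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node already on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


# T1 reads X and writes Y; T2 reads Y and writes X.
conflicting = {"T1": ({"X"}, {"Y"}), "T2": ({"Y"}, {"X"})}
print(has_cycle(rw_antidependencies(conflicting)))  # True

# T2 now reads an unrelated row Z, so only T1 -> T2 remains.
harmless = {"T1": ({"X"}, {"Y"}), "T2": ({"Z"}, {"X"})}
print(has_cycle(rw_antidependencies(harmless)))  # False
```

In the first case neither serial order (T1 then T2, or T2 then T1) matches what both transactions observed, which is why one of them has to be aborted.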
Journey Context:
You refactor your financial ledger service to use `SET TRANSACTION ISOLATION LEVEL SERIALIZABLE` in PostgreSQL to eliminate write skew during concurrent balance transfers. Under load testing, you observe sporadic `ERROR: could not serialize access due to read/write dependencies among transactions` (SQLSTATE 40001). You analyze the query logs: Transaction A reads the balance of Account 1, then writes to Account 2. Simultaneously, Transaction B reads Account 2, then writes to Account 1, creating a cycle in the serialization graph. PostgreSQL's SSI implementation (using predicate locks and rw-dependency tracking) detects this dangerous structure and aborts one transaction to ensure equivalence to a serial schedule. You initially catch the error and display it to the user, but the high error rate confuses users. You then implement a generic retry decorator in your Python repository layer that catches `40001`, waits for an exponentially growing delay with random jitter (on the order of 10 ms, 20 ms, 40 ms), and replays the entire transaction block. After deployment, the errors are handled internally, throughput remains high, and data integrity is preserved. The fix works because Serializable isolation in Postgres is optimistic: it lets transactions run without blocking one another and aborts one only when it detects a possible serialization anomaly, either during execution or at commit, so the application must be prepared to retry.
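A minimal sketch of the kind of retry decorator described above. The decorator name, its parameters, and the default values are assumptions for illustration, not the code from this report. It assumes the driver's exception carries the SQLSTATE as `sqlstate` (psycopg 3) or `pgcode` (psycopg2), and that the decorated function opens, runs, and commits or rolls back the whole transaction itself, so each attempt starts from a fresh snapshot.

```python
import functools
import random
import time

SERIALIZATION_FAILURE = "40001"


def _sqlstate(exc):
    # psycopg 3 exposes `sqlstate`; psycopg2 exposes `pgcode`.
    return getattr(exc, "sqlstate", None) or getattr(exc, "pgcode", None)


def retry_on_serialization_failure(max_attempts=5, base_delay=0.010,
                                   max_delay=1.0, sleep=time.sleep):
    """Replay the decorated transaction function on SQLSTATE 40001.

    Hypothetical helper: delays grow as base_delay * 2**n, capped at
    max_delay, with full jitter (a uniform draw between 0 and the cap).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(txn_fn):
        @functools.wraps(txn_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return txn_fn(*args, **kwargs)
                except Exception as exc:
                    # Re-raise anything that is not a serialization
                    # failure, and give up after the last attempt.
                    if (_sqlstate(exc) != SERIALIZATION_FAILURE
                            or attempt == max_attempts):
                        raise
                    cap = min(max_delay, base_delay * 2 ** (attempt - 1))
                    sleep(random.uniform(0, cap))
        return wrapper
    return decorator
```

The jitter matters here because the transactions that conflicted once tend to be retried at the same moment; randomizing the delay makes it less likely that they collide again. Any side effects outside the database (emails, calls to other services) should stay out of the decorated function, since it may run more than once.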
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:58:05.117404+00:00— report_created — created