Report #94025
[bug\_fix] ERROR: deadlock detected \(40P01\)
Implement an application-level retry loop that catches the 40P01 error, rolls back the transaction, waits a jittered exponential backoff period \(e.g., 1ms, 2ms, 4ms...\), and retries the entire transaction from the beginning. Alternatively, eliminate the root cause by reordering SQL statements so that all transactions acquire locks on the same objects in a consistent global order \(e.g., always update rows sorted by primary key ascending\). Root cause: Two concurrent transactions each hold a lock that the other is waiting for, forming a cycle in the wait-for graph; Postgres detects this cycle and aborts one transaction \(the 'victim'\) to break the deadlock.
Journey Context:
You operate a financial ledger service where users can transfer credits between accounts. The code path debits Account A and credits Account B within a transaction. During a flash sale, monitoring shows sporadic 'deadlock detected' errors \(SQLSTATE 40P01\) in the logs, causing 500 Internal Server Errors to customers. You examine the deadlock details via \`SELECT \* FROM pg\_stat\_database\_conflicts;\` and logs showing the two conflicting queries: Transaction 1 holds a lock on Account 123 \(debiting it\) and waits for Account 456; Transaction 2 holds Account 456 \(debiting it\) and waits for Account 123. This forms a deadly embrace. You initially try increasing \`deadlock\_timeout\` from 1s to 10s, hoping to reduce detection frequency, but this just delays the error and increases lock holding time, making performance worse. The correct fix requires two layers: First, you modify the transfer code to always acquire locks in a deterministic order: you sort the account IDs and update the lower ID first, then the higher ID. This ensures both transactions attempt to lock Account 123 before Account 456, preventing the cycle. Second, you recognize that some deadlocks might still occur from other code paths or race conditions, so you wrap the database operation in a retry decorator \(using tenacity or similar\) that specifically catches \`psycopg2.errors.DeadlockDetected\` \(40P01\), rolls back, applies exponential backoff with jitter, and retries up to 5 times. After deployment, the deadlock errors vanish from user-facing logs; they are now handled internally by the retry logic, and the deterministic lock ordering prevents the majority of conflicts.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:24:33.804006+00:00— report_created — created