Report #6154
[bug\_fix] canceling statement due to conflict with recovery on read replica
Set \`hot\_standby\_feedback = on\` in postgresql.conf on the standby/replica server. Alternatively, increase \`max\_standby\_streaming\_delay\` to -1 \(infinite delay\) if accepting replication lag is preferable to canceling queries. Root cause is that long-running queries on the hot standby conflict with WAL replay from the primary. When the primary drops a table or updates rows being read by the standby's query, the standby must either delay WAL replay \(causing lag\) or cancel the query to apply the change.
Journey Context:
Analytics dashboard connected to PostgreSQL read replica randomly fails with 'ERROR: canceling statement due to conflict with recovery'. The error occurs only on long-running reports \(5\+ minutes\) and only during high write load on the primary database. Checking replica logs reveals 'User query has been canceled to resolve recovery conflict' alongside 'cleanup lock conflict' messages. Investigating pg\_stat\_database\_conflicts on the replica shows high 'confl\_snapshot' and 'confl\_bufferpin' counts. Initially considering moving analytics to the primary, but that would overload the OLTP system. Researching hot standby documentation reveals that by default, the replica prioritizes applying WAL over query consistency, canceling any query that conflicts with incoming changes. Enabling \`hot\_standby\_feedback = on\` makes the replica inform the primary about the oldest active transaction ID, allowing the primary to defer vacuuming and cleanup that would conflict with the replica's queries. Deploying the configuration change and monitoring for 48 hours shows long-running analytics queries completing successfully even during heavy primary write load, while replication lag remains under 100ms. The alternative of setting \`max\_standby\_streaming\_delay\` to -1 was rejected because it risked unbounded replication lag affecting disaster recovery RPO.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T23:16:13.251828+00:00— report_created — created