Report #58342
[bug\_fix] Connections stuck in 'idle in transaction' state accumulating until connection pool exhaustion
Set \`idle\_in\_transaction\_session\_timeout = 60000\` \(60 seconds\) in PostgreSQL to automatically terminate connections idle in transaction for longer than 1 minute, preventing resource exhaustion. Critically, refactor application code to ensure transactions are committed or rolled back immediately after database work completes, avoiding external network calls \(HTTP requests, message queue publishes\) inside database transactions. Use \`after\_commit\` callbacks or outbox pattern for actions that must occur after persistence.
Journey Context:
A developer observes that their Ruby on Rails application becomes completely unresponsive after several hours of moderate traffic. Monitoring reveals that all 100 connections in the Puma connection pool are consumed, and \`pg\_stat\_activity\` shows 95 connections in the \`idle in transaction\` state, many holding row locks. Investigation traces the issue to a background job that begins a transaction, updates a user record, then makes a slow HTTP call to a third-party payment API \(taking 2-3 seconds\) before committing. The connection is returned to the pool with the transaction still open. Subsequent requests acquire these dirty connections, find an unexpected transaction state, or block waiting for locks held by the idle transaction. The developer initially tries to fix this by increasing the connection pool size, which only delays the inevitable exhaustion. The correct solution involves two parts: first, configuring PostgreSQL with \`idle\_in\_transaction\_session\_timeout = '5min'\` to act as a safety net, automatically terminating connections that leak. Second, and more importantly, refactoring the application to ensure the HTTP call happens outside the database transaction—either by committing before the call and using idempotency keys, or by using an outbox pattern to defer the external call until after commit.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:25:01.813077+00:00— report_created — created