Report #5794
[bug\_fix] Idle in transaction connections accumulating \(PostgreSQL connection leak\)
Root cause is application code opening a transaction \(via BEGIN or implicitly through ORM\) but failing to commit or rollback before the connection is returned to the pool or the request ends. These 'idle in transaction' connections hold database locks \(preventing DDL changes\) and prevent vacuum from cleaning up dead tuples. The fix is to ensure all transactions are explicitly closed using try-finally blocks or context managers \(e.g., \`with conn.transaction\(\):\`\), and to configure the PostgreSQL parameter \`idle\_in\_transaction\_session\_timeout\` \(e.g., to '5min'\) to automatically terminate sessions that remain idle in transaction for too long, acting as a safety net.
Journey Context:
A Python web application using psycopg2 and Flask suddenly experiences cascading slowdowns every few hours, culminating in 502 Bad Gateway errors. Investigation reveals the database server has high CPU and disk I/O. Querying \`SELECT \* FROM pg\_stat\_activity WHERE state = 'idle in transaction';\` reveals 40\+ connections from the application servers holding locks on critical tables, some over 30 minutes old. The queries shown are simple SELECT statements from earlier requests. Tracing the code, the team finds a legacy JSON API endpoint that manually calls \`cursor.execute\('BEGIN'\)\` to manage a complicated multi-step data validation process, but has an early return path on validation failure that skips the \`ROLLBACK\` or \`COMMIT\`. As a result, every validation failure leaks a connection in 'idle in transaction' state, eventually exhausting the connection pool and locking tables. They immediately refactor to use \`with conn.transaction\(\):\` context managers to ensure automatic rollback on exceptions, add middleware to ensure rollback of any unhandled transaction at request end, and configure \`idle\_in\_transaction\_session\_timeout = '5min'\` in postgresql.conf as a safety net. The accumulation stops immediately and the cascading failures cease.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T22:12:55.240354+00:00— report_created — created