Report #6156
[bug_fix] FATAL: remaining connection slots are reserved for non-replication superuser connections in async applications
Use `async with` context managers to acquire connections from the pool, and make sure `await pool.close()` is called when the application receives a shutdown signal. Root cause: async database drivers (asyncpg, psycopg3) require each connection to be explicitly released back to the pool. A code path that raises or returns early without a proper context manager leaks its connection, which stays checked out of the pool until the application restarts. Enough leaks exhaust the pool and eventually trigger 'remaining connection slots are reserved' on the PostgreSQL side.
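The difference between the two acquisition styles can be sketched without a database. `TinyPool` below is a hypothetical stand-in that only counts checked-out slots; it is not the asyncpg API, and the handler names are made up for illustration. With asyncpg the safe form would be `async with pool.acquire() as conn:`.

```python
import asyncio
from contextlib import asynccontextmanager


class TinyPool:
    """Hypothetical stand-in for a driver pool; it only counts checked-out slots."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.in_use = 0

    async def acquire_raw(self) -> object:
        if self.in_use >= self.size:
            raise RuntimeError("pool exhausted")
        self.in_use += 1
        return object()

    async def release(self, conn: object) -> None:
        self.in_use -= 1

    @asynccontextmanager
    async def acquire(self):
        conn = await self.acquire_raw()
        try:
            yield conn
        finally:
            # Runs on normal exit, early return, and exception alike.
            await self.release(conn)


async def leaky_handler(pool: TinyPool, user_id):
    conn = await pool.acquire_raw()
    if user_id is None:
        return None  # returns before the try block: the slot is never released
    try:
        return f"user-{user_id}"
    finally:
        await pool.release(conn)


async def safe_handler(pool: TinyPool, user_id):
    async with pool.acquire() as conn:
        if user_id is None:
            return None  # the context manager still releases the slot
        return f"user-{user_id}"


async def demo():
    leaky_pool, safe_pool = TinyPool(3), TinyPool(3)
    for _ in range(3):
        await leaky_handler(leaky_pool, None)
        await safe_handler(safe_pool, None)
    return leaky_pool.in_use, safe_pool.in_use


if __name__ == "__main__":
    print(asyncio.run(demo()))  # prints (3, 0)
```

After three requests that hit the early return, the leaky pool has no free slots left, while the context-managed pool is back at zero checked out.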
Journey Context:
FastAPI application using asyncpg slowly degrades over 6 hours until all API calls time out. Logs show 'could not get connection from pool' while the PostgreSQL side shows 20 connections from the app (matching the pool size), all in the 'idle' state. Restarting the app causes connections on Postgres to drop to zero. Reviewing the code reveals the pattern: `conn = await pool.acquire()` followed by try/finally blocks, but in complex nested conditionals some early return statements sit outside the try block, so the finally that releases the connection never runs for them. Also finding that during Kubernetes pod shutdown the pool isn't explicitly closed, leaving connections hanging in the 'idle' state on Postgres while the pod restarts. Realizing that asyncpg pool connections aren't automatically returned to the pool on garbage collection. Refactoring all database calls to use `async with pool.acquire() as conn:` means Python's async context manager protocol guarantees release even when an exception is raised. Adding a lifespan context manager to the FastAPI app that explicitly calls `await pool.close()` on shutdown. Deploying and monitoring for 24 hours shows the connection count on Postgres remains stable at the baseline level (5-10 active), with no accumulation of idle connections, and graceful shutdowns complete without orphaned backend processes.
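A minimal sketch of the shutdown half of the fix, assuming the ASGI server runs the lifespan shutdown phase when the pod receives its termination signal. `StubPool`, `create_pool`, and `simulate_startup_and_shutdown` are hypothetical stand-ins so the sketch runs without FastAPI or a database. In the real app the pool would come from `asyncpg.create_pool(...)` and the function would be passed as `FastAPI(lifespan=lifespan)`.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


class StubPool:
    """Hypothetical stand-in for the pool returned by asyncpg.create_pool()."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        # The real Pool.close() waits for connections to be released, then closes them.
        self.closed = True


async def create_pool() -> StubPool:
    # Assumption: the real app would call `await asyncpg.create_pool(dsn=..., max_size=20)`.
    return StubPool()


@asynccontextmanager
async def lifespan(app):
    # Startup: create the pool once and store it on the application state.
    app.state.pool = await create_pool()
    try:
        yield
    finally:
        # Shutdown: close the pool so no backends are left idle on Postgres.
        await app.state.pool.close()


async def simulate_startup_and_shutdown() -> bool:
    # Stand-in for the FastAPI app object; only `.state` is needed here.
    app = SimpleNamespace(state=SimpleNamespace())
    async with lifespan(app):
        assert not app.state.pool.closed  # pool is open while requests are served
    return app.state.pool.closed


if __name__ == "__main__":
    print(asyncio.run(simulate_startup_and_shutdown()))  # prints True
```

Keeping the close in a `finally` means the pool is also closed if an exception is thrown back into the lifespan generator, not only on a clean shutdown.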
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T23:16:13.431728+00:00— report_created — created