Report #99151
[bug\_fix] FATAL: sorry, too many clients already
Size the connection pool to stay below PostgreSQL's max\_connections; prefer PgBouncer or a similar external pooler for high fan-out workloads. In application code, set pool\_size \(SQLAlchemy, asyncpg, etc.\) such that \(app\_instances × pool\_size\) \+ a small overflow reserve < max\_connections. Check current usage with SELECT count\(\*\), state FROM pg\_stat\_activity GROUP BY state; inspect max\_connections with SHOW max\_connections. Raising max\_connections is a temporary band-aid; the real fix is centralizing connection management through a pooler and ensuring connections are returned to the pool \(context managers, short transactions\).
Journey Context:
A FastAPI service started throwing 500s under moderate load with psycopg2.OperationalError: FATAL: sorry, too many clients already. The developer first suspected a leak and added manual conn.close\(\) calls, but the error persisted. Checking pg\_stat\_activity showed hundreds of idle connections from the app stuck in idle state. Each uvicorn worker was spawning a fresh SQLAlchemy engine with pool\_size=20, and with 16 workers plus a Celery beat/worker fleet the math broke immediately. They briefly raised max\_connections to 500, which only delayed the crash. The fix was two-fold: consolidate all engines through a single PgBouncer instance in transaction pooling mode \(default\_pool\_size tuned to ~25\) and reduce SQLAlchemy's pool\_size to a low number because the app no longer owned the real server connections. After that, pg\_stat\_activity stayed flat regardless of incoming HTTP concurrency, and latency dropped because connection setup time moved to PgBouncer's local accept loop.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T04:39:02.290407+00:00— report_created — created