Report #12042
[bug\_fix] Replication slot causing unbounded WAL growth and disk full \(pg\_wal consuming 100% disk\)
Identify the inactive replication slot with SELECT slot\_name, active, restart\_lsn FROM pg\_replication\_slots; and compare restart\_lsn with pg\_current\_wal\_lsn\(\) to see how much WAL the slot is holding back. If the consumer is permanently offline, drop the slot immediately with SELECT pg\_drop\_replication\_slot\('slot\_name'\);. For a temporary outage, restart the consumer application so that it can catch up and advance the slot. On PostgreSQL 13\+, set max\_slot\_wal\_keep\_size to cap the amount of WAL that slots can retain.
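The steps above might look roughly like the following when run on the primary. This is a sketch under assumptions, not a verified runbook: the slot name 'debezium\_slot' is taken from the incident below, and the 50GB cap is purely an illustrative value that should be sized against your own disk and consumer downtime tolerance.

```sql
-- 1. List slots with the amount of WAL each one is holding back.
--    pg_wal_lsn_diff() returns the byte distance between two LSNs.
--    Must run on the primary (pg_current_wal_lsn() is not available during recovery).
SELECT slot_name,
       slot_type,
       active,
       restart_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;

-- 2. Drop the orphaned slot (only if the consumer is gone for good).
--    This fails if the slot is still active, which acts as a safety check.
SELECT pg_drop_replication_slot('debezium_slot');

-- 3. PostgreSQL 13+: cap WAL retained for slots.
--    '50GB' is an assumed example value; the default of -1 means unlimited.
ALTER SYSTEM SET max_slot_wal_keep_size = '50GB';
SELECT pg_reload_conf();  -- the setting takes effect on reload, no restart needed
```

Note the trade-off in step 3: once a slot falls further behind than the cap, PostgreSQL removes the WAL it needs and the slot becomes unusable, so its consumer has to be re-initialized. Disk space is typically reclaimed at the next checkpoint after the slot is dropped.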
Journey Context:
Disk utilization on a production PostgreSQL primary began climbing at a constant rate of about 1GB per hour despite no significant increase in data volume. Inspecting the filesystem showed that the pg\_wal directory contained thousands of 16MB WAL segments instead of the expected few dozen. Querying pg\_stat\_replication showed no active streaming replicas, yet pg\_replication\_slots revealed a logical replication slot named 'debezium\_slot' whose confirmed\_flush\_lsn was far behind the current WAL LSN. The slot had been created for a Change Data Capture \(CDC\) pipeline using Debezium, but the consumer application was decommissioned weeks earlier without the slot being cleaned up. PostgreSQL never removes WAL segments that a replication slot might still need, so it retained all WAL generated since the slot's last confirmation. Because the slot stayed inactive and never advanced, WAL accumulated without bound until the disk filled. The immediate fix was to drop the orphaned slot, which allowed PostgreSQL to recycle the WAL files and recover disk space. The incident led to monitoring of slot lag via pg\_replication\_slots and to setting max\_slot\_wal\_keep\_size to prevent future disk exhaustion.
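The slot-lag monitoring mentioned above could be driven by a query along these lines. It is a minimal sketch: the 10 GiB alert threshold is an assumed value, and the wal\_status and safe\_wal\_size columns are only present on PostgreSQL 13 and later.

```sql
-- Return slots retaining more than ~10 GiB of WAL (threshold is an assumption).
-- Any row returned is a candidate for an alert.
SELECT slot_name,
       active,
       wal_status,      -- PG13+: reserved / extended / unreserved / lost
       safe_wal_size,   -- PG13+: bytes left before the slot risks losing WAL
                        --        (NULL when max_slot_wal_keep_size is unlimited)
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)        AS retained_bytes,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS unconfirmed_bytes
FROM pg_replication_slots
WHERE pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > 10737418240;  -- 10 GiB
```

Alerting on retained\_bytes together with active = false would have flagged the decommissioned consumer in this incident well before the disk filled. The confirmed\_flush\_lsn column is NULL for physical slots, so unconfirmed\_bytes is only meaningful for logical slots.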
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T14:54:18.105288+00:00— report_created — created