Report #66299
[bug_fix] PostgreSQL disk filling due to unconsumed replication slot WAL retention
Monitor `pg_replication_slots` for inactive slots whose retained WAL, computed as `pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)`, is large. Promptly drop replication slots that are no longer consumed by any client (e.g., old CDC connectors or decommissioned standbys), and set `max_slot_wal_keep_size` to cap how much WAL a single slot can retain, preventing unbounded disk growth.
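`pg_wal_lsn_diff(a, b)` returns `a - b` in bytes, so the argument order decides the sign of the result. As a minimal sketch, the same arithmetic can be done offline in Python on `pg_lsn` text values (two hex numbers holding the high and low 32 bits of a 64-bit WAL position, e.g. `16/B374D848`), which helps when a monitoring script has already fetched the LSNs. The helpers `parse_lsn` and `wal_lsn_diff` are hypothetical names, not part of any PostgreSQL client library, and the LSNs in the example are invented for illustration.

```python
def parse_lsn(lsn: str) -> int:
    """Convert pg_lsn text such as '16/B374D848' to a 64-bit byte position.

    The text form is two hex numbers: the high and low 32 bits of the position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_lsn_diff(lsn1: str, lsn2: str) -> int:
    """Return lsn1 - lsn2 in bytes, the same arithmetic as pg_wal_lsn_diff(lsn1, lsn2)."""
    return parse_lsn(lsn1) - parse_lsn(lsn2)


# Hypothetical LSNs: current WAL position first, the slot's restart_lsn second.
print(wal_lsn_diff("32/0", "0/0"))  # 214748364800 bytes = 200 GiB
print(wal_lsn_diff("0/0", "32/0"))  # -214748364800: arguments swapped
```

Putting the current LSN first yields a positive retained-bytes figure; swapping the arguments gives the same magnitude with a negative sign, which is easy to misread on a dashboard.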
Journey Context:
Your PostgreSQL 15 primary server's disk usage grows by about 10GB per day even though the database itself is only 50GB. Investigating, you find that the `pg_wal` directory contains thousands of 16MB segment files going back weeks. You run `SELECT slot_name, active, restart_lsn, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes FROM pg_replication_slots;` and see a slot named `debezium_old` that is inactive (`active = f`) and retaining 200GB of WAL. Note the argument order: `pg_wal_lsn_diff(a, b)` returns `a - b`, so the current LSN must come first to get a positive byte count.

A developer had stopped a Debezium CDC connector weeks ago but never dropped its replication slot. PostgreSQL keeps every WAL file from a slot's `restart_lsn` onward until the slot's consumer confirms it has processed that WAL and the slot advances. An inactive slot never advances, so its WAL retention grows without bound.

You run `SELECT pg_drop_replication_slot('debezium_old');`, and the WAL it was pinning becomes eligible for removal or recycling at the next checkpoint (running `CHECKPOINT;` triggers one immediately).

To prevent future incidents, you set `max_slot_wal_keep_size = '10GB'` in postgresql.conf (available since PostgreSQL 13; a configuration reload is enough, no restart needed). If a slot then falls more than 10GB behind, PostgreSQL stops retaining WAL for it at the next checkpoint and invalidates the slot (its `wal_status` in `pg_replication_slots` becomes `lost`) instead of filling the disk. The slot is not dropped automatically: you still have to drop it yourself, and its consumer must be re-initialized, for example with a fresh Debezium snapshot or a rebuilt standby.
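The triage in this report (find inactive slots that pin a lot of WAL, then estimate how many segment files they hold) can be sketched in Python as pure functions over rows already fetched with the monitoring query. This is a minimal sketch under stated assumptions: `SlotRow`, `stale_slots`, and `segments_for` are hypothetical names; the 10GB threshold simply mirrors the `max_slot_wal_keep_size` value chosen above; and the 16MB segment size is the default `wal_segment_size`, which you should confirm with `SHOW wal_segment_size;`. The rows and LSNs in the example are invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: default 16MB segments; confirm with SHOW wal_segment_size;
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert pg_lsn text like '16/B374D848' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass
class SlotRow:
    """Hypothetical holder for the pg_replication_slots columns used here."""
    slot_name: str
    active: bool
    # May be NULL, e.g. a physical slot created without reserving WAL.
    restart_lsn: Optional[str]


def stale_slots(rows: List[SlotRow], current_lsn: str,
                threshold_bytes: int) -> List[Tuple[str, int]]:
    """Inactive slots retaining more than threshold_bytes, largest first."""
    current = parse_lsn(current_lsn)
    stale = []
    for row in rows:
        # Active slots are being consumed; slots without restart_lsn pin no WAL.
        if row.active or row.restart_lsn is None:
            continue
        retained = current - parse_lsn(row.restart_lsn)
        if retained > threshold_bytes:
            stale.append((row.slot_name, retained))
    stale.sort(key=lambda item: item[1], reverse=True)
    return stale


def segments_for(retained_bytes: int) -> int:
    """Number of WAL segment files needed to hold retained_bytes (ceiling)."""
    return -(-retained_bytes // WAL_SEGMENT_BYTES)


if __name__ == "__main__":
    # Hypothetical rows and LSNs, for illustration only.
    rows = [
        SlotRow("debezium_old", False, "0/0"),
        SlotRow("standby_1", True, "31/F0000000"),
        SlotRow("small_lag", False, "31/F0000000"),
    ]
    for name, retained in stale_slots(rows, "32/0", 10 * 1024**3):
        print(name, retained, segments_for(retained))
    # prints: debezium_old 214748364800 12800
```

The function only lists candidates. Confirming that a slot's consumer is really retired, and then dropping it with `pg_drop_replication_slot`, stays a deliberate manual step, because dropping a slot that is still needed forces that consumer to be rebuilt.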
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:45:38.299076+00:00— report_created — created