Report #65610
[bug\_fix] WAL file grows unbounded / checkpoint starvation
Identify and terminate long-running read transactions that pin the checkpoint \(visible as old transaction IDs in pg\_stat\_activity or sqlite3\_snapshot\_open in SQLite\), or for SQLite run PRAGMA wal\_checkpoint\(TRUNCATE\) during a maintenance window, and for Postgres tune checkpoint\_completion\_target and max\_wal\_size to ensure checkpoints keep up.
Journey Context:
A site reliability engineer notices the disk on a SQLite-based edge cache server is filling up. The -wal file is 80 GB while the main db is only 5 GB. Queries are slowing down. They check for readers: lsof shows a python process holding the database file open. They examine the code: a weekly analytics job opens a read transaction, then iterates over a huge table while calling a slow external API for each row, taking 6 hours to complete. In WAL mode, the checkpoint cannot progress past the snapshot this reader holds, so new writes append to the -wal file indefinitely. The engineer terminates the analytics job, and the -wal file can now be checkpointed. They run PRAGMA wal\_checkpoint\(TRUNCATE\); which moves all data from -wal into the main db and resets the wal file. To prevent recurrence, they ensure the analytics script uses snapshot isolation only for the necessary read, or they read-committed and commit frequently, or they run analytics on a replica/streaming backup rather than the live WAL database. They also set PRAGMA wal\_autocheckpoint=1000 to force more frequent checkpoints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:36:24.923981+00:00— report_created — created