Report #92208
[architecture] Missed scheduled jobs and duplicate executions with cron in containerized environments
Replace cron with a database-backed job queue using 'SELECT ... FOR UPDATE SKIP LOCKED' (PostgreSQL) or Redis Streams. Implement at-least-once processing with idempotency keys and visibility timeouts.
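The idempotency-key half of this advice can be sketched in Python. This is a minimal in-memory illustration, not a production design: `IdempotentHandler`, `send_invoice`-style effects and the key format `"invoice-42:2026-06-22"` are hypothetical, and the `processed` set stands in for a durable table with a UNIQUE constraint on the key (in PostgreSQL this is typically written with `INSERT ... ON CONFLICT DO NOTHING`).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IdempotentHandler:
    """Wraps a side-effecting job handler so redelivered jobs become no-ops.

    Hypothetical illustration: `processed` stands in for a durable table
    with a UNIQUE constraint on the idempotency key.
    """
    effect: Callable[[dict], None]
    processed: set = field(default_factory=set)

    def handle(self, idempotency_key: str, payload: dict) -> bool:
        """Return True if the effect ran, False if the key was already seen."""
        if idempotency_key in self.processed:
            return False  # duplicate delivery: skip the side effect
        self.effect(payload)
        # The key is recorded only after the effect succeeds. A crash between
        # the effect and this line causes a retry that repeats the effect; in
        # PostgreSQL, writing the key and the effect's rows in one transaction
        # closes that gap.
        self.processed.add(idempotency_key)
        return True


sent = []
handler = IdempotentHandler(effect=lambda p: sent.append(p["invoice_id"]))

# The queue delivers the same job twice (e.g. after a lease expired).
first = handler.handle("invoice-42:2026-06-22", {"invoice_id": 42})
second = handler.handle("invoice-42:2026-06-22", {"invoice_id": 42})
print(first, second, sent)  # True False [42]
```

The key should identify the logical unit of work (here, one invoice for one scheduled date), not the delivery attempt, so that a redelivered job maps to the same key.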
Journey Context:
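Traditional cron is unreliable in Kubernetes/Docker because containers are ephemeral and stateless: if a pod restarts during job execution, the job is lost, and if the same crontab runs in every replica, each job fires once per replica. Running cron on a single 'leader' node avoids the duplicates but creates a single point of failure. The modern pattern is 'database as a queue': PostgreSQL 9.5+ supports 'SKIP LOCKED', which lets multiple workers atomically claim available jobs without blocking each other, because each worker passes over rows already locked by another. This gives at-least-once processing; exactly-once effects additionally require idempotency keys or, when the job's effects are database writes, committing them in the same transaction that marks the job done. It also gives automatic failover: if a worker dies while holding a row lock, PostgreSQL aborts its transaction and releases the lock once the server detects the dropped connection, and if jobs are claimed with a lease column (a visibility timeout), another worker picks the job up when the lease expires. Redis Streams or SQS are alternatives, but PostgreSQL avoids adding infrastructure complexity for teams already using SQL.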
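A minimal sketch of the claim-with-lease pattern described above. `CLAIM_SQL` shows what a PostgreSQL claim query could look like against a hypothetical `jobs` table (the columns `id`, `payload`, `run_at`, `locked_until` and `done_at`, and the psycopg-style `%(lease_s)s` placeholder, are assumptions; the string is not executed here). `LeaseQueue` is an in-memory Python model of the same rules, with a `threading.Lock` standing in for the row lock, so it reproduces the outcome (a leased job is skipped, an unacknowledged job reappears after its lease expires) but not PostgreSQL's non-blocking concurrency.

```python
import threading
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Reference only, not executed here: a PostgreSQL 9.5+ claim query against a
# hypothetical `jobs` table. Each worker runs it in its own transaction;
# SKIP LOCKED makes concurrent workers pass over rows another worker holds.
CLAIM_SQL = """
UPDATE jobs
SET locked_until = now() + %(lease_s)s * interval '1 second'
WHERE id = (
    SELECT id FROM jobs
    WHERE done_at IS NULL
      AND run_at <= now()
      AND (locked_until IS NULL OR locked_until < now())
    ORDER BY run_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
"""


@dataclass
class Job:
    id: int
    payload: str
    locked_until: Optional[float] = None  # lease expiry (visibility timeout)
    done: bool = False


class LeaseQueue:
    """In-memory model of CLAIM_SQL's rules (not of its locking internals)."""

    def __init__(self, jobs: Iterable[Job], lease_s: float) -> None:
        self._jobs: List[Job] = list(jobs)
        self._lease_s = lease_s
        self._lock = threading.Lock()  # stands in for the row lock

    def claim(self, now: float) -> Optional[Job]:
        """Lease the first unfinished job whose lease is absent or expired."""
        with self._lock:
            for job in self._jobs:
                claimable = job.locked_until is None or job.locked_until < now
                if not job.done and claimable:
                    job.locked_until = now + self._lease_s
                    return job
            return None

    def ack(self, job: Job) -> None:
        """Mark a job finished so it is never redelivered."""
        with self._lock:
            job.done = True


q = LeaseQueue([Job(1, "nightly-report"), Job(2, "cleanup")], lease_s=30)
a = q.claim(now=0)   # worker A leases job 1 until t=30
b = q.claim(now=1)   # worker B skips leased job 1 and takes job 2
q.ack(b)             # B finishes; A "crashes" and never acks job 1
c = q.claim(now=10)  # job 1 is still leased, job 2 is done
d = q.claim(now=31)  # job 1's lease expired, so it is redelivered
print(a.id, b.id, c, d.id)  # 1 2 None 1
```

Job 1 runs twice in this trace, which is why the handler it calls should be idempotent. Jobs that can outlast the lease would need the worker to extend `locked_until` periodically, or another worker will pick them up while they are still running.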
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:21:48.619394+00:00— report_created — created