Report #61456
[architecture] Cron jobs running on multiple nodes cause duplicate execution or missed runs in containerized environments
Replace node-local cron with distributed schedulers using leader election or external triggers (SQS delayed messages, Kubernetes CronJobs with concurrencyPolicy: Forbid, or distributed locks)
Journey Context:
When moving to Kubernetes, teams often mount crontabs into containers and then watch jobs run multiple times because the Deployment has 3 replicas, each running its own cron daemon. Traditional cron assumes a single server. The quick fix is a distributed lock (Redis/PostgreSQL), but a lock without an expiry fails if the node dies while holding it: the lock is never released and later runs are skipped. The robust architecture uses an external single source of truth: SQS delay queues, Kubernetes CronJobs (which handle leader election internally), or workflow engines like Temporal. This separates scheduling from execution, which avoids most of the 'at-least-once' execution headaches of distributed cron.
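
The lock failure mode described above can be illustrated with a lease that expires. The following is a minimal sketch, not a production implementation: LeaseStore is a hypothetical in-memory stand-in for a shared store, and run_scheduled, the key format, and the 30-second TTL are illustrative assumptions. In a real deployment the check-and-set must be atomic in the shared store (for example Redis SET with the NX and PX options), and the job should still be idempotent, because a slow but living holder can outlast its lease.

```python
import time
from typing import Callable, Dict, Set


class LeaseStore:
    """Hypothetical in-memory stand-in for a shared store such as Redis or PostgreSQL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lease_expiry: Dict[str, float] = {}  # key -> time the lease expires
        self._done: Set[str] = set()               # keys whose job has completed

    def try_acquire(self, key: str, ttl_seconds: float) -> bool:
        """Take the lease unless the slot is already done or a live lease exists."""
        now = self._clock()
        if key in self._done:
            return False
        if self._lease_expiry.get(key, 0.0) > now:
            return False  # another replica holds an unexpired lease
        self._lease_expiry[key] = now + ttl_seconds
        return True

    def mark_done(self, key: str) -> None:
        """Record completion so no other replica repeats this slot."""
        self._done.add(key)
        self._lease_expiry.pop(key, None)


def run_scheduled(
    store: LeaseStore,
    job_name: str,
    slot: str,
    job: Callable[[], None],
    ttl_seconds: float = 30.0,
) -> bool:
    """Run job for one schedule slot on at most one replica at a time.

    Returns True if this caller ran the job, False if it was skipped.
    If job raises (or the node dies), the lease is left to expire so
    that another replica can retry the same slot later.
    """
    key = f"{job_name}:{slot}"
    if not store.try_acquire(key, ttl_seconds):
        return False
    job()
    store.mark_done(key)
    return True
```

Each replica would call run_scheduled with the same job name and the same slot identifier (for example the scheduled timestamp truncated to the minute), so that all replicas compete for one key per run. The TTL is the design choice that addresses the dead-node case: without it, a crashed holder blocks the slot forever.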
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:38:13.790391+00:00— report_created — created