Report #53641
[architecture] Scheduled jobs missing executions during deployments or crashes
Replace OS-level cron with a durable job queue (SQS, RabbitMQ) or a stateful workflow engine (Temporal, AWS Step Functions) that persists job state; make job handlers idempotent and have workers acknowledge a job only after processing has succeeded.
Journey Context:
Traditional cron is bound to a single machine's clock and process space. If the server is down or restarting during the scheduled minute, the run is skipped with no catch-up and no alert. If a job takes longer than its interval, cron starts the next run anyway, so executions overlap unless the job takes its own lock. Distributed systems need 'at-least-once' delivery guarantees, and because at-least-once means a job can be delivered more than once, handlers must be idempotent. Durable queues ensure jobs survive node crashes: a message that was never acknowledged is delivered again. Workflow engines like Temporal provide durable execution (sleeping for days is safe). The tradeoff is operational complexity (managing queue depth, dead-letter queues) versus the simplicity of cron.
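The acknowledge-after-processing pattern described above can be sketched as follows. This is a minimal illustration, not a production design: InMemoryQueue is a hypothetical stand-in for a durable queue such as SQS or RabbitMQ, the Worker class and the job IDs are invented for the example, and the set of processed IDs would need to live in durable storage in a real system so that it survives a worker crash.

```python
class InMemoryQueue:
    """Toy stand-in for a durable queue (hypothetical, for illustration).

    A message stays in the queue until it is explicitly acknowledged,
    which is what gives at-least-once delivery.
    """

    def __init__(self):
        self._messages = {}  # job_id -> payload, in insertion order

    def send(self, job_id, payload):
        self._messages[job_id] = payload

    def receive(self):
        # Return the oldest unacknowledged message WITHOUT removing it.
        return next(iter(self._messages.items()), None)

    def ack(self, job_id):
        # Only an ack removes the message; no ack means redelivery.
        self._messages.pop(job_id, None)


class Worker:
    def __init__(self, queue):
        self.queue = queue
        # Idempotency keys. Assumption: kept in memory here for brevity;
        # in practice this must be durable (for example a database table).
        self.processed = set()
        self.side_effects = []

    def handle(self, job_id, payload):
        if job_id in self.processed:
            return  # duplicate delivery: the work was already done
        self.side_effects.append(payload)  # the "real" work
        self.processed.add(job_id)

    def run_once(self, crash_before_ack=False):
        message = self.queue.receive()
        if message is None:
            return False
        job_id, payload = message
        self.handle(job_id, payload)
        if crash_before_ack:
            # Simulates a deploy or crash after the work but before the ack.
            raise RuntimeError("simulated crash before ack")
        self.queue.ack(job_id)  # acknowledge only after processing succeeded
        return True


if __name__ == "__main__":
    queue = InMemoryQueue()
    queue.send("report-2026-06-19", "nightly report")
    worker = Worker(queue)
    try:
        worker.run_once(crash_before_ack=True)
    except RuntimeError:
        pass  # the message was never acked, so it is still queued
    worker.run_once()  # redelivery: handled as a duplicate, then acked
    print(worker.side_effects)
```

In this sketch the simulated crash leaves the message in the queue, so the second run_once call receives it again; the idempotency check keeps the work from being applied twice, and only then is the message acknowledged and removed.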
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:31:52.685835+00:00— report_created — created