Report #54905
[architecture] Cron job overlap and missed executions during server downtime
Use a job queue (Redis, RabbitMQ, or SQS) with at-least-once delivery and idempotent workers. Reserve cron for schedule-triggered reporting; never use it for business logic that requires execution guarantees.
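A minimal sketch of why at-least-once delivery needs idempotent workers. Everything here is illustrative: `Job`, `IdempotentWorker`, and `deliver_at_least_once` are hypothetical names, and the broker is simulated in memory rather than backed by Redis, RabbitMQ, or SQS. A real worker would record the key and apply the side effect in the same database transaction.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Job:
    idempotency_key: str  # hypothetical: unique per logical unit of work
    account: str
    amount: int


@dataclass
class IdempotentWorker:
    balances: dict = field(default_factory=dict)
    processed_keys: set = field(default_factory=set)

    def handle(self, job: Job) -> bool:
        """Apply the job once; return False for a duplicate delivery."""
        if job.idempotency_key in self.processed_keys:
            return False  # already applied, so the redelivery is a no-op
        self.balances[job.account] = self.balances.get(job.account, 0) + job.amount
        # Assumption: in production this write and the balance update
        # would be committed atomically in one transaction.
        self.processed_keys.add(job.idempotency_key)
        return True


def deliver_at_least_once(jobs, worker, duplicates=1):
    """Simulate a broker that redelivers every job `duplicates` extra times.

    Returns how many deliveries actually changed state.
    """
    applied = 0
    for job in jobs:
        for _ in range(1 + duplicates):
            if worker.handle(job):
                applied += 1
    return applied
```

With two jobs each delivered three times, only two deliveries change state; the other four are recognized by key and dropped.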
Journey Context:
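Cron lacks execution tracking. If the server is down at 2:00 AM, the scheduled run is skipped (a gap in data processing, and possibly data loss) unless some catch-up mechanism runs it late, which in turn risks overlap with the next scheduled run. Cron also starts a new run even when the previous one is still in progress. Distributed cron (e.g., Kubernetes CronJob) prevents overlap with `concurrencyPolicy: Forbid`, but it can still miss executions, for example when the scheduled time passes while the cluster cannot start the job and the start deadline expires. Job queues provide backpressure, retry logic with exponential backoff, and visibility into pending work. At-least-once semantics mean a job can be delivered more than once, so workers must be idempotent (see the Idempotency Key pattern). Common mistakes: using cron for high-frequency tasks (once a minute or more often), where many instances firing on the same tick cause a "thundering herd" on the database; assuming `flock` or PID files prevent overlap across containerized instances (they do not, because the lock is local to one host or container filesystem); and implementing "missed job detection" logic, which amounts to reinventing the queue.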
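The retry and visibility properties described above can be sketched with an in-memory queue. This is an assumption-laden illustration, not any broker's API: `backoff_delay` and `RetryQueue` are hypothetical names, time is a simulated clock rather than real sleeping, and jitter is left out to keep the behavior deterministic.

```python
import heapq
import itertools


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (1-based), capped."""
    return min(cap, base * 2 ** (attempt - 1))


class RetryQueue:
    """Hypothetical in-memory queue ordered by each job's 'not before' time."""

    def __init__(self, max_attempts=4):
        self.max_attempts = max_attempts
        self._heap = []
        self._seq = itertools.count()  # tie-breaker so payloads are never compared
        self.dead_letter = []          # jobs that exhausted their attempts

    def enqueue(self, payload, ready_at=0.0, attempt=0):
        heapq.heappush(self._heap, (ready_at, next(self._seq), attempt, payload))

    def pending(self):
        """Visibility into outstanding work, which plain cron does not offer."""
        return len(self._heap)

    def run(self, handler):
        """Drain the queue on a simulated clock; return the final clock value."""
        now = 0.0
        while self._heap:
            ready_at, _, attempt, payload = heapq.heappop(self._heap)
            now = max(now, ready_at)  # "wait" until the job is due
            try:
                handler(payload)
            except Exception:
                attempt += 1
                if attempt >= self.max_attempts:
                    self.dead_letter.append(payload)
                else:
                    self.enqueue(payload, now + backoff_delay(attempt), attempt)
        return now
```

A failed job is rescheduled with a growing delay instead of being lost, and a job that keeps failing ends up in `dead_letter` where it can be inspected, rather than silently disappearing as a missed cron run would.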
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:39:12.144922+00:00— report_created — created