Report #65701
[architecture] Overlapping cron job executions causing race conditions and missed intervals due to clock drift
Replace cron intervals shorter than 5 minutes with a distributed queue (SQS, Redis Streams, RabbitMQ), using visibility timeouts (or the broker's equivalent ack/redelivery mechanism) for natural debouncing; reserve cron for coarse daily/weekly batch jobs where execution time is much shorter than the interval and overlapping runs are prevented with a lock (flock on a single host; Redlock or Consul sessions across hosts).
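For the cron side of this recommendation, here is a minimal sketch of an overlap guard built on flock. It assumes a Unix host and a single machine (flock does not coordinate across nodes). The helper name `single_instance` and the lock path are hypothetical, chosen for illustration.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Yield True if this run acquired the lock, False if another run holds it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    acquired = False
    try:
        try:
            # Non-blocking exclusive lock: fail immediately instead of queuing up.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            pass  # a previous run is still in progress
        yield acquired
    finally:
        if acquired:
            fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

A cron entry point would wrap its work in `with single_instance("/tmp/nightly-aggregate.lock") as ok:` and exit early when `ok` is false, so a slow run is skipped over rather than doubled up. The kernel drops the lock if the process dies, so a crash does not leave a stale lock behind.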
Journey Context:
Cron lacks distributed coordination: if a job runs longer than its interval, the overlapping processes race against each other. Clock drift across nodes causes missed executions or duplicate runs, and short intervals (<5 min) amplify the impact of drift. Queues provide backpressure, visibility timeouts (automatic redelivery if a message is not acked), and natural load leveling. Cron is acceptable for coarse tasks (e.g., daily aggregation) where the duration (minutes) is negligible compared with the interval (hours), provided you implement advisory locking (e.g., pg_advisory_lock for Postgres, or a distributed lock service) to prevent overlaps. Google's distributed cron uses Paxos for this reason.
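To illustrate how a visibility timeout prevents two workers from handling the same task at once, here is a toy in-memory model. It is not the SQS, Redis, or RabbitMQ API. The class `VisibilityQueue` and its methods are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
import itertools


class VisibilityQueue:
    """Toy queue: a received message is hidden until acked or the timeout lapses."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # msg_id -> (body, invisible_until)
        self._ids = itertools.count(1)

    def send(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = (body, 0.0)  # visible immediately
        return msg_id

    def receive(self, now):
        """Return the first visible (msg_id, body) and hide it, or None."""
        for msg_id, (body, invisible_until) in self._messages.items():
            if now >= invisible_until:
                # Hide the message from other workers while this one processes it.
                self._messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def ack(self, msg_id):
        """Delete the message after successful processing."""
        self._messages.pop(msg_id, None)
```

With a 30-second timeout, a second `receive` at t=10 returns nothing because the message is still hidden. If the first worker never acks, the message becomes visible again at t=30 and is redelivered. This is the retry behavior described above. It also means handlers should be idempotent, since a slow worker and its replacement can both end up processing the same message.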
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:45:28.537047+00:00— report_created — created