Report #5775
[architecture] Choosing between cron jobs and message queues for recurring background tasks
Use cron only for idempotent, best-effort, single-node scheduling. For distributed systems that require at-least-once delivery, durability, or horizontal scaling, use a persistent queue (e.g., Amazon SQS, RabbitMQ, or Celery backed by a broker) with a 'visibility timeout' (or consumer acknowledgements) and 'dead-letter queues' instead of cron, and implement 'lease tokens' to prevent duplicate execution across nodes.
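The lease-token idea can be sketched as follows. This is a minimal, in-memory illustration under stated assumptions: `LeaseStore` and `FencedLedger` are hypothetical names, the store stands in for a shared database row or coordination service that all nodes can update atomically, and time is passed in explicitly as `now` so the behavior is deterministic. Each hand-over of the lease issues a larger "fencing" token, and the downstream resource rejects writes that carry a token older than one it has already seen.

```python
import threading
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Lease:
    owner: str
    token: int          # fencing token: strictly increases on every hand-over
    expires_at: float   # lease is void after this time


class LeaseStore:
    """In-memory stand-in for a shared lease table (hypothetical).

    In a real deployment this state would live in a store every node can
    update atomically, e.g. a database row changed inside a transaction.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()  # only protects this toy within one process
        self._leases: Dict[str, Lease] = {}
        self._last_token = 0

    def acquire(self, job: str, owner: str, ttl: float, now: float) -> Optional[int]:
        """Return a fencing token if `owner` holds the lease after the call, else None."""
        with self._lock:
            current = self._leases.get(job)
            if current is not None and current.expires_at > now:
                if current.owner != owner:
                    return None  # another node holds a live lease
                current.expires_at = now + ttl  # renewal keeps the same token
                return current.token
            # Lease is free or expired: hand it over with a fresh, larger token.
            self._last_token += 1
            self._leases[job] = Lease(owner, self._last_token, now + ttl)
            return self._last_token


class FencedLedger:
    """Hypothetical downstream resource that rejects writes with a stale token."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.charges: List[str] = []

    def charge(self, token: int, customer: str) -> bool:
        if token < self.highest_token:
            return False  # a newer lease holder has already written
        self.highest_token = token
        self.charges.append(customer)
        return True
```

For example, if node A acquires the `billing` lease at t=0 with a 60-second TTL (token 1), node B is refused at t=30; once A's run overruns past t=60, B acquires token 2. After B charges with token 2, a late write from A carrying token 1 is rejected. Fencing only blocks stale writes; it does not undo work A finished before the hand-over, so an idempotency key on the charge itself is still advisable.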
Journey Context:
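Cron appears simpler but fails catastrophically in distributed environments: executions overlap when a run takes longer than the schedule interval, the cron host is a single point of failure, and there is no built-in retry mechanism. Classic example: a billing cron job runs every hour; if the database slows down and a run takes 65 minutes, cron starts a second instance while the first is still running, and customers are charged twice. Workarounds like 'flock' (file locking) prevent overlap only on a single server. Queue-based systems instead use a visibility timeout (e.g., in SQS): a received message is hidden from other consumers while one worker processes it, a message that is not deleted before the timeout reappears for automatic retry, and messages that keep failing (poison pills) are moved to a dead-letter queue. Delivery is still at-least-once, so a message can occasionally be processed twice and handlers should be idempotent.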
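The visibility-timeout, retry, and dead-letter behavior described above can be modeled in a few lines. This is a toy, in-memory sketch, not the SQS API: `InMemoryQueue`, `visibility_timeout`, and `max_receives` are hypothetical names (the last loosely mirrors the receive-count limit in an SQS redrive policy), and time is passed as `now` for determinism.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Message:
    id: str
    body: str
    receive_count: int = 0
    invisible_until: float = 0.0


class InMemoryQueue:
    """Toy model of a queue with visibility timeouts and a dead-letter queue.

    Hypothetical class for illustration; it mimics the idea behind SQS's
    visibility timeout and redrive policy, not its actual API.
    """

    def __init__(self, visibility_timeout: float, max_receives: int) -> None:
        self.visibility_timeout = visibility_timeout
        self.max_receives = max_receives
        self._messages: Dict[str, Message] = {}
        self.dead_letters: List[Message] = []

    def send(self, msg_id: str, body: str) -> None:
        self._messages[msg_id] = Message(msg_id, body)

    def receive(self, now: float) -> Optional[Message]:
        for msg in list(self._messages.values()):  # copy: we may delete while looping
            if msg.invisible_until > now:
                continue  # in flight with another consumer
            if msg.receive_count >= self.max_receives:
                # Poison pill: stop retrying and park it for inspection.
                del self._messages[msg.id]
                self.dead_letters.append(msg)
                continue
            msg.receive_count += 1
            msg.invisible_until = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg_id: str) -> None:
        """Acknowledge success; the message will not be redelivered."""
        self._messages.pop(msg_id, None)
```

With `visibility_timeout=30` and `max_receives=2`, a message received at t=0 stays invisible to a second consumer at t=10; if the first worker crashes without calling `delete`, the message is redelivered at t=31, and after the second failed attempt it moves to `dead_letters` instead of being retried forever. Redelivery is what protects against crashed workers, and it is also why the same message can be processed more than once. Real SQS deletes by the receipt handle returned from a receive call rather than by message ID; this sketch omits that detail.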
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T22:10:55.045785+00:00— report_created — created