Report #10082
[architecture] Handling traffic spikes that overwhelm downstream services with strict rate limits
Insert a message queue (SQS, RabbitMQ, etc.) between services when the downstream service has strict rate limits or is expensive to scale. Scale consumers on the age of the oldest message (in SQS, the CloudWatch metric 'ApproximateAgeOfOldestMessage'), not just on queue depth. Set a maximum message age (TTL) so that stale requests are never processed, and move messages to a dead-letter queue after 3 failed processing attempts.
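As a minimal sketch of the TTL and dead-letter settings above, assuming SQS: the helper below only builds the attribute mapping and does not call AWS. The function name, the default values (1 hour TTL, 60 second visibility timeout), and the example ARN are illustrative assumptions, not values from this report. `MessageRetentionPeriod`, `VisibilityTimeout`, and `RedrivePolicy` are the SQS queue attribute names.

```python
import json


def build_queue_attributes(
    dlq_arn: str,
    max_age_seconds: int = 3600,           # assumed TTL: 1 hour
    visibility_timeout_seconds: int = 60,  # assumed: longer than one processing attempt
    max_receive_count: int = 3,            # dead-letter after 3 failed receives
) -> dict:
    """Build an SQS queue Attributes mapping (SQS expects string values)."""
    # SQS accepts a retention period between 60 seconds and 14 days.
    if not 60 <= max_age_seconds <= 1_209_600:
        raise ValueError("max_age_seconds must be between 60 and 1209600")
    # SQS accepts a visibility timeout between 0 seconds and 12 hours.
    if not 0 <= visibility_timeout_seconds <= 43_200:
        raise ValueError("visibility_timeout_seconds must be between 0 and 43200")
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")

    return {
        # Messages older than this are deleted by SQS, which acts as the TTL.
        "MessageRetentionPeriod": str(max_age_seconds),
        "VisibilityTimeout": str(visibility_timeout_seconds),
        # RedrivePolicy is passed as a JSON-encoded string.
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
        ),
    }


# Hypothetical dead-letter queue ARN, for illustration only.
attrs = build_queue_attributes("arn:aws:sqs:us-east-1:123456789012:orders-dlq")
```

With boto3, a mapping like this would typically be passed as the `Attributes` argument of `create_queue` or `set_queue_attributes`. RabbitMQ expresses the same ideas differently (per-queue message TTL and a dead-letter exchange), so this mapping does not carry over directly.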
Journey Context:
Direct HTTP calls couple availability: if downstream is slow, upstream accumulates memory and threads, and failures cascade. Queues decouple services in time (asynchronous processing) and in rate (buffering). However, queues add latency (unsuitable for real-time paths) and complexity (poison messages, dead-letter handling), and they require idempotent consumers. The key insight: queue depth (number of messages) is misleading on its own. A steady backlog of 1000 is fine if consumers process 1000/sec, but catastrophic if they process 1/hour. Instead, monitor the age of the oldest message. The visibility timeout must also be tuned carefully: too short causes duplicate processing, too long delays the retry of failed messages, including poison messages on their way to the dead-letter queue.
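The depth-versus-age point can be made concrete with a small sketch. The proportional scaling rule, the function names, and the bounds below are assumptions chosen for illustration, not a method prescribed by this report: the first function shows why the same depth can mean very different drain times, the second sizes the consumer pool from the age of the oldest message.

```python
import math


def drain_time_seconds(depth: int, processed_per_second: float) -> float:
    """Time to clear the current backlog at the current processing rate."""
    if processed_per_second <= 0:
        return math.inf
    return depth / processed_per_second


def desired_consumers(
    oldest_age_seconds: float,
    target_age_seconds: float,
    current_consumers: int,
    min_consumers: int = 1,
    max_consumers: int = 50,
) -> int:
    """Scale the consumer count in proportion to how far age is from target."""
    if target_age_seconds <= 0:
        raise ValueError("target_age_seconds must be positive")
    ratio = oldest_age_seconds / target_age_seconds
    wanted = math.ceil(current_consumers * ratio)
    # Clamp so a spike cannot exceed what the rate-limited downstream tolerates.
    return max(min_consumers, min(max_consumers, wanted))


# Same depth, very different health:
fast = drain_time_seconds(1000, 1000.0)     # about one second
slow = drain_time_seconds(1000, 1 / 3600)   # about 1000 hours

# Oldest message is 120 s old against a 30 s target, with 4 consumers running.
scaled = desired_consumers(120, 30, current_consumers=4)
```

The upper bound matters here: because the downstream service is rate limited, `max_consumers` should reflect the throughput it can accept, otherwise scaling out simply moves the overload from the queue to the downstream.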
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T09:47:11.673222+00:00— report_created — created