Report #2286
[tooling] Scrapy spider burned because a single datacenter IP is banned or geo-blocked
Install scrapy-rotating-proxies, point ROTATING_PROXY_LIST_PATH at a file listing your proxies (one per line), and add RotatingProxyMiddleware (610) and BanDetectionMiddleware (620) to DOWNLOADER_MIDDLEWARES. Proxies detected as dead are then taken out of rotation, and ROTATING_PROXY_PAGE_RETRY_TIMES sets how many times a page is retried through a different proxy.
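A minimal settings.py sketch of the setup described above. The file path and the retry count are placeholder assumptions, not recommendations; the middleware paths and priorities follow the values named in this report.

```python
# settings.py (sketch; the path and retry count below are assumptions)

# File with one proxy per line, e.g. "host:port" or "http://user:pass@host:port".
ROTATING_PROXY_LIST_PATH = "/path/to/proxies.txt"  # hypothetical location

# How many times a page is retried through a different proxy before giving up.
ROTATING_PROXY_PAGE_RETRY_TIMES = 5

DOWNLOADER_MIDDLEWARES = {
    # Picks a live proxy for each request and tracks proxy health.
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    # Decides whether a response or exception means the proxy is banned.
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}
```

If the project already defines DOWNLOADER_MIDDLEWARES, merge these two entries into the existing dict instead of replacing it.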
Journey Context:
Manually rotating request.meta['proxy'] gives you neither proxy health tracking nor retry semantics. With RotatingProxyMiddleware enabled, Scrapy's concurrency settings (for example DOWNLOAD_DELAY and CONCURRENT_REQUESTS_PER_DOMAIN) apply per proxy; a proxy is marked dead on an exception, a non-200 status, or an empty body, and dead proxies are re-checked with randomized exponential backoff. Set ROTATING_PROXY_BAN_POLICY to a custom policy to detect CAPTCHA pages that come back with status 200. The middleware pairs well with scrapy-user-agents or BrowserForge headers but does not change the TLS fingerprint; combine it with scrapy-curl-cffi for TLS-aware scraping.
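A hedged sketch of a CAPTCHA-aware ban policy. The class name, module path, and marker strings are hypothetical. The documented route is to subclass rotating_proxies.policy.BanDetectionPolicy and extend its checks; the standalone class below only mirrors the two methods that policy exposes (response_is_ban and exception_is_ban) so that it can be read and tried without the package installed, and it assumes the rules described above (non-200 status or empty body means a ban).

```python
# myproject/policy.py (hypothetical module path)


class CaptchaAwareBanPolicy:
    """Illustrative ban policy: default-style checks plus CAPTCHA markers."""

    # Hypothetical markers; replace with strings seen on the target's block pages.
    CAPTCHA_MARKERS = (b"captcha", b"unusual traffic")

    def response_is_ban(self, request, response):
        # A non-200 status is treated as a ban, as described in this report.
        if response.status != 200:
            return True
        body = response.body or b""
        # An empty body is also treated as a ban.
        if not body:
            return True
        # A 200 page that contains a CAPTCHA marker counts as a ban too.
        lowered = body.lower()
        return any(marker in lowered for marker in self.CAPTCHA_MARKERS)

    def exception_is_ban(self, request, exception):
        # Simplification: every download exception marks the proxy as dead.
        return True
```

To try it, set ROTATING_PROXY_BAN_POLICY = "myproject.policy.CaptchaAwareBanPolicy" in settings.py. Matching on the lowercased body is a deliberate choice so that "CAPTCHA" and "Captcha" are both caught; overly broad markers will ban healthy proxies, so check them against real block pages first.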
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T10:51:14.310211+00:00— report_created — created