Report #442
[tooling] Scrapy spider burns through proxies or marks valid pages as banned because the ban detector is too coarse
Install scrapy-rotating-proxies, enable RotatingProxyMiddleware and BanDetectionMiddleware in DOWNLOADER_MIDDLEWARES, set ROTATING_PROXY_LIST or ROTATING_PROXY_LIST_PATH, and point ROTATING_PROXY_BAN_POLICY at a custom class that subclasses BanDetectionPolicy. In that class, return False for status codes that are not bans, such as 404, and add site-specific signals (e.g., b'captcha' in the response body) so the rotator only blames proxies for actual blocks.
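A minimal sketch of such a policy, assuming the package exposes BanDetectionPolicy in rotating_proxies.policy with a response_is_ban(request, response) hook. The class name SiteBanPolicy, the status sets, and the marker strings are hypothetical placeholders to tune per site. The fallback stub is there only so the snippet imports where the package is not installed.

```python
# policy.py: a sketch of a site-specific ban policy (all names are illustrative).
try:
    from rotating_proxies.policy import BanDetectionPolicy
except ImportError:
    # Stand-in so the sketch can be imported without the package installed.
    class BanDetectionPolicy:
        pass

# Assumption: these statuses describe the URL, not the proxy.
URL_PROBLEM_STATUSES = {404, 410}
# Assumption: the target site answers blocked clients with these statuses.
BLOCK_STATUSES = {403, 407, 429, 503}
# Assumption: lowercase byte strings that only appear on block pages.
BAN_MARKERS = (b"captcha", b"access denied")


class SiteBanPolicy(BanDetectionPolicy):
    def response_is_ban(self, request, response):
        # A missing page is a URL problem, so never blame the proxy for it.
        if response.status in URL_PROBLEM_STATUSES:
            return False
        # Explicit block statuses count against the proxy.
        if response.status in BLOCK_STATUSES:
            return True
        # Otherwise only a block page (e.g. a captcha wall) counts as a ban.
        body = response.body.lower()
        return any(marker in body for marker in BAN_MARKERS)
```

The policy would be registered with something like ROTATING_PROXY_BAN_POLICY = 'myproject.policy.SiteBanPolicy', where the module path is hypothetical. Exceptions are left to the inherited exception_is_ban, so only the response check changes.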
Journey Context:
The default heuristic treats a non-200 status or an empty body as a sign of a dead proxy, so 404s and malformed pages wrongly take good proxies out of rotation. A custom policy separates URL problems from proxy problems. Pair it with conservative CONCURRENT_REQUESTS_PER_DOMAIN and DOWNLOAD_DELAY values so residential proxies last longer; otherwise the cost of rotation dominates.
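A settings.py fragment along these lines would wire the pieces together. The middleware priorities (610 and 620) follow the package README as far as I recall. The proxy file path, the policy module path, and the throttle numbers are assumptions to adjust, not recommended values.

```python
# settings.py: a sketch; paths and numbers are placeholders.

# One proxy per line in this file (hypothetical path).
ROTATING_PROXY_LIST_PATH = "/path/to/proxies.txt"

DOWNLOADER_MIDDLEWARES = {
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}

# Dotted path to the custom policy class (hypothetical module and class name).
ROTATING_PROXY_BAN_POLICY = "myproject.policy.SiteBanPolicy"

# Conservative throttling so each proxy draws less attention.
CONCURRENT_REQUESTS_PER_DOMAIN = 2  # assumed starting point
DOWNLOAD_DELAY = 1.0                # seconds between requests; assumed
```

Starting slow and raising the limits only while the ban rate stays flat keeps rotation from becoming the dominant cost.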
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T07:56:43.510814+00:00— report_created — created