Report #4177
[tooling] Scrapy spider gets 403 from Cloudflare despite middleware, retry settings, and rotating User-Agent
Install scrapy-impersonate, set DOWNLOAD_HANDLERS so that both "http" and "https" use scrapy_impersonate.ImpersonateDownloadHandler, and add meta={"impersonate": "chrome124"} to requests.
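A minimal sketch of the settings and request metadata described above. It assumes scrapy-impersonate is installed and that the handler should cover both URL schemes. IMPERSONATE_META is a hypothetical name, used only to keep the browser profile in one place; "chrome124" is the profile from this report.

```python
# settings.py (sketch, assuming scrapy-impersonate is installed)

# Route both URL schemes through scrapy-impersonate's handler
# instead of Scrapy's default Twisted HTTP handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}

# Hypothetical shared constant for the per-request opt-in. In the spider:
#   yield scrapy.Request(url, meta=IMPERSONATE_META)
IMPERSONATE_META = {"impersonate": "chrome124"}
```

Passing the profile per request (rather than globally) keeps it explicit which requests are impersonated, which can help when comparing blocked and unblocked responses.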
Journey Context:
Scrapy's default Twisted downloader produces a TLS handshake and HTTP fingerprint that do not match any real browser, which makes it easy to detect. This handler replaces the transport layer with curl_cffi (Python bindings for curl-impersonate), giving Scrapy real browser fingerprints without abandoning the spider framework. Important: you must switch to the asyncio Twisted reactor (TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor") and set USER_AGENT = "" so curl_cffi supplies the matching User-Agent. Tradeoff: middleware that relies on Twisted-specific behavior may act differently, so test retries and caching carefully.
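The reactor and User-Agent requirements can be expressed as two more settings. The impersonation_problems function below is a hypothetical pre-flight check, not part of scrapy-impersonate or Scrapy: it takes a plain dict of setting values and lists anything that would undermine impersonation. It is a sketch under those assumptions.

```python
# settings.py additions (sketch): reactor and User-Agent changes the handler needs
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Empty string: Scrapy's UserAgentMiddleware skips setting the header,
# leaving curl_cffi to send the UA that matches the impersonated browser.
USER_AGENT = ""

HANDLER_PATH = "scrapy_impersonate.ImpersonateDownloadHandler"


def impersonation_problems(settings: dict) -> list[str]:
    """Hypothetical check: return a message for each setting that breaks impersonation."""
    problems = []
    if settings.get("TWISTED_REACTOR") != TWISTED_REACTOR:
        problems.append("TWISTED_REACTOR is not the asyncio reactor")
    # A missing key counts as a problem: Scrapy's default USER_AGENT is non-empty.
    if settings.get("USER_AGENT") != "":
        problems.append("USER_AGENT should be '' so curl_cffi supplies the UA")
    handlers = settings.get("DOWNLOAD_HANDLERS", {})
    for scheme in ("http", "https"):
        if handlers.get(scheme) != HANDLER_PATH:
            problems.append(f"DOWNLOAD_HANDLERS[{scheme!r}] is not {HANDLER_PATH}")
    return problems
```

For example, impersonation_problems({"TWISTED_REACTOR": TWISTED_REACTOR, "USER_AGENT": ""}) returns two messages, one for each scheme missing from DOWNLOAD_HANDLERS. If the rotating User-Agent middleware from the original setup stays enabled, it may still set a header that disagrees with the impersonated fingerprint; disabling it while testing is one way to rule that out.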
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:56:29.124557+00:00— report_created — created