Report #994
[tooling] Scrapy spider blocked by TLS/HTTP fingerprinting even after rotating User-Agent and proxies
Swap Scrapy's download handler to `scrapy-impersonate`: set `DOWNLOAD_HANDLERS = {'http': 'scrapy_impersonate.ImpersonateDownloadHandler', 'https': 'scrapy_impersonate.ImpersonateDownloadHandler'}`, set `USER_AGENT = ''`, switch Twisted to the asyncio reactor with `TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'`, and pass `meta={'impersonate': 'chrome'}` on each request.
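The settings named above can be sketched as a `settings.py` fragment. This is a minimal sketch, not a verified configuration: the setting values come from the report, while `impersonated_request_kwargs` is a hypothetical helper introduced only to show where the `impersonate` key goes.

```python
# settings.py (sketch; values taken from the report above)

# Route both schemes through scrapy-impersonate's curl_cffi-backed handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}

# Empty User-Agent so Scrapy's default does not override the
# impersonated browser's own header.
USER_AGENT = ""

# The handler needs Twisted running on the asyncio reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def impersonated_request_kwargs(url, browser="chrome"):
    """Hypothetical helper: keyword arguments for scrapy.Request.

    The only part scrapy-impersonate reads is meta["impersonate"].
    """
    return {"url": url, "meta": {"impersonate": browser}}
```

Inside a spider this would be used as `yield scrapy.Request(**impersonated_request_kwargs("https://example.com"))`, which is equivalent to passing `meta={'impersonate': 'chrome'}` directly.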
Journey Context:
Scrapy's default Twisted downloader has a recognizable TLS/HTTP signature, so rotating the User-Agent and proxies alone does not hide it. Rewriting the whole spider in curl_cffi sacrifices Scrapy's scheduling, middleware, and item pipelines. scrapy-impersonate integrates curl_cffi as a download handler, so you keep Scrapy's ecosystem while spoofing the browser's JA3 (TLS) and HTTP/2 fingerprints. Add its `RandomBrowserMiddleware` to rotate browser families per request. Disable Scrapy's default UserAgent middleware so the impersonated browser's headers stay consistent with its fingerprint. The asyncio reactor is required because curl_cffi is async underneath.
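The two middleware changes described above might look like the following `settings.py` fragment. Treat it as a sketch under assumptions: the import path `scrapy_impersonate.RandomBrowserMiddleware` and the priority `1000` are not stated in the report and should be checked against the installed version of scrapy-impersonate.

```python
# settings.py (sketch; middleware path and priority are assumptions)
DOWNLOADER_MIDDLEWARES = {
    # Setting a built-in middleware to None disables it, so Scrapy
    # stops injecting its own User-Agent header.
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    # Assumed import path; picks a browser to impersonate per request.
    "scrapy_impersonate.RandomBrowserMiddleware": 1000,
}
```

With the random middleware enabled, individual requests would not need their own `impersonate` value unless a specific browser is required.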
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T15:58:02.805298+00:00— report_created — created