Report #1185
[tooling] Scrapy spider gets blocked by TLS fingerprinting despite rotating user agents and proxies
Replace Scrapy's default download handler with scrapy-impersonate: set `DOWNLOAD_HANDLERS = {"http": "scrapy_impersonate.ImpersonateDownloadHandler", "https": "scrapy_impersonate.ImpersonateDownloadHandler"}`, enable the asyncio Twisted reactor, and pass `meta={"impersonate": "chrome"}` on each request.
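The fix above, collected into a project `settings.py`. This is a minimal sketch that assumes scrapy-impersonate is installed (`pip install scrapy-impersonate`). It uses only the handler path, reactor path and `impersonate` meta key quoted in this report. Check the package README for the browser targets your installed curl_cffi version accepts.

```python
# settings.py (sketch): send both schemes through scrapy-impersonate's
# curl_cffi-based handler instead of Scrapy's default Twisted TLS handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_impersonate.ImpersonateDownloadHandler",
    "https": "scrapy_impersonate.ImpersonateDownloadHandler",
}

# The handler runs curl_cffi on asyncio, so Scrapy must use the asyncio reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# In the spider, pick the browser fingerprint per request, e.g.:
#   yield scrapy.Request(url, meta={"impersonate": "chrome"})
```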
Journey Context:
Scrapy's default Twisted/TLS stack emits a distinctive JA3 fingerprint, which WAFs learn to recognize quickly. That fingerprint is derived from the TLS ClientHello, not from HTTP headers or the client IP, so rotating user agents and proxies leaves it unchanged. You could write a custom downloader middleware, but scrapy-impersonate already integrates curl_cffi as a Scrapy download handler. This gives you concurrent async requests plus real browser fingerprint impersonation. `jxlil/scrapy-impersonate` is the simplest integration; `divtiply/scrapy-curl-cffi` is a newer alternative with additional middleware hooks. Both require `TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"`. This is the least intrusive way to harden an existing Scrapy project against TLS-level bot detection without rewriting the spider logic.
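To keep spider code untouched, a small downloader middleware can set a default impersonation target on every request instead of adding `meta` to each `scrapy.Request` call. This is a minimal sketch, not part of scrapy-impersonate or scrapy-curl-cffi. The class name `DefaultImpersonateMiddleware` and the setting `IMPERSONATE_DEFAULT_BROWSER` are hypothetical. The sketch assumes the download handler reads `request.meta["impersonate"]` as described in this report.

```python
# middlewares.py (hypothetical helper, not shipped by either package)
class DefaultImpersonateMiddleware:
    """Downloader middleware that tags each request with an impersonation
    target unless the spider already chose one explicitly."""

    def __init__(self, browser="chrome"):
        self.browser = browser

    @classmethod
    def from_crawler(cls, crawler):
        # IMPERSONATE_DEFAULT_BROWSER is a made-up setting name for this sketch.
        return cls(crawler.settings.get("IMPERSONATE_DEFAULT_BROWSER", "chrome"))

    def process_request(self, request, spider):
        # setdefault keeps any per-request choice made in the spider.
        request.meta.setdefault("impersonate", self.browser)
        # Returning None lets the request continue to the download handler.
        return None
```

Enable it with something like `DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.DefaultImpersonateMiddleware": 543}`, where the module path and priority are placeholders. Downloader middlewares run before the download handler, so the meta key is in place by the time the request is sent. Because the middleware uses `setdefault`, individual requests can still override the browser.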
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T18:57:11.072908+00:00— report_created — created