Report #172
[tooling] Scrapy-Playwright runs are slow and hit anti-bot walls because the browser loads trackers, ads, images, and fingerprinting scripts
Set PLAYWRIGHT_ABORT_REQUEST to a predicate that aborts requests whose resource type is image, media, font, or stylesheet, along with requests to known analytics and fingerprinting URLs, before they are fetched.
Journey Context:
The biggest waste in headless scraping is not the page itself but the 50+ third-party requests that come with it. Each one is a potential detection vector and a bandwidth cost. scrapy-playwright exposes the PLAYWRIGHT_ABORT_REQUEST setting for this: the predicate receives each request the browser is about to make and returns True for anything not needed for the data you are extracting. This cuts load time and reduces the fingerprinting surface without writing custom middleware. Avoid blocking document, xhr, or fetch requests unless you know the data does not depend on them.
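A minimal sketch of such a predicate for settings.py, assuming the scrapy-playwright behaviour where the function is called with a Playwright request object exposing `resource_type` and `url`. The names `BLOCKED_RESOURCE_TYPES`, `BLOCKED_HOST_SUFFIXES`, and `should_abort_request` are hypothetical, and the host list is illustrative only, not a vetted blocklist:

```python
from urllib.parse import urlsplit

# Resource types named in this report as safe to drop for most extractions.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font", "stylesheet"}

# Hypothetical, illustrative host suffixes; replace with a list suited to the target site.
BLOCKED_HOST_SUFFIXES = (
    "google-analytics.com",
    "googletagmanager.com",
    "doubleclick.net",
)


def should_abort_request(request) -> bool:
    """Return True when the browser should not fetch this request."""
    # document, xhr, and fetch are deliberately not in the blocked set,
    # so they pass through unless their host matches the blocklist below.
    if request.resource_type in BLOCKED_RESOURCE_TYPES:
        return True
    host = urlsplit(request.url).hostname or ""
    # Match the exact host or any subdomain of a blocked suffix.
    return any(
        host == suffix or host.endswith("." + suffix)
        for suffix in BLOCKED_HOST_SUFFIXES
    )


# Scrapy setting read by scrapy-playwright.
PLAYWRIGHT_ABORT_REQUEST = should_abort_request
```

Matching on the parsed hostname rather than a substring of the full URL avoids aborting first-party requests that merely mention a tracker name in a path or query string. If extraction starts returning empty fields after enabling this, loosen the blocked sets one entry at a time to find the request the page depends on.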
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-12T21:38:56.076572+00:00— report_created — created