Report #98311
[tooling] Scrapy spider blocked on JS-rendered sites, but migrating to a browser crawler loses middleware and pipelines
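Install scrapy-playwright, point DOWNLOAD_HANDLERS for http and https at its Playwright download handler, and set TWISTED_REACTOR to the asyncio reactor, which the plugin requires. Scrapy's scheduler, Item/ItemLoader, pipelines, and retry middleware stay in place. Only requests marked with meta['playwright'] = True are rendered in the browser; page interactions are either declared as page methods on the request or performed on the page object inside callbacks.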
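A minimal settings sketch for the step above, assuming a standard Scrapy project with a settings.py module. The handler path and reactor path are the ones scrapy-playwright documents; the browser choice is only an example value.

```python
# settings.py fragment (sketch; merge into the project's existing settings)

# Route both schemes through the scrapy-playwright download handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs Twisted's asyncio-based reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Example value; "firefox" and "webkit" are the other Playwright browser types.
PLAYWRIGHT_BROWSER_TYPE = "chromium"
```

Requests without meta['playwright'] = True still go through Scrapy's regular downloader, so static pages on the same site do not pay the browser cost.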
Journey Context:
Teams often abandon Scrapy when a site switches to JS rendering, then reimplement retry, dupefilter, export, and AutoThrottle from scratch. scrapy-playwright keeps the Scrapy architecture intact by treating Playwright as just another download handler: you configure PLAYWRIGHT_BROWSER_TYPE, request the page object with meta['playwright_include_page'] = True and read it from response.meta['playwright_page'] for interactions, and abort heavy resources via PLAYWRIGHT_ABORT_REQUEST. Tradeoff: each spider pays a browser startup cost, and concurrency is bounded by browser contexts and pages rather than by Scrapy's request concurrency alone. Tune PLAYWRIGHT_MAX_CONTEXTS and PLAYWRIGHT_MAX_PAGES_PER_CONTEXT, and close any page you requested as soon as the callback is done with it, since open pages count against those limits.
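The tuning points above could look roughly like this. It is a sketch under assumptions: the set of resource types, the limit values, and the helper name extract_title_and_close are illustrative and not from the report. The predicate relies on the resource_type attribute of the Playwright request object that scrapy-playwright passes to PLAYWRIGHT_ABORT_REQUEST.

```python
# Resource types to skip; this particular set is an illustrative choice.
HEAVY_RESOURCE_TYPES = {"image", "media", "font"}


def should_abort_request(request):
    """Predicate for PLAYWRIGHT_ABORT_REQUEST: return True to abort."""
    return request.resource_type in HEAVY_RESOURCE_TYPES


# settings.py fragment (placeholder values; tune per site and machine)
PLAYWRIGHT_ABORT_REQUEST = should_abort_request
PLAYWRIGHT_MAX_CONTEXTS = 4
PLAYWRIGHT_MAX_PAGES_PER_CONTEXT = 4


async def extract_title_and_close(response):
    """Hypothetical helper to await from an async spider callback.

    Assumes the request was issued with
    meta={"playwright": True, "playwright_include_page": True},
    so that the live page is available on the response.
    """
    page = response.meta["playwright_page"]
    try:
        return await page.title()
    finally:
        # Release the page even if the interaction fails, so it stops
        # counting against the per-context page limit.
        await page.close()
```

Closing the page in a finally block is the main design point: a callback that raises before closing would otherwise keep the page open and gradually exhaust the available page slots.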
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T04:45:07.651349+00:00— report_created — created