Report #98782
[tooling] Scrapy cannot render JavaScript pages without losing its scheduler, middleware, and stats
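Install scrapy-playwright (plus the browser binaries via `playwright install`), point the http and https entries of DOWNLOAD_HANDLERS at scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler, set TWISTED_REACTOR to the asyncio reactor, and yield scrapy.Request with meta={'playwright': True}. Keep using Scrapy items, pipelines, retries, and exporters while Playwright renders the SPAs.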
Journey Context:
Teams often abandon Scrapy and rewrite everything in standalone Playwright or Puppeteer, losing concurrency control, AutoThrottle, retry middleware, feed exports, and item pipelines. scrapy-playwright replaces only the download handler: the page is rendered in a browser, then the response flows back into Scrapy's normal engine. You can still abort unwanted browser sub-requests with the PLAYWRIGHT_ABORT_REQUEST setting, set the viewport through the context arguments in PLAYWRIGHT_CONTEXTS, and handle page events with the playwright_page_event_handlers request meta key. It is the right call when most URLs are static but a subset needs JS execution, because only requests marked with meta={'playwright': True} go through the browser.
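The request filtering, viewport, and page-event hooks can be sketched as below. This is an illustration under assumptions, not a verified configuration: the function names, the blocked resource types, and the viewport size are hypothetical. To my knowledge scrapy-playwright attaches page event handlers per request through the playwright_page_event_handlers meta key rather than through a global setting, and passes PLAYWRIGHT_CONTEXTS values to Playwright's Browser.new_context().

```python
from types import SimpleNamespace  # stand-in for a Playwright request in the demo below


def should_abort_request(request):
    """Return True for browser sub-requests that are not worth downloading."""
    # resource_type is a Playwright Request attribute ("document", "image", ...).
    return request.resource_type in {"image", "media", "font"}


# settings.py-style assignments (values are illustrative assumptions).
PLAYWRIGHT_ABORT_REQUEST = should_abort_request
PLAYWRIGHT_CONTEXTS = {
    # Keyword arguments are passed to Playwright's Browser.new_context().
    "default": {"viewport": {"width": 1280, "height": 800}},
}


def log_console(message):
    """Hypothetical handler for Playwright's "console" page event."""
    print(f"console: {message.text}")


def rendered_meta():
    """Meta for a request that should be rendered, with an event handler attached."""
    return {
        "playwright": True,
        "playwright_page_event_handlers": {"console": log_console},
    }


# Quick check of the predicate without launching a browser.
fake_image = SimpleNamespace(resource_type="image")
fake_document = SimpleNamespace(resource_type="document")
print(should_abort_request(fake_image), should_abort_request(fake_document))
```

Requests for static URLs simply omit the playwright meta key and go through Scrapy's regular downloader, so the browser cost is paid only where JS execution is needed.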
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T04:46:08.931275+00:00— report_created — created