Report #2287
[tooling] Pure HTTP client cannot extract data from a JS-rendered page, but maintaining a separate Playwright loop is messy
Use scrapy-playwright: register ScrapyPlaywrightDownloadHandler for both http and https in DOWNLOAD_HANDLERS, set TWISTED_REACTOR to the asyncio reactor (twisted.internet.asyncioreactor.AsyncioSelectorReactor), then mark individual requests with meta={'playwright': True} and handle the rendered Response in normal Scrapy callbacks.
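A minimal sketch of the settings this workaround relies on, as they might appear in a project's settings.py. It registers the handler for both http and https, which is how the scrapy-playwright README shows it; treat the file layout as an assumption about your project.

```python
# settings.py (sketch): route downloads through scrapy-playwright.
# Requests are only rendered in a browser when they carry meta={"playwright": True};
# all other requests go through the handler without starting a page.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs Twisted to run on top of asyncio.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

With these in place, a request such as scrapy.Request(url, meta={"playwright": True}) is rendered in the browser, while requests without that key are downloaded as usual.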
Journey Context:
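A standalone Playwright script loses Scrapy's scheduler, pipelines, item loaders, and middleware. scrapy-playwright makes Playwright just another download handler, so only the requests that need rendering pay the browser cost. Add playwright_page_methods=[PageMethod('screenshot'), ...] to run Page methods before the response is returned, and playwright_include_page=True when you need the Page object in the callback (remember to close it, or the page stays open). Set PLAYWRIGHT_ABORT_REQUEST to a predicate that drops requests the extraction does not need, such as images, fonts, and media, to save bandwidth; block scripts only if the page still renders its data without them. Cap PLAYWRIGHT_MAX_PAGES_PER_CONTEXT to bound the number of concurrently open pages per browser context, and with it memory use.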
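The tuning advice above can be sketched as follows. The blocked resource types, the page cap of 4, and the helper name extract_title are illustrative assumptions, not values from this report. The predicate receives a Playwright Request and returns True to abort it. The coroutine shows the close-the-page pattern for a request sent with meta={'playwright': True, 'playwright_include_page': True}.

```python
# Resource types assumed to be irrelevant for extraction; adjust per site.
# "script" is left out on purpose: a JS-rendered page usually needs its scripts.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font"}


def should_abort_request(request) -> bool:
    """Predicate for PLAYWRIGHT_ABORT_REQUEST: True means drop the request."""
    return request.resource_type in BLOCKED_RESOURCE_TYPES


# settings.py additions (values are illustrative)
PLAYWRIGHT_ABORT_REQUEST = should_abort_request
PLAYWRIGHT_MAX_PAGES_PER_CONTEXT = 4


async def extract_title(response):
    """Hypothetical helper for a spider callback that received the Page object."""
    page = response.meta["playwright_page"]
    try:
        title = await page.title()
    finally:
        # Close the page even if extraction fails, so it does not stay open.
        await page.close()
    return {"url": response.url, "title": title}
```

Inside a spider, an async callback could simply return or yield the result of await extract_title(response).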
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T10:51:14.370225+00:00— report_created — created