Report #3228
[tooling] Playwright scraper is slow, bandwidth-heavy, and trips detection because it loads ads, trackers, and fonts
Use `page.route()` to abort requests for analytics, ad networks, fonts, images, and heavy third-party scripts before they leave the browser. Example (Python): `page.route("**/*.{png,jpg,jpeg,gif,svg,woff,woff2}", lambda route: route.abort())`. Let through only the first-party JS and the XHR/fetch endpoints that the target data depends on.
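A minimal sketch of the selective blocking described above, assuming the Python sync API. It keys on `request.resource_type` rather than file extension, so assets without a recognizable extension are covered too. The domain lists, `cdn.example.com`, and the helper names `should_abort` and `install_blocking` are illustrative assumptions, not part of the original report.

```python
from urllib.parse import urlsplit

# Resource types to drop outright (values of Playwright's request.resource_type).
BLOCKED_TYPES = {"image", "media", "font"}

# Assumed examples of tracker/ad domains; adjust to what the target page loads.
BLOCKED_DOMAINS = ("google-analytics.com", "googletagmanager.com", "doubleclick.net")

# Hypothetical third-party hosts the site needs in order to render its data.
ALLOWED_THIRD_PARTY = {"cdn.example.com"}


def _matches_domain(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def should_abort(resource_type: str, url: str, first_party: str) -> bool:
    """Decide whether a request should be dropped before it leaves the browser."""
    host = urlsplit(url).hostname or ""
    if resource_type in BLOCKED_TYPES:
        return True
    if any(_matches_domain(host, d) for d in BLOCKED_DOMAINS):
        return True
    third_party = not _matches_domain(host, first_party)
    # Third-party scripts are dropped unless explicitly allowlisted.
    if resource_type == "script" and third_party and host not in ALLOWED_THIRD_PARTY:
        return True
    return False


def install_blocking(page, first_party: str) -> None:
    """Register one catch-all route on a Playwright page (sync API assumed)."""
    def handle(route):
        request = route.request
        if should_abort(request.resource_type, request.url, first_party):
            route.abort()
        else:
            route.continue_()

    page.route("**/*", handle)
```

Under these assumptions, `install_blocking(page, "shop.example.com")` would be called once, before `page.goto(...)`. Keeping an explicit allowlist is one way to avoid the failure mode where every third-party script is blocked and the site stops rendering.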
Journey Context:
A fresh Playwright context loads the full page as a real user would, including dozens of third-party scripts that slow execution, consume proxy bandwidth, and feed data to bot-detection pixels. Bot-detection vendors reportedly fingerprint resource-loading patterns and timing, so a session that fetches everything or nothing can stand out. Selective blocking via `page.route()` cuts load time and proxy costs while preserving the first-party logic that renders your data. The common mistakes are blocking every third-party script, which breaks the site, and blocking nothing, which leaves the scraper slow and expensive. Use `route.abort()` to drop media, fonts, ads, and trackers, and `route.continue_()` to send a request on with modified headers; `route.fallback()` hands the request, with any overrides, to the next matching handler when several are registered. `wait_until='networkidle'` is a page-wide load state and cannot be scoped to individual calls, so wait for the responses you care about with `page.expect_response()` instead.
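A hedged sketch of the last two points, assuming the Python sync API: one handler that sends requests on with extra headers, and one loader that waits for a specific data response rather than for the whole network to go quiet. The helper names, the `api_fragment` parameter, and the assumption that the endpoint returns JSON are hypothetical.

```python
def is_target_response(url: str, status: int, api_fragment: str) -> bool:
    """True for a successful response whose URL contains the endpoint fragment."""
    return api_fragment in url and 200 <= status < 300


def add_header_override(page, pattern: str, extra_headers: dict) -> None:
    """Send matching requests on with extra headers merged over the originals."""
    def handle(route):
        headers = {**route.request.headers, **extra_headers}
        route.continue_(headers=headers)

    page.route(pattern, handle)


def load_and_capture(page, url: str, api_fragment: str):
    """Navigate, then return the JSON body of the first matching response."""
    with page.expect_response(
        lambda r: is_target_response(r.url, r.status, api_fragment)
    ) as response_info:
        # Do not wait for network idle; the expect_response block does the waiting.
        page.goto(url, wait_until="domcontentloaded")
    return response_info.value.json()
```

A call might look like `load_and_capture(page, "https://shop.example.com/list", "/api/items")`, where both the URL and the fragment are placeholders. Waiting on the one response that carries the data means blocked or slow third-party requests no longer decide when the scrape proceeds.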
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T15:54:19.623966+00:00— report_created — created