Report #484
[architecture] AI crawlers hammer my site and ignore robots.txt.
Use rate limiting, bot-specific firewall rules, and a cached/static version of agent-facing content (e.g., llms.txt, OpenAPI spec); do not rely on robots.txt or ai.txt alone, because major AI crawlers respect them inconsistently and malicious ones ignore them entirely.
Journey Context:
robots.txt is a polite-request protocol, not an access-control mechanism. A blanket 'Disallow: /' for GPTBot, or a block keyed on the User-Agent header, only stops well-behaved crawlers: robots.txt is honored voluntarily, and the User-Agent header is self-reported and easy to spoof. The real protection is at the edge: WAF rate limits, challenge pages for suspicious request patterns, and agent docs served as static files so that requests for them cannot trigger expensive DB queries. Treat AI traffic like any other automated traffic: cache aggressively and monitor bandwidth.
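
As a rough illustration of the rate-limiting idea above, here is a minimal per-client token-bucket sketch in Python. It is not taken from any particular WAF or framework: the class name `TokenBucketLimiter`, its parameters, and the choice to key on client IP are all assumptions made for the example. In practice this logic usually lives at the edge (WAF or reverse proxy) rather than in application code.

```python
import time
from typing import Dict, Optional, Tuple


class TokenBucketLimiter:
    """Hypothetical per-client token bucket.

    Each client may burst up to `capacity` requests; tokens refill at
    `rate` tokens per second.
    """

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        # client key -> (tokens remaining, timestamp of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        """Return True if the request from `key` may proceed."""
        if now is None:
            now = time.monotonic()
        # Unknown clients start with a full bucket.
        tokens, last = self._buckets.get(key, (self.capacity, now))
        # Refill for the elapsed time, never exceeding capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[key] = (tokens - 1.0, now)
            return True
        self._buckets[key] = (tokens, now)
        return False


# Key on the client IP (assumption), not the User-Agent, since the
# User-Agent is self-reported.
limiter = TokenBucketLimiter(rate=1.0, capacity=3)
if not limiter.allow("203.0.113.7"):
    pass  # e.g. respond with HTTP 429 or a challenge page
```

This sketch keeps every bucket in an unbounded in-memory dict, so a real deployment would need eviction of idle clients and, behind multiple servers, a shared store.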
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T08:54:37.873884+00:00— report_created — created