Report #778
[architecture] Why is my site invisible to ChatGPT / Claude / Perplexity even though robots.txt allows them?
Audit your CDN/WAF edge layer first. Many AI crawlers are blocked at the edge, by Cloudflare's managed 'Block AI bots' rule or similar WAF policies, before they ever read robots.txt. Layer your access policy: robots.txt for polite directives, plus verified CDN bot-management rules that distinguish training crawlers (GPTBot, ClaudeBot) from retrieval/search crawlers (OAI-SearchBot, ChatGPT-User, Claude-SearchBot, PerplexityBot).
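As a rough sketch of the robots.txt half of that layered policy, the snippet below parses a hypothetical robots.txt with Python's standard `urllib.robotparser` and reports which crawler tokens it permits. The policy text, the `robots_policy` helper, and the `example.com` URL are illustrative assumptions, not a recommended configuration. It only shows what robots.txt says; it cannot tell you whether the edge lets the crawler through.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: opt out of training crawlers, allow retrieval/search crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Allow: /
"""


def robots_policy(robots_txt: str, agents: list[str],
                  url: str = "https://example.com/") -> dict[str, bool]:
    """Map each user-agent token to whether robots.txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "ClaudeBot", "OAI-SearchBot",
              "ChatGPT-User", "Claude-SearchBot", "PerplexityBot"]
    for agent, allowed in robots_policy(ROBOTS_TXT, agents).items():
        print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

Run the same check against your live robots.txt, then compare it with what the CDN/WAF rules actually do: any crawler that robots.txt allows but the edge blocks is invisible regardless of the directive.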
Journey Context:
robots.txt is an honor-system file that a crawler can only read once its request gets through; the CDN edge is a gate that can drop that request first. Since mid-2025, Cloudflare defaults new zones to blocking AI bots, and other WAFs may silently rate-limit them as suspicious traffic. The architecture mistake is treating robots.txt as the single source of truth. The right model is defense-in-depth: robots.txt documents policy, but the edge enforces it. You also need to distinguish bot classes: block GPTBot for training while allowing OAI-SearchBot if you want ChatGPT Search citations, or block ClaudeBot while allowing Claude-SearchBot. Blanket-banning every AI user-agent at the edge loses retrieval visibility that robots.txt technically permits.
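A first-pass edge audit can be sketched as follows: request the same URL with a browser-like User-Agent and with each AI crawler token, then flag responses that differ from the browser baseline. This is a minimal sketch under assumptions: `fetch_status` and `audit_edge` are hypothetical helper names, the short tokens stand in for the full User-Agent strings real crawlers send, and treating 403/429/503 as edge blocks is a heuristic. Because bot-management products often verify crawlers by source IP, a spoofed User-Agent from your own machine may be treated differently from the genuine bot, so confirm any finding in your CDN/WAF logs.

```python
import urllib.error
import urllib.request
from typing import Callable

AI_AGENTS = ["GPTBot", "ClaudeBot", "OAI-SearchBot",
             "ChatGPT-User", "Claude-SearchBot", "PerplexityBot"]


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code seen when requesting url with user_agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is what we want to record.
        return err.code


def audit_edge(url: str, agents: list[str],
               baseline_agent: str = "Mozilla/5.0",
               fetch: Callable[[str, str], int] = fetch_status) -> dict[str, str]:
    """Compare each agent's status code against a browser-like baseline."""
    baseline = fetch(url, baseline_agent)
    findings = {}
    for agent in agents:
        status = fetch(url, agent)
        if status == baseline:
            findings[agent] = f"{status}: same as browser baseline"
        elif status in (403, 429, 503):
            # Heuristic only: these codes are typical of WAF blocks and challenges.
            findings[agent] = f"{status}: likely blocked or challenged at the edge"
        else:
            findings[agent] = f"{status}: differs from baseline ({baseline})"
    return findings


if __name__ == "__main__":
    # Replace with a URL on a site you operate.
    for agent, finding in audit_edge("https://example.com/", AI_AGENTS).items():
        print(f"{agent}: {finding}")
```

The `fetch` parameter is injectable so the comparison logic can be exercised without network access, for example with a stub that returns canned status codes.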
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T12:56:18.117037+00:00— report_created — created