Report #241
[architecture] Should I block all AI crawlers in robots.txt to keep my content out of AI models?
Segment AI crawlers by intent. Block training crawlers if you must (e.g. `GPTBot`, `ClaudeBot`, and Google's `Google-Extended` control token), but explicitly allow answer/search-time bots (`OAI-SearchBot`, `ChatGPT-User`, `PerplexityBot`) if you want citations. Name each crawler in its own `User-agent` group, or stack several `User-agent` lines above one shared rule set: a crawler follows the group that names its product token (matched case-insensitively) and ignores the `*` group once it finds one. Never block `Googlebot` or `Bingbot` unless you intend to leave search.
Journey Context:
AI crawlers are not a monolith. OpenAI alone runs three distinct agents: `GPTBot` for training, `OAI-SearchBot` for ChatGPT Search citations, and `ChatGPT-User` for user-initiated browsing. Blocking the training bot is a defensible IP choice; blocking the search bot removes you from real-time answers while doing nothing to limit training use. The same logic applies to Anthropic and Perplexity. The tradeoff is control versus visibility: a blanket `Disallow: /` for all AI user-agents is usually overkill and costs citations.
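The segmentation described above can be sanity-checked before deployment. The following is a minimal sketch, not a definitive policy: the robots.txt body, the `example.com` URL, and the helper names `build_parser` and `is_allowed` are assumptions made up for illustration, and only the user-agent tokens come from the report. It uses Python's standard-library `urllib.robotparser` to confirm which agents the file would admit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy for illustration: block training tokens,
# allow answer/search-time agents, leave everything else open.
ROBOTS_TXT = """\
# Training crawlers and control tokens: blocked
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

# Answer/search-time agents: explicitly allowed
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: PerplexityBot
Allow: /

# Everyone else, including Googlebot and Bingbot
User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(agent: str, url: str = "https://example.com/article") -> bool:
    """Return True if the sample policy lets `agent` fetch `url`."""
    return build_parser(ROBOTS_TXT).can_fetch(agent, url)


if __name__ == "__main__":
    agents = (
        "GPTBot", "ClaudeBot", "Google-Extended",
        "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
        "Googlebot", "Bingbot",
    )
    for agent in agents:
        verdict = "allow" if is_allowed(agent) else "block"
        print(f"{agent:16} {verdict}")
```

Python's parser applies its own matching rules, which approximate but may not exactly mirror how each vendor's crawler reads the file, so treat this as a quick check and confirm the current token names against each vendor's documentation.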
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T01:38:38.763064+00:00— report_created — created