Report #378
[architecture] How do I control whether OpenAI, Anthropic, and Google use my site for AI training versus AI search?
Use a separate User-agent group in robots.txt for each crawler, according to its purpose.

For OpenAI, disallow GPTBot to opt out of training, and allow OAI-SearchBot so you stay visible in ChatGPT search. For Anthropic, ClaudeBot controls training and Claude-SearchBot controls Claude search. For Google, the Google-Extended token controls use in Gemini and Vertex AI training.

ChatGPT-User and Claude-User fetch pages when a user asks for them, and they may not obey robots.txt.
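The per-purpose split can be sketched as a small Python script. It writes one robots.txt group per crawler from a policy table, then checks the result with the standard-library urllib.robotparser.

Only the user-agent tokens come from the answer above. The POLICY table and its allow/deny choices (opt out of training, stay in search) are illustrative assumptions, and example.com is a placeholder. Python's parser uses simplified matching (substring user-agent matching, first matching rule wins) rather than the full RFC 9309 rules. Treat it as a sanity check on the file, not a prediction of how each vendor's crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy for illustration: opt out of AI training, stay visible in AI search.
# True = Allow: /, False = Disallow: /. The choices are an assumption; the tokens are real.
POLICY = {
    "GPTBot": False,            # OpenAI training crawler
    "OAI-SearchBot": True,      # ChatGPT search indexing
    "ClaudeBot": False,         # Anthropic training crawler
    "Claude-SearchBot": True,   # Claude search crawler
    "Google-Extended": False,   # Gemini / Vertex AI training token
}


def build_robots_txt(policy: dict[str, bool]) -> str:
    """Emit one User-agent group per token, with groups separated by blank lines."""
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    # Every other crawler (including Googlebot for Search) keeps normal access.
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


def check(robots_txt: str, agent: str, url: str = "https://example.com/page") -> bool:
    """Ask Python's standard robots.txt parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    text = build_robots_txt(POLICY)
    print(text)
    for agent in [*POLICY, "Googlebot"]:
        print(f"{agent:18} allowed={check(text, agent)}")
```

Googlebot matches none of the named groups, so it falls through to the `*` group and ordinary Search crawling is unaffected. ChatGPT-User and Claude-User are deliberately left out of the table, because they may not consult robots.txt at all.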
Journey Context:
A blanket "Disallow all AI bots" rule also blocks the search crawlers, so you lose citations in AI search results.

Each major provider now separates training crawlers from retrieval and search crawlers:
- OpenAI has GPTBot (training), OAI-SearchBot (search indexing), and ChatGPT-User (on-demand fetches triggered by a user).
- Anthropic has ClaudeBot (training), Claude-SearchBot (search), and Claude-User (user-triggered fetches).
- Google uses Google-Extended as a product token for AI training, separate from Googlebot, which crawls for Search.

The most common mistake is blocking GPTBot and assuming this removes the site from ChatGPT answers. It does not, because ChatGPT search visibility is governed by OAI-SearchBot.

Another mistake is treating robots.txt as a security boundary. It is a polite request that compliant crawlers choose to honor, and user-triggered fetches may ignore it.
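To make the two robots.txt mistakes concrete, here is a small check using Python's standard-library urllib.robotparser against two hypothetical files.

The first file blocks only GPTBot. The second is a blanket block that lists every AI user agent named above. The parser reports only what a compliant crawler should do. It cannot show whether a user-triggered fetcher such as ChatGPT-User actually reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file 1: opts out of OpenAI training only.
GPTBOT_ONLY = """\
User-agent: GPTBot
Disallow: /
"""

# Hypothetical file 2: a blanket block of every AI user agent, training and search alike.
BLANKET_AI = """\
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: Google-Extended
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str = "https://example.com/article") -> bool:
    """Return what a robots.txt-compliant crawler named `agent` should do for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Blocking GPTBot leaves OAI-SearchBot free to index the page for ChatGPT search.
    print("GPTBot-only file:", {a: allowed(GPTBOT_ONLY, a) for a in ("GPTBot", "OAI-SearchBot")})
    # The blanket file also shuts out both search crawlers, costing AI search citations.
    print("Blanket file:", {a: allowed(BLANKET_AI, a) for a in ("OAI-SearchBot", "Claude-SearchBot")})
```

In both files Googlebot matches no group and stays allowed. Blocking AI crawlers this way does not touch ordinary Google Search crawling.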
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T06:42:39.695825+00:00— report_created — created