Report #2078
[architecture] Should I block GPTBot, allow it, or treat all AI crawlers the same in robots.txt?
Declare separate rules for each crawler purpose: allow OAI-SearchBot for ChatGPT search citations, allow or disallow GPTBot for training data as an independent decision, and keep public content reachable. Do not use a single blanket rule for all AI crawlers. User agents are split by function, so blocking the training bot does not remove you from search answers.
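A minimal sketch of what per-purpose rules could look like, checked with Python's standard urllib.robotparser. The robots.txt text below is illustrative only: it assumes a site that opts out of training while staying visible in search, and example.com is a placeholder. Flip the GPTBot group to Allow if you want to opt in.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: one group per crawler purpose.
# Assumption: this site opts out of training but keeps search citations.
ROBOTS_TXT = """\
# Search index / citations
User-agent: OAI-SearchBot
Allow: /

# Training data (independent decision)
User-agent: GPTBot
Disallow: /

# User-triggered fetches
User-agent: ChatGPT-User
Allow: /

# All other crawlers
User-agent: *
Allow: /
"""


def build_parser(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
PAGE = "https://example.com/article"  # placeholder URL

for agent in ("OAI-SearchBot", "GPTBot", "ChatGPT-User"):
    verdict = "allowed" if parser.can_fetch(agent, PAGE) else "blocked"
    print(f"{agent}: {verdict}")
```

Running a check like this before deploying helps confirm that the training opt-out did not accidentally catch the search crawler as well.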
Journey Context:
OpenAI runs at least three distinct user agents: OAI-SearchBot (search index and citations), GPTBot (training future models), and ChatGPT-User (user-triggered fetches, not search). The common mistakes are blocking GPTBot on the assumption that this stops all OpenAI access, or blanket-allowing everything. Because search and training are independent, you can opt out of training while preserving citation visibility. Also verify that WAF/CDN rate limits are not silently rejecting allowed crawlers with 429 responses; the published IP range files are the only reliable way to tell genuine crawler traffic from requests that merely claim a crawler's user-agent string. Crawlers from Anthropic and Perplexity, and Google's Google-Extended token, have their own user-agent tokens and should be managed separately.
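The IP-range check mentioned above can be sketched as follows. This is a hedged illustration, not a vendor-supplied method: the CIDR blocks are placeholders taken from the reserved documentation ranges (RFC 5737 and RFC 3849), the per-crawler grouping is an assumption, and is_genuine is a hypothetical helper name. In practice you would load the ranges from the files the crawler operator publishes.

```python
import ipaddress

# PLACEHOLDER ranges from reserved documentation blocks, NOT real crawler IPs.
# Assumption: each crawler token has its own published list of CIDR blocks.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "OAI-SearchBot": ["198.51.100.0/24", "2001:db8::/32"],
}


def is_genuine(user_agent_token, remote_ip, ranges=PUBLISHED_RANGES):
    """Return True if remote_ip falls inside a range listed for the token.

    Hypothetical helper: a request that claims a crawler's user agent but
    comes from outside the listed ranges is treated as unverified.
    """
    addr = ipaddress.ip_address(remote_ip)
    for cidr in ranges.get(user_agent_token, []):
        network = ipaddress.ip_network(cidr)
        # Compare only like with like (IPv4 vs IPv6).
        if addr.version == network.version and addr in network:
            return True
    return False


print(is_genuine("GPTBot", "192.0.2.10"))    # inside the placeholder range
print(is_genuine("GPTBot", "203.0.113.5"))   # outside it
```

A check like this could run against WAF or CDN logs: 429 responses sent to addresses that pass the check suggest the rate limiter is throttling a crawler that robots.txt allows.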
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T09:54:34.786488+00:00— report_created — created