Report #2742
[architecture] Should I block all OpenAI crawlers to keep my content out of ChatGPT?
Block GPTBot to opt out of training-data crawling, but leave ChatGPT-User alone: it is the user-agent ChatGPT uses when it visits a link in real time on behalf of a person. Blocking ChatGPT-User breaks ChatGPT's browsing feature for your site; blocking GPTBot does not.
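A minimal sketch of the targeted policy, checked with Python's standard-library robots.txt parser rather than a live crawler. The rule set is an illustrative assumption, as is the placeholder URL `https://example.com/...` (only its path is evaluated). The parser matches on the product token (`GPTBot`, `ChatGPT-User`), not the full User-Agent header string. Requires Python 3.9+ for the `list[str]` / `dict[str, bool]` hints.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: opt out of training crawls (GPTBot) while leaving
# ChatGPT-User free to fetch pages a person asks ChatGPT to read.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def access_report(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user-agent product token to whether it may fetch url."""
    parser = build_parser(robots_txt)
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # example.com is a placeholder host; only the path matters to the rules.
    report = access_report(
        ROBOTS_TXT, ["GPTBot", "ChatGPT-User"], "https://example.com/articles/post"
    )
    for agent, allowed in report.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running it should report GPTBot as blocked and ChatGPT-User as allowed, which is the outcome the answer recommends. This only shows how a compliant parser reads the rules; it says nothing about whether a given crawler actually honors them.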
Journey Context:
Site owners see traffic from OpenAI and blanket-disallow everything. OpenAI documents separate user-agents for separate purposes, and two matter here: GPTBot crawls pages for model training, while ChatGPT-User fetches a page when a user asks ChatGPT to read a link. Blocking ChatGPT-User alone makes your site unreachable inside ChatGPT browsing sessions, yet the site is still crawled for training unless GPTBot is disallowed too. Allowing both means your content can be both training data and browsable. This is a product decision, not a security control: robots.txt is advisory, so well-behaved crawlers honor it but nothing enforces it. Many sites get this wrong and then wonder why browse links return errors.
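To make the failure modes concrete, the sketch below evaluates three hypothetical robots.txt policies against the two OpenAI product tokens using Python's standard `urllib.robotparser`. The policy names and the placeholder URL are illustrative assumptions, not anything OpenAI publishes, and the results describe how a compliant parser reads the rules, not how any crawler behaves in practice.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policies a site owner might deploy; names are illustrative.
POLICIES = {
    # The common mistake: disallow every crawler, OpenAI's included.
    "blanket": "User-agent: *\nDisallow: /\n",
    # Breaks live browsing but, on its own, leaves training crawls allowed.
    "block-chatgpt-user": "User-agent: ChatGPT-User\nDisallow: /\n",
    # Opts out of training crawls only.
    "block-gptbot": "User-agent: GPTBot\nDisallow: /\n",
}

# Product tokens as they appear in robots.txt, not full User-Agent headers.
AGENTS = ("GPTBot", "ChatGPT-User")


def outcomes(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Return, for each agent, whether the parsed rules let it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AGENTS}


if __name__ == "__main__":
    for name, text in POLICIES.items():
        result = outcomes(text)
        cells = ", ".join(
            f"{agent}={'allowed' if ok else 'blocked'}" for agent, ok in result.items()
        )
        print(f"{name:20} {cells}")
```

The blanket policy blocks both agents, so browse links fail; blocking only ChatGPT-User still leaves GPTBot allowed; blocking only GPTBot is the combination that opts out of training while keeping the site browsable.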
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T13:52:05.701308+00:00— report_created — created