Report #724
[architecture] How do I control which AI crawlers can crawl my site for search versus model training?
Use targeted robots.txt user-agent groups: allow OAI-SearchBot so your pages can appear in ChatGPT search results, disallow GPTBot to opt out of model training, and send the X-Robots-Tag HTTP response header for page-level noindex/nofollow. Back this up with rate limits and terms of service, because not all crawlers honor robots.txt.
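A minimal sketch of the user-agent groups described above, checked with Python's standard-library robots.txt parser. The example.com URLs, the catch-all `User-agent: *` group, and the `can_crawl` helper are assumptions added for illustration. The standard-library parser only approximates how a real crawler interprets the file, so treat a passing check as a sanity test rather than proof of crawler behavior.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt; the catch-all group is an assumption, adjust to taste.
ROBOTS_TXT = """\
# Allow OpenAI's search crawler so pages can surface in ChatGPT search.
User-agent: OAI-SearchBot
Allow: /

# Opt out of training-data collection.
User-agent: GPTBot
Disallow: /

# All other crawlers: allowed by default.
User-agent: *
Allow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("OAI-SearchBot", "GPTBot", "SomeOtherBot"):
        print(agent, can_crawl(agent, "https://example.com/articles/1"))
```

Running a check like this before deploying a robots.txt change helps catch a group that accidentally blocks the search crawler together with the training crawler.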
Journey Context:
A blanket Disallow blocks useful search traffic along with training crawlers. OpenAI explicitly separates OAI-SearchBot (search features) from GPTBot (training data for foundation models), and each crawler follows the robots.txt group addressed to its own user-agent token. Treat this split as the model for other providers: use their specific tokens where they publish them, rather than generic rules. robots.txt is a signal, not a legal guarantee, so pair it with other controls.
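Because robots.txt is only a signal, one possible server-side complement is sketched below: refuse requests whose User-Agent contains a blocked token, and attach an X-Robots-Tag header to paths that should stay out of indexes. The `response_policy` function, the token list, and the path prefixes are hypothetical names chosen for this example, not part of any framework. User-Agent strings can be spoofed, and not every crawler is known to honor X-Robots-Tag, so this is a partial control at best.

```python
# Hypothetical policy values; adjust to your own site.
BLOCKED_AGENT_TOKENS = ("GPTBot",)              # crawlers to refuse outright
NOINDEX_PREFIXES = ("/drafts/", "/internal/")   # paths to keep out of indexes


def response_policy(user_agent: str, path: str) -> tuple[int, dict[str, str]]:
    """Return (status_code, extra_headers) for a request.

    Substring-matches the User-Agent against blocked tokens, then adds
    a page-level X-Robots-Tag header for paths under a noindex prefix.
    """
    ua = user_agent.lower()
    if any(token.lower() in ua for token in BLOCKED_AGENT_TOKENS):
        # Enforce the robots.txt opt-out for clients that ignore it.
        return 403, {}
    headers: dict[str, str] = {}
    if path.startswith(NOINDEX_PREFIXES):
        headers["X-Robots-Tag"] = "noindex, nofollow"
    return 200, headers
```

A web framework's middleware or request hook could call a function like this and apply the returned status and headers. Rate limiting would sit alongside it as a separate control.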
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T11:57:40.621241+00:00— report_created — created