Report #100179
[architecture] I blocked GPTBot but ChatGPT Search still cannot cite my site, or I want to block training while keeping search citations
Use separate robots.txt blocks for each OpenAI user agent: Disallow GPTBot to opt out of model training, Allow OAI-SearchBot to appear in ChatGPT search answers, and treat ChatGPT-User as user-initiated fetches. Allow requests from OpenAI's published IP ranges and wait roughly 24 hours for robots.txt changes to propagate.
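A minimal sketch of the split described above, checked with Python's standard urllib.robotparser. The robots.txt body is one possible layout, not an official template; the explicit ChatGPT-User group is optional and included here only for illustration. The helper name crawl_policy and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# One group per OpenAI user agent. This layout is an illustrative assumption.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""

OPENAI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")


def crawl_policy(robots_txt, url, agents=OPENAI_AGENTS):
    """Return {agent: allowed} for the given robots.txt text and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    policy = crawl_policy(ROBOTS_TXT, "https://example.com/docs/page")
    for agent, allowed in policy.items():
        print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

Running a check like this against your own robots.txt before deploying catches the common mistake of a single catch-all group that blocks or allows every OpenAI agent at once. It only validates the file's logic; it does not confirm that the crawlers have picked up the change.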
Journey Context:
OpenAI runs multiple crawlers with distinct purposes; conflating them is the most common misconfiguration. Treating 'OpenAI' as a single bot either over-blocks search citations or under-blocks training. robots.txt is advisory: it states crawl policy but is ignored by bad actors, and it does not stop your own WAF or CDN from rejecting a legitimate crawler. For that reason, also allowlist the published IP ranges at your WAF/CDN to avoid accidental 429s from default bot rules. Allowing OAI-SearchBot while disallowing GPTBot keeps the citation upside while withholding training data; ChatGPT-User fetches are user-driven and not governed by robots.txt in the same way.
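The allowlisting step can be sketched as a CIDR membership check using Python's standard ipaddress module. The ranges below are placeholders taken from the reserved documentation blocks, not OpenAI's real addresses; in practice you would load the ranges OpenAI publishes. The names EXAMPLE_RANGES, load_networks, and is_allowlisted are hypothetical.

```python
import ipaddress

# Placeholder CIDRs from reserved documentation ranges.
# These are NOT OpenAI's ranges; substitute the published list.
EXAMPLE_RANGES = ["192.0.2.0/24", "198.51.100.0/28"]


def load_networks(cidrs):
    """Parse CIDR strings into network objects (IPv4 or IPv6)."""
    return [ipaddress.ip_network(cidr, strict=False) for cidr in cidrs]


def is_allowlisted(client_ip, networks):
    """Return True if client_ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(client_ip)
    # Compare versions explicitly so IPv4 and IPv6 are never mixed.
    return any(addr.version == net.version and addr in net for net in networks)


if __name__ == "__main__":
    networks = load_networks(EXAMPLE_RANGES)
    for ip in ("192.0.2.10", "203.0.113.5"):
        print(ip, "allowlisted" if is_allowlisted(ip, networks) else "not allowlisted")
```

Most WAFs and CDNs express the same idea as an IP-list rule rather than code; the sketch is mainly useful for auditing logs to see whether requests that were rate-limited came from inside the ranges you meant to allow.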
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T04:47:07.783812+00:00 — report_created — created