Report #223
[architecture] A single wildcard robots.txt rule accidentally blocks AI citations while allowing training crawls, or vice versa
Use a separate User-agent block for each AI bot. Allow OAI-SearchBot and ChatGPT-User so ChatGPT can cite pages live, and block GPTBot to opt out of training. Anthropic's ClaudeBot covers both training and product features, so it cannot be split: allow or block it as a whole. Keep the named blocks above any User-agent: * fallback; standards-compliant parsers choose the most specific match regardless of order, but this layout is easier to audit.
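As a rough sketch of the layout recommended above, the following Python snippet renders one named block per bot and puts the wildcard fallback last. The names `AI_BOT_POLICY` and `render_robots` are hypothetical, and the allow/block choices (in particular blocking ClaudeBot and Google-Extended) are illustrative assumptions, not a recommendation for every site.

```python
# Hypothetical policy table: True = allow the whole site, False = block it.
# The values are illustrative; each site has to make its own choices.
AI_BOT_POLICY = {
    "OAI-SearchBot": True,     # ChatGPT search answers (citations)
    "ChatGPT-User": True,      # on-demand fetch triggered by a user
    "GPTBot": False,           # OpenAI training crawl
    "Google-Extended": False,  # Gemini / Vertex AI training control
    "ClaudeBot": False,        # single token, so one all-or-nothing decision
}


def render_robots(policy: dict[str, bool], fallback_allow: bool = True) -> str:
    """Render one named block per bot, then the wildcard fallback last."""
    blocks = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        blocks.append(f"User-agent: {agent}\n{rule}")
    # The wildcard block goes last so the per-bot rules are easy to audit.
    fallback_rule = "Allow: /" if fallback_allow else "Disallow: /"
    blocks.append(f"User-agent: *\n{fallback_rule}")
    return "\n\n".join(blocks) + "\n"


ROBOTS_TXT = render_robots(AI_BOT_POLICY)
print(ROBOTS_TXT)
```

Keeping the policy in one table makes each bot's status an explicit decision, so no bot silently falls through to the wildcard.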
Journey Context:
robots.txt parsers match the most specific named user-agent block, so a wildcard does not override per-bot rules. OpenAI explicitly separates GPTBot (training), OAI-SearchBot (search answers), and ChatGPT-User (on-demand user fetch); Google-Extended controls Gemini/Vertex AI training and is independent of Googlebot. Without explicit per-bot rules you either donate content for model training without getting citations, or remove yourself from AI answers while still being crawled.
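The matching behavior described above can be checked locally with Python's standard-library `urllib.robotparser`. This is only a sketch: the sample file, the `/private/` path, and the `ExampleBot` agent are made up for illustration, and the standard-library parser is just one implementation, so real crawlers may behave differently.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file: the wildcard block is deliberately placed first
# to show that it does not override the named blocks below it.
SAMPLE = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return whether this parser lets `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


# Named bots follow their own block; unnamed bots fall back to the wildcard.
checks = [
    ("GPTBot", "/article"),
    ("OAI-SearchBot", "/private/report"),
    ("ExampleBot", "/private/report"),
    ("ExampleBot", "/article"),
]
for agent, path in checks:
    print(f"{agent:15} {path:18} allowed={allowed(agent, path)}")
```

Running a check like this before deploying a changed robots.txt helps catch a bot that unintentionally falls through to the wildcard block.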
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T00:42:12.453932+00:00 - report_created - created