Report #97272
[architecture] How do I control which pages AI crawlers are allowed to index without breaking normal web search?
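Use `robots.txt` to control crawling and `noindex` meta tags to control indexing, exactly as you would for traditional search. A crawler has to be able to fetch a page to see its `noindex` tag, so do not also disallow that page in `robots.txt`. Do not put semantic instructions, API schemas, or capability descriptions in `robots.txt`; use `llms.txt` for that. For bot-specific rules, add a separate `User-agent` group for each crawler, using the user-agent token that the crawler's operator documents.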
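A minimal sketch of a bot-specific rule, checked with Python's standard-library `urllib.robotparser`. The paths, the `example.com` URLs, and the `is_allowed` helper are hypothetical; `GPTBot` is used only as an example token, so confirm the exact token against the operator's documentation before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths and the choice of bot are illustrative only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /drafts/
Disallow: /admin/

User-agent: *
Disallow: /admin/
"""


def is_allowed(agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Print the decision for each bot/path pair.
    for agent in ("GPTBot", "Googlebot"):
        for path in ("/blog/post", "/drafts/post", "/admin/"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(agent, url))
```

Note that `/admin/` is repeated in the `GPTBot` group. A crawler that matches a specific group follows only that group and does not also inherit the `*` rules, so shared restrictions have to be restated for each named bot.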
Journey Context:
Standard `robots.txt` has no special 'AI crawler' permission grammar beyond the usual `User-agent`, `Allow`, and `Disallow` rules. Proposed formats such as `ai.txt` exist but are not widely adopted. The confusion comes from wanting to 'talk to' crawlers: `robots.txt` is purely a gate, not a conversation. To communicate capabilities, use `llms.txt` and OpenAPI or structured data instead.
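The "gate, not a conversation" point can be illustrated with a small sketch. It feeds Python's standard-library parser two versions of a `robots.txt`, one of which contains made-up free-text lines (`Capabilities` and `Please-summarize` are invented for this example and belong to no standard), and compares the fetch decisions. Other crawlers use their own parsers, so treat this as an illustration of one parser's behavior rather than a guarantee about all of them.

```python
from urllib.robotparser import RobotFileParser

# Plain gate: one group, one rule.
PLAIN = """\
User-agent: *
Disallow: /internal/
"""

# Same rule plus invented free-text lines that no standard defines.
WITH_EXTRAS = """\
User-agent: *
Capabilities: search, checkout API
Please-summarize: our docs politely
Disallow: /internal/
"""

# Hypothetical URLs used only for the comparison.
URLS = ["https://example.com/", "https://example.com/internal/report"]


def decisions(robots_txt, urls, agent="ExampleBot"):
    """Return the allow/deny decision for each URL under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [parser.can_fetch(agent, url) for url in urls]


plain_result = decisions(PLAIN, URLS)
extras_result = decisions(WITH_EXTRAS, URLS)

# The unrecognized lines are skipped, so both files yield the same decisions.
print(plain_result == extras_result)
```

In this parser the extra lines change nothing: they are neither obeyed nor passed on to anything, which is why descriptive content belongs in `llms.txt` or structured data instead.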
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T04:50:39.309256+00:00— report_created — created