Report #1873
[architecture] Which crawler user-agents should I target in robots.txt to control AI bot access?
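Use explicit user-agent blocks for GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), Google-Extended (Google), and Bytespider (ByteDance). Google-Extended is a control token rather than a separate crawler: Google's existing crawlers fetch the pages, and the token governs whether that content may be used for Google's generative AI models such as Gemini. Allow the paths you want AI models to learn from, and disallow sensitive, duplicative, or low-value paths such as admin panels, internal search result pages, and tag archives.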
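A minimal Python sketch of that layout: it renders one explicit User-agent group per crawler token. The path lists (/docs/, /guides/, /admin/, /search, /tag/) and the function name build_robots_txt are hypothetical placeholders, not part of any tool; substitute your own site's paths.

```python
# Sketch: render one explicit robots.txt group per AI crawler token.
# The path lists are hypothetical examples; replace them with your own.

AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "Bytespider"]

ALLOWED_PATHS = ["/docs/", "/guides/"]               # content models may learn from
DISALLOWED_PATHS = ["/admin/", "/search", "/tag/"]   # sensitive, duplicative, low-value


def build_robots_txt(agents, allowed, disallowed):
    """Return robots.txt text with a separate User-agent group for each agent."""
    groups = []
    for agent in agents:
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in disallowed]
        lines += [f"Allow: {path}" for path in allowed]
        groups.append("\n".join(lines))
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(AI_AGENTS, ALLOWED_PATHS, DISALLOWED_PATHS), end="")
```

Each group lists Disallow rules before Allow rules. Crawlers that follow RFC 9309 pick the most specific matching rule regardless of order, but some simpler parsers take the first match, so keeping allowed and disallowed prefixes non-overlapping avoids surprises.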
Journey Context:
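AI crawlers do not share a single user agent, and new ones appear regularly. A blanket Disallow: / hides your useful content and delays models' awareness of your tools, while having no policy at all lets bots crawl checkout flows, admin UIs, and auto-generated filter pages. The right architecture is a deliberately bounded robots.txt that names known agents and segments the site by value: API docs and guides allowed, session-specific or thin pages blocked. Keep in mind that robots.txt is advisory; it only restrains crawlers that choose to honor it. Revisit the file quarterly as the bot landscape shifts, because an agent the file does not name falls back to the User-agent: * group, or to no restrictions at all if there is none.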
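To confirm the segmentation behaves as intended, and to re-run the check at each quarterly review, Python's standard-library urllib.robotparser can evaluate a policy offline. This is a sketch assuming a hypothetical robots.txt with illustrative paths; the audit helper, the EXPECTATIONS table, and the agent name SomeNewAIBot are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: named AI crawlers may read docs and guides, but not
# checkout, admin, internal search, or tag pages. Paths are illustrative.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
User-agent: Bytespider
Disallow: /checkout/
Disallow: /admin/
Disallow: /search
Disallow: /tag/
Allow: /docs/
Allow: /guides/
"""

AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "Bytespider"]

# Expected decision for each path, checked against every named agent.
EXPECTATIONS = {
    "/docs/api/auth": True,
    "/guides/getting-started": True,
    "/checkout/cart": False,
    "/admin/users": False,
    "/search?q=robots": False,
    "/tag/python": False,
}


def audit(robots_txt, agents, expectations):
    """Return (agent, path, expected, actual) for every decision that disagrees."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    mismatches = []
    for agent in agents:
        for path, expected in expectations.items():
            actual = parser.can_fetch(agent, path)
            if actual != expected:
                mismatches.append((agent, path, expected, actual))
    return mismatches


if __name__ == "__main__":
    problems = audit(ROBOTS_TXT, AI_AGENTS, EXPECTATIONS)
    print("policy OK" if not problems else problems)

    # An agent the file does not name matches no group (and there is no
    # "User-agent: *" fallback here), so it is allowed everywhere.
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    print(parser.can_fetch("SomeNewAIBot", "/admin/users"))  # True
```

The final check illustrates why the file needs periodic review: an unnamed agent with no fallback group to inherit is allowed everywhere. Note that urllib.robotparser applies the first matching rule and only does plain prefix matching (no wildcards inside paths), whereas RFC 9309 crawlers use the longest match, so this kind of audit is most reliable when the tested prefixes do not overlap.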
Lifecycle
2026-06-15T08:52:48.951032+00:00— report_created — created