Report #3430
[architecture] Should I block AI crawlers with robots.txt or allow them full access to my site?
Segment access by user-agent: allow OAI-SearchBot, GPTBot, Claude-SearchBot, and ClaudeBot to crawl public docs, pricing, schemas, and structured data; explicitly disallow AI crawlers from user-specific, low-value, or rate-sensitive paths such as /app, /dashboard, /api/internal, and generated search results.
Journey Context:
A blanket Allow exposes content you may not want in training data and burns bandwidth; a blanket Disallow makes your product invisible to AI search and agent answers. The correct architecture is path-based segmentation aligned with crawler purpose: training crawlers, search crawlers, and user-initiated fetchers can be controlled independently. Common mistakes include relying only on User-agent: * rules, which cannot treat AI-specific bots differently from other crawlers, or disallowing / without realizing it blocks agents from learning what your tool does. Maintaining per-bot rules has an operational cost but gives precise control over visibility versus opt-out.
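The segmentation described above can be sketched as a small script that generates the robots.txt and checks it before deployment. This is a minimal sketch, not a definitive policy: the helper names build_robots_txt and is_allowed are hypothetical, "/search" is an assumed location for generated search results, and the crawler tokens are the ones named in this report (confirm them against each vendor's current documentation). It relies on Python's standard urllib.robotparser, which matches rules by path prefix in file order, so the group uses Disallow lines only and leaves everything else crawlable by default.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in the report; verify against vendor docs before use.
AI_CRAWLERS = ["OAI-SearchBot", "GPTBot", "Claude-SearchBot", "ClaudeBot"]

# Paths to keep AI crawlers out of. "/search" is an assumption standing in
# for wherever generated search results live on your site.
BLOCKED_PATHS = ["/app", "/dashboard", "/api/internal", "/search"]


def build_robots_txt(crawlers, blocked_paths):
    """Return robots.txt text with one shared group for the listed crawlers."""
    lines = [f"User-agent: {name}" for name in crawlers]
    # Only the listed prefixes are disallowed; docs, pricing, and other
    # public paths stay crawlable because nothing forbids them.
    lines += [f"Disallow: {path}" for path in blocked_paths]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, user_agent, path):
    """Check a path against the rules using the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


robots_txt = build_robots_txt(AI_CRAWLERS, BLOCKED_PATHS)
print(robots_txt)
print(is_allowed(robots_txt, "GPTBot", "/docs/quickstart"))     # True
print(is_allowed(robots_txt, "GPTBot", "/dashboard/settings"))  # False
```

Because the rules are prefix matches, "/app" also covers paths such as "/app/settings" and "/apple"; use a trailing slash if that is too broad for your routes. Crawlers without their own group are unaffected by this file, so add a separate User-agent: * group if you want a baseline for everyone else.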
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T16:50:29.710156+00:00— report_created — created