Report #98788
[architecture] How do I give AI crawlers a curated, machine-readable overview of my site without letting them scrape every page?
Add an `/llms.txt` file at the site root (and optionally `/llms-full.txt` plus `.md` mirrors of key pages) following the llms.txt convention: Markdown, a concise project summary, explicit URL lists, and an `Optional` section for lower-priority content. Treat it as an agent-facing README, not a replacement for robots.txt.
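As a rough illustration of the layout described above, here is a minimal Python sketch that renders an llms.txt body: an H1 title, a blockquote summary, H2 sections of link lists, and a trailing `Optional` section. The `Link` class, the `render_llms_txt` function, and every example.com URL are hypothetical names made up for this example; they are not part of any library or of the convention itself.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One curated entry: a title, an absolute URL, and an optional note."""
    title: str
    url: str
    note: str = ""


def render_llms_txt(name, summary, sections, optional=()):
    """Render llms.txt-style Markdown from curated sections.

    `sections` maps an H2 heading to a list of Link objects.
    `optional` holds lower-priority links, emitted last under "Optional".
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    groups = list(sections.items())
    if optional:
        groups.append(("Optional", list(optional)))
    for heading, links in groups:
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site content, for illustration only.
EXAMPLE = render_llms_txt(
    name="Example Project",
    summary="A short, plain-language description of what the site covers.",
    sections={
        "Docs": [
            Link("Quickstart", "https://example.com/docs/quickstart.md",
                 "install and first run"),
            Link("API reference", "https://example.com/docs/api.md"),
        ],
    },
    optional=[Link("Changelog", "https://example.com/changelog.md")],
)

if __name__ == "__main__":
    print(EXAMPLE, end="")
```

Writing the returned string to `llms.txt` in the web root as part of the site build is one way to keep the file from drifting out of sync with the pages it lists.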
Journey Context:
robots.txt only tells crawlers which paths they may or may not fetch; sitemaps list URLs but give no semantic context. Without a curated entry point, crawlers have to parse nav bars, ads, cookie banners, and layout noise to find the content. llms.txt lets you handpick the content you want agents to know and present it in a context-window-friendly format. The tradeoffs are maintenance (it is another file to keep in sync) and adoption: it is a community proposal, so not every crawler reads it yet. Use it as the canonical agent onboarding artifact and link it from your site root; pair it with, not instead of, a proper robots.txt and structured data.
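Because llms.txt and robots.txt are meant to work together, it can help to check that nothing you list in llms.txt is disallowed by your own robots.txt. The sketch below uses the standard library's `urllib.robotparser` for that check; the robots.txt rules, the paths, the `blocked_paths` helper, and the `ExampleBot` user agent are all assumptions invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for the example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /drafts/
"""


def blocked_paths(robots_txt, paths, user_agent="ExampleBot"):
    """Return the paths that robots.txt forbids for the given user agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Paths a hypothetical llms.txt might link to.
LISTED = ["/llms.txt", "/docs/quickstart.md", "/drafts/roadmap.md"]
BLOCKED = blocked_paths(ROBOTS_TXT, LISTED)

if __name__ == "__main__":
    for path in BLOCKED:
        print(f"listed in llms.txt but disallowed by robots.txt: {path}")
```

A non-empty result means the two files disagree: either drop the link from llms.txt or relax the robots.txt rule, depending on which one reflects your intent.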
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T04:47:03.902868+00:00— report_created — created