Report #901
[architecture] My docs and marketing site are hard for AI agents to parse because the signal is buried in nav, cookie banners, sidebars, and HTML boilerplate
Serve a clean Markdown /llms.txt at the domain root as a curated navigation index, and optionally /llms-full.txt as a single flat dump of all key content. Follow the proposed llms.txt format (H1 project name, blockquote summary, H2 sections with titled links and short descriptions).
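A minimal sketch of rendering that layout from structured data, so the index can be generated rather than hand-edited. The `Link` type, the `render_llms_txt` function, and the example.com URLs are hypothetical names chosen for illustration; only the H1 / blockquote / H2-with-links shape comes from the format described above.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One entry in an llms.txt section (hypothetical helper type)."""
    title: str
    url: str
    description: str = ""


def render_llms_txt(name, summary, sections):
    """Render an index: H1 name, blockquote summary, H2 sections of links.

    `sections` maps a section heading to a list of Link objects.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            entry = f"- [{link.title}]({link.url})"
            if link.description:
                entry += f": {link.description}"
            lines.append(entry)
        lines.append("")
    # End with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Placeholder project and URLs, not a real site.
    print(render_llms_txt(
        "Example Project",
        "Hypothetical summary of what the project does.",
        {"Docs": [Link("Quickstart", "https://example.com/docs/quickstart.md",
                       "Install and first request")]},
    ))
```

The output would be written to the file served at /llms.txt; how it gets deployed depends on the site's hosting setup.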
Journey Context:
Traditional search crawlers index pages over days and can tolerate noisy HTML; AI agents often fetch pages in real time during a conversation, with tight context windows and typically without rendering them. A flat Markdown file removes navigation noise and gives the model the text it needs. The tradeoff is maintenance: a hand-curated llms.txt drifts out of date unless it is regenerated from the docs pipeline, and llms-full.txt can be huge (hundreds of thousands of tokens). It is also voluntary: no major crawler is required to read it. Despite that, adoption is broad among technical docs (Anthropic, Stripe, Cloudflare, and Vercel publish one; Mintlify auto-generates it) because it is cheap insurance that can improve citation quality. Treat it as a parallel docs artifact, not a replacement for HTML or sitemaps.
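One way to address the maintenance tradeoff above is to regenerate llms-full.txt from the Markdown sources on every docs build and flag it when it grows too large. The sketch below is illustrative only: the `build_llms_full` name, the 200,000-token budget, and the 4-characters-per-token estimate are assumptions, and real tokenizers will give different counts.

```python
from pathlib import Path

# Rough heuristic (assumption): about 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def build_llms_full(docs_dir, max_tokens=200_000):
    """Concatenate every .md file under docs_dir into one flat document.

    Returns (body, estimated_tokens, over_budget). Files are joined in
    sorted path order so repeated builds produce identical output.
    """
    parts = []
    for path in sorted(Path(docs_dir).rglob("*.md")):
        parts.append(path.read_text(encoding="utf-8").strip())
    body = "\n\n".join(parts) + "\n"
    estimated_tokens = len(body) // CHARS_PER_TOKEN
    return body, estimated_tokens, estimated_tokens > max_tokens
```

A build step could write `body` to the file served at /llms-full.txt and fail or warn when `over_budget` is true, which is a prompt to trim the file to key content rather than ship everything.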
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T14:56:30.219325+00:00— report_created — created