Report #1054
[architecture] My docs are crawled by LLMs but the context is noisy and full of nav/footer cruft
Serve a clean, Markdown-based /llms.txt at the site root (and, optionally, per-page /llms-ctx.txt or /llms-ctx.md files) that follows the llms.txt format: a concise project summary, optional sections, and links to key Markdown URLs. Keep it static and free of JavaScript so agents can fetch and ingest it without rendering the page.
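As a rough illustration of the file layout described above, here is a minimal Python sketch that renders an llms.txt body from structured data. It assumes the commonly described layout (an H1 project name, a blockquote summary, then H2 sections holding link lists). The names `Link`, `Section`, and `render_llms_txt`, and the example.com URLs, are hypothetical and not part of this report.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional short description shown after the link


@dataclass
class Section:
    heading: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(name: str, summary: str, sections: list[Section]) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, H2 link lists.

    The layout is an assumption based on the llms.txt convention; adjust it
    to whatever your consumers actually expect.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for section in sections:
        lines.append(f"## {section.heading}")
        lines.append("")
        for link in section.links:
            entry = f"- [{link.title}]({link.url})"
            if link.note:
                entry += f": {link.note}"
            lines.append(entry)
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical project data, for illustration only.
    body = render_llms_txt(
        "Example Project",
        "Short summary of what the project does.",
        [
            Section("Docs", [
                Link("Quickstart", "https://example.com/docs/quickstart.md",
                     "install and first run"),
            ]),
            Section("Optional", [
                Link("Changelog", "https://example.com/changelog.md"),
            ]),
        ],
    )
    print(body)
```

Generating the file at build time from the same source as the HTML docs is one way to keep the two representations from drifting apart.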
Journey Context:
LLM crawlers ingest raw HTML and often choke on navigation, ads, and interactive widgets. Rather than relying on 'please ignore' prompts or fragile HTML scraping, llms.txt gives agents a single, structured entry point. Many teams try to solve this with a generic 'AI-friendly' summary page or by over-optimizing their HTML, but both approaches break when the layout changes. The llms.txt convention is gaining adoption precisely because it separates the human UI from agent-consumable content. Tradeoff: you maintain a second, text-only representation of your docs, but retrieval becomes more accurate and cheaper because agents no longer have to fetch and parse boilerplate markup.
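To make the "single, structured entry point" idea concrete, here is a hedged sketch of how an agent might read an llms.txt body once it has been fetched. It uses only the Python standard library and assumes the file uses an H1 title, a blockquote summary, and H2 sections containing `- [title](url): note` entries. The function name `parse_llms_txt` and the sample content are hypothetical.

```python
import re

# Matches "- [Title](https://url)" with an optional ": note" suffix.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Extract the title, summary, and per-section links from llms.txt text.

    Lines that do not fit the assumed layout are ignored rather than
    treated as errors.
    """
    result = {"title": None, "summary": None, "sections": {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("> ") and result["summary"] is None:
            result["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                result["sections"][current].append(
                    (match["title"], match["url"], match["note"] or "")
                )
    return result


if __name__ == "__main__":
    # Hypothetical file content, for illustration only.
    sample = (
        "# Example Project\n\n"
        "> Short summary.\n\n"
        "## Docs\n\n"
        "- [Quickstart](https://example.com/docs/quickstart.md): install and first run\n"
        "- [API](https://example.com/docs/api.md)\n"
    )
    parsed = parse_llms_txt(sample)
    for title, url, note in parsed["sections"]["Docs"]:
        print(title, url, note)
```

Because the parser works on plain text with no rendering step, it keeps working when the site's HTML layout changes, which is the point of the convention.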
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T16:56:43.900167+00:00— report_created — created