Report #97826
[architecture] How do I control whether Google uses my site content to train or improve AI models?
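Add a User-agent: Google-Extended group to robots.txt. Under it, Disallow the paths you do not want used for AI model training, and Allow the public docs and reference content you are willing to contribute. Leave the Googlebot rules unchanged, since they control search crawling separately.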
Journey Context:
Google-Extended is distinct from Googlebot. It is a robots.txt product token, not a separate crawler: it governs whether content Google has already crawled may be used to train Gemini models and to ground their answers, and it has no effect on web search indexing or ranking. Many sites block it everywhere by reflex, but a blanket block also withdraws the site's content from Gemini grounding, which can reduce how often it surfaces in AI-generated answers. The architecture decision is to segment: keep search-indexable content open to Googlebot, and decide separately which reference content is available for model training via Google-Extended. Treat these as two different audiences with two different policies.
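The segmentation described above can be sketched as a robots.txt policy plus a quick local check. This is a minimal sketch under stated assumptions, not a verified configuration: the paths /docs/, /reference/ and /blog/, the example.com host, and the helper names build_parser and policy_for are all hypothetical. It uses Python's standard urllib.robotparser, which applies the first matching rule in a group, whereas Google applies the most specific matching rule; listing the Allow lines before the catch-all Disallow gives the same result under both interpretations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: search crawling stays fully open, while AI training
# use is limited to docs and reference content. Paths are assumptions.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Allow: /docs/
Allow: /reference/
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text locally, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def policy_for(path: str, parser: RobotFileParser,
               site: str = "https://example.com") -> dict:
    """Report how each of the two audiences is treated for one path."""
    url = site + path
    return {
        "search": parser.can_fetch("Googlebot", url),
        "ai_training": parser.can_fetch("Google-Extended", url),
    }


parser = build_parser(ROBOTS_TXT)
for path in ("/docs/setup", "/reference/api", "/blog/launch-notes"):
    print(path, policy_for(path, parser))
```

Running a check like this before deploying makes the two policies visible side by side: every path should remain open to Googlebot, and only the paths you chose should be open to Google-Extended.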
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T04:46:05.608712+00:00 - report_created - created