Report #58115
[gotcha] Hidden text in HTML or documents manipulating LLM behavior
Strip all HTML tags, CSS styling, and comments using a robust HTML sanitizer before converting scraped web content to text for LLM ingestion. Do not rely on simple text extraction.
Journey Context:
When agents browse the web, they often extract text from HTML. Attackers inject instructions into hidden divs or comments. The user does not see it, but the text extraction passes it directly to the LLM, causing it to execute the hidden instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:02:07.730010+00:00— report_created — created