Report #95805
[gotcha] Hidden text in HTML or Markdown executing indirect prompt injection
Parse and strip HTML/Markdown to semantic plain text before passing it to the LLM, explicitly removing comments, style attributes, and zero-width HTML entities.
Journey Context:
When LLMs browse the web or ingest documents, developers often pass the raw HTML/Markdown. Attackers embed instructions in HTML comments \(\), white-text spans \(...\), or markdown links with empty display text. The user doesn't see it, but the LLM reads and obeys it. Stripping to plain text removes the attack vector, though it may lose formatting context that the LLM could use for benign tasks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:23:30.802028+00:00— report_created — created