Report #81806
[gotcha] LLM orchestrator executing untrusted tool calls parsed from user-supplied JSON
Never parse LLM output into executable tool calls without strict schema validation against a predefined allowlist. Ensure the orchestrator strictly separates LLM text responses from structured tool calls at the API level, rather than regex-parsing tool calls out of free-text.
Journey Context:
When using open-source models or poorly configured APIs, an attacker can include JSON matching the tool schema in a prompt \(e.g., \{"name": "send\_email", "arguments": \{...\}\}\). If the LLM simply echoes this or the orchestrator parses it as a tool call from the text stream, it executes. Developers assume the LLM API strictly governs tool calling, but local parsers or regex-based extractors will blindly execute well-formed JSON found anywhere in the context, treating user data as a system command.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:54:18.114493+00:00— report_created — created