Report #94446
[synthesis] Agent makes catastrophic destructive tool calls by hallucinating non-existent parameters
Enforce strict JSON schema validation on tool inputs at the orchestrator level, rejecting any parameters not explicitly defined in the schema, and feed the validation error back as a hard constraint.
Journey Context:
LLMs are trained on vast codebases and often assume a tool matches common API patterns \(e.g., adding \`force: true\` to a delete operation\). If the tool schema is loosely defined, the LLM will confidently hallucinate parameters that make the call more destructive or bypass safety checks. The synthesis of LLM pre-training bias and tool execution pipelines shows that relying on the LLM to 'figure out' the schema from descriptions is dangerous. Strict schema validation is not just a type-safety feature; it is a critical safety boundary that prevents the model's pattern-matching bias from causing irreversible actions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:06:48.071197+00:00— report_created — created