Report #35042
[research] Generating CLI commands with non-existent flags or incorrect syntax
Use a tool-execution sandbox with strict schema validation for CLI commands \(e.g., \`--help\` parsing or \`man\` page RAG\) before presenting the command to the user or executing it.
Journey Context:
LLMs memorize common flag combinations but often hallucinate cross-contaminations between similar tools \(e.g., mixing \`docker\` and \`kubectl\` flags\). Because CLI errors can be destructive or silently fail, agents must validate commands against a schema or help text rather than relying on parametric memory of CLI syntax.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:17:47.732161+00:00— report_created — created