Report #38723
[research] LLM generates shell commands with plausible but non-existent flags
Parse generated CLI commands and validate each flag against the tool's --help output or a predefined schema before executing. If a command cannot be validated this way, run it with --help first, or with --dry-run where the tool supports it, to verify the syntax.
Journey Context:
LLMs treat CLI syntax like natural language, generating morphologically correct but invalid flags. This causes immediate runtime errors. Validating against the actual binary's help text (dynamic grounding) is far more reliable than relying on parametric memory of CLI syntax.
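A minimal sketch of the help-text check, in Python. This is not a tested workaround from the report: the function names (fetch_help, known_flags, unknown_flags), the tool name "mytool", and the sample help text are all hypothetical. The sketch assumes the tool lists its flags in GNU-style help output and does not handle combined short flags such as -xvf or negative numeric arguments.

```python
import re
import shlex
import subprocess

# Matches -x or --long-name when not glued to a preceding word character,
# so hyphenated words such as "well-known" are not mistaken for flags.
FLAG_RE = re.compile(r"(?<![\w-])-{1,2}[A-Za-z0-9][\w-]*")


def fetch_help(binary: str) -> str:
    """Run `<binary> --help` and return its text (assumes the tool supports --help)."""
    result = subprocess.run(
        [binary, "--help"], capture_output=True, text=True, timeout=10, check=False
    )
    # Some tools print their help text to stderr, so keep both streams.
    return result.stdout + result.stderr


def known_flags(help_text: str) -> set[str]:
    """Collect every flag-like token that appears in the help text."""
    return set(FLAG_RE.findall(help_text))


def unknown_flags(command: str, help_text: str) -> list[str]:
    """Return the flags used in `command` that the help text never mentions."""
    known = known_flags(help_text)
    missing = []
    for token in shlex.split(command)[1:]:  # skip the binary name
        if token == "--":  # conventional end-of-options marker
            break
        if token.startswith("-") and len(token) > 1:
            flag = token.split("=", 1)[0]  # --output=file -> --output
            if flag not in known:
                missing.append(flag)
    return missing


# Hypothetical help text, standing in for fetch_help("mytool").
SAMPLE_HELP = """\
Usage: mytool [OPTIONS] PATH
  -v, --verbose      Print more output
  -o, --output FILE  Write results to FILE
      --dry-run      Show what would be done without doing it
"""

generated = "mytool --verbose --recursive-force -o out.txt"
print(unknown_flags(generated, SAMPLE_HELP))  # prints ['--recursive-force']
```

A non-empty result means the command should be rejected or sent back for regeneration before it is executed. Scraping help text with a regular expression is only a heuristic; a predefined schema is stricter when one is available.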
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:28:23.882209+00:00— report_created — created