Report #43648
[research] Inventing non-existent command-line flags or arguments for terminal commands
Use a tool to execute man [command] or [command] --help in a sandbox, parse the flags that output actually lists, and restrict the LLM's generation to flags present in that output.
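A minimal sketch of the workaround above, in Python. It is not a tested implementation. The function names (fetch_help, extract_flags, unknown_flags) and the sample help text are hypothetical, and the regex is a heuristic that assumes GNU-style help output. fetch_help runs the command directly, so in practice it would need to be wrapped in whatever sandbox is available.

```python
import re
import shlex
import subprocess

# Heuristic: a flag is "-x" or "--long-name" that is not glued to a preceding
# word character or dash (so the hyphen in "well-known" is not matched).
FLAG_RE = re.compile(r"(?<![\w-])(--?[A-Za-z0-9][A-Za-z0-9-]*)")


def fetch_help(command: str, timeout: float = 5.0) -> str:
    """Run `<command> --help` and return its text.

    Assumption: the caller runs this inside a sandbox; nothing here isolates it.
    Some tools print help to stderr, so both streams are combined.
    """
    result = subprocess.run(
        [command, "--help"],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # many tools exit non-zero after printing help
    )
    return result.stdout + result.stderr


def extract_flags(help_text: str) -> set[str]:
    """Collect every flag-looking token that appears in the help text."""
    return set(FLAG_RE.findall(help_text))


def unknown_flags(command_line: str, allowed: set[str]) -> list[str]:
    """Return the flags in a proposed command line that the help text never lists."""
    bad = []
    for token in shlex.split(command_line)[1:]:
        if token == "--":  # conventional end-of-options marker
            break
        if not token.startswith("-") or token == "-":
            continue  # positional argument or stdin placeholder
        name = token.split("=", 1)[0]  # "--color=auto" -> "--color"
        if name in allowed:
            continue
        # Accept bundled short flags such as "-la" when each letter is listed.
        if not name.startswith("--") and all("-" + ch in allowed for ch in name[1:]):
            continue
        bad.append(token)
    return bad


# Hypothetical help excerpt, used here instead of a live call for repeatability.
SAMPLE_HELP = """Usage: ls [OPTION]... [FILE]...
  -a, --all                  do not ignore entries starting with .
  -l                         use a long listing format
      --color[=WHEN]         colorize the output
"""

allowed = extract_flags(SAMPLE_HELP)
print(unknown_flags("ls -la --color=auto", allowed))   # []
print(unknown_flags("ls -l --ignore-case", allowed))   # ['--ignore-case']
```

A command the model proposes would be rejected or regenerated whenever unknown_flags returns a non-empty list. Known gaps in this sketch: short flags with an attached value (for example -n5) are reported as unknown, and tools whose help text uses a different layout will need a different parser.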
Journey Context:
LLMs frequently hallucinate CLI flags (e.g., applying ls flags to grep) because they treat CLI syntax as natural-language patterns rather than strict grammars. Prompting alone does not reliably fix this, because the model's weights blend valid and invalid flag combinations across thousands of tools. Dynamically retrieving the tool's help output supplies the flags the installed version actually accepts, which is what the model needs to generate valid commands zero-shot.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:44:07.171956+00:00— report_created — created