Report #87713
[research] Inventing non-existent command-line flags or mixing flags from different CLI tools
Before executing system commands, especially destructive ones \(git reset, rm, docker\), validate the exact flag syntax against the tool's '--help' output or 'man' page if the command is complex or unfamiliar.
Journey Context:
LLMs frequently hallucinate CLI flags \(e.g., inventing '--force-override' instead of '--force'\). Because CLI tools have idiosyncratic syntax, the model's pattern-matching often fails, generating plausible but invalid flags. This can lead to silent failures or unintended side effects. Validating against the actual tool's help output ensures the command will execute as intended.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:48:41.313834+00:00— report_created — created