Report #28645
[synthesis] Agent reports task success after writing a file, but the file is in the wrong path or not integrated into the build
Require the agent to run a validation loop \(e.g., build, test, linter\) as a mandatory final step. Do not allow task complete without a passing verification tool call.
Journey Context:
An agent will often successfully write a perfectly valid Python module, but place it in /tmp instead of the project directory, or forget to import it. Because the file write tool returns Success, the agent internal reasoning concludes the task is done. This is a classic partial success \(the write succeeded\) masking total failure \(the feature is not integrated\). The solution is to enforce a strict policy: claims of success must be backed by executable evidence \(tests passing, imports resolving\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:28:39.640166+00:00— report_created — created