Report #87324
[synthesis] Agent fixes some lint errors but misses others, and the tool output makes it think the task is complete
Require the agent to parse the linter's final summary line (e.g., 'X errors, Y warnings') and explicitly verify that X = 0 before terminating, rather than only checking whether the specific error it targeted has disappeared.
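A minimal sketch of that termination check, under stated assumptions. The summary formats are assumed to be those of ESLint's default "stylish" formatter (`✖ 11 problems (10 errors, 1 warning)`) and Ruff's default output (`Found 3 errors.` or `All checks passed!`). Exact wording can differ between versions, formatters, and flags. The function names `parse_error_count` and `lint_is_clean` are hypothetical. Because ESLint's stylish formatter prints nothing when there are no problems, the sketch also takes the linter's exit code and accepts a silent run only if it exited with 0. Any output without a recognizable summary is treated as "not clean" (fail closed), so the agent keeps working instead of declaring success.

```python
import re
from typing import Optional

# Assumed summary-line formats (may vary by linter version, formatter, or flags):
#   ESLint "stylish": "✖ 11 problems (10 errors, 1 warning)"
#   Ruff:             "Found 3 errors."  or  "All checks passed!"
ESLINT_SUMMARY = re.compile(r"(\d+) problems? \((\d+) errors?, (\d+) warnings?\)")
RUFF_SUMMARY = re.compile(r"Found (\d+) errors?\.")
RUFF_CLEAN = "All checks passed!"


def parse_error_count(output: str) -> Optional[int]:
    """Return the error count from the linter's summary line, or None if
    no recognizable summary line is present (hypothetical helper)."""
    # Scan bottom-up: the summary is expected near the end of the output,
    # possibly followed by a "fixable with --fix" hint line.
    for line in reversed(output.strip().splitlines()):
        line = line.strip()
        if line == RUFF_CLEAN:
            return 0
        m = ESLINT_SUMMARY.search(line)
        if m:
            return int(m.group(2))  # errors only; warnings do not block
        m = RUFF_SUMMARY.search(line)
        if m:
            return int(m.group(1))
    return None


def lint_is_clean(output: str, exit_code: int) -> bool:
    """Allow termination only when the aggregate error count is zero."""
    count = parse_error_count(output)
    if count is None:
        # No summary found: accept only a silent, successful run
        # (ESLint's stylish formatter prints nothing when there are no problems).
        return exit_code == 0 and output.strip() == ""
    return count == 0 and exit_code == 0
```

The agent would call `lint_is_clean` on the full output of its final re-run, not on the single diagnostic it was fixing. Output variants the regexes do not recognize fall through to "not clean", which errs toward doing more work rather than stopping early.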
Journey Context:
Linters like ESLint or Ruff list errors one per line. An agent tasked with 'fix the lint errors' will fix the first one, re-run the linter, see that that specific error is gone, and terminate. However, its fix may have introduced a new lint error further down the file, or it may simply have ignored the other 10 pre-existing errors. The agent perceives the transition from 'error present' to 'error absent' as total success: a classic case of partial success masking total failure. The fix requires the agent to evaluate the aggregate health metric (total error count) rather than the binary presence of the specific symptom it was addressing.
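To make the failure mode concrete, the sketch below contrasts the two stopping rules on illustrative data. Diagnostics are modeled as hypothetical `(rule_id, line)` pairs, and the before/after lists are made up for the example (the rule names are real ESLint rules, but no real run is shown). `targeted_error_gone` is the flawed rule the agent uses; `aggregate_check` is the rule the report recommends, with an added "introduced" list as a reporting aid. Line numbers shift when code is edited, so that list is only a heuristic; the termination decision rests solely on the total count.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical diagnostic representation: (rule_id, line_number).
Diagnostic = Tuple[str, int]


def targeted_error_gone(after: List[Diagnostic], target: Diagnostic) -> bool:
    """Flawed stopping rule: only asks whether the one targeted error vanished."""
    return target not in after


def aggregate_check(before: List[Diagnostic], after: List[Diagnostic]) -> dict:
    """Recommended stopping rule: decide on the total error count."""
    # Multiset difference: diagnostics present after the fix but not before.
    # Heuristic only, since edits can shift line numbers.
    introduced = Counter(after) - Counter(before)
    return {
        "total_before": len(before),
        "total_after": len(after),
        "introduced": sorted(introduced.elements()),
        "done": len(after) == 0,
    }


if __name__ == "__main__":
    before = [("no-unused-vars", 3), ("eqeqeq", 10), ("no-undef", 12)]
    # The agent fixed line 3 but its edit triggered a new error on line 20.
    after = [("eqeqeq", 10), ("no-undef", 12), ("prefer-const", 20)]
    print(targeted_error_gone(after, ("no-unused-vars", 3)))  # True: symptom gone
    print(aggregate_check(before, after)["done"])  # False: 3 errors remain
```

The targeted check reports success even though the total count never dropped; only the aggregate check correctly keeps the agent working.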
Lifecycle
2026-06-22T05:09:54.117575+00:00— report_created — created