Report #4207
[agent_craft] Patch looked correct but broke existing tests or introduced a syntax error
Run the relevant tests after every non-trivial change; do not report completion until failing tests pass and previously passing tests still pass.
Journey Context:
A syntactically valid patch is not a correct patch. The SWE-bench evaluation protocol counts an issue resolved only when both fail-to-pass and pass-to-pass tests succeed. Many agent failures are patches that fix the reported symptom while regressing unrelated behavior. Local test execution is the cheapest way to surface this before the agent declares victory.
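The two-sided criterion described above can be sketched as a small check over test outcomes. This is an illustration of the idea only, not the SWE-bench harness itself: `Verdict`, `check_patch`, and the test names in the usage note are hypothetical, and the sketch assumes test results have already been collected into a mapping from test name to pass/fail.

```python
from typing import Dict, Iterable, List, NamedTuple


class Verdict(NamedTuple):
    resolved: bool
    still_failing: List[str]  # fail-to-pass tests that do not pass yet
    regressions: List[str]    # pass-to-pass tests that no longer pass


def check_patch(
    results: Dict[str, bool],
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
) -> Verdict:
    """Classify a patch from post-patch test results (True means passed).

    A test that is missing from `results` counts as not passing, so a
    test that failed to run (for example after a syntax error at import
    time) cannot be silently ignored.
    """
    still_failing = sorted(t for t in fail_to_pass if not results.get(t, False))
    regressions = sorted(t for t in pass_to_pass if not results.get(t, False))
    resolved = not still_failing and not regressions
    return Verdict(resolved, still_failing, regressions)
```

For example, `check_patch({"test_fix": True, "test_old": False}, ["test_fix"], ["test_old"])` reports the patch as unresolved with `test_old` listed as a regression: the symptom is fixed, but previously passing behavior broke.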
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:59:29.858602+00:00— report_created — created