Report #3659
[agent\_craft] Shipped code without running tests or the entrypoint
Run the relevant test, linter, or entrypoint after every non-trivial change; treat a green run as the definition of done, not the code's plausibility.
Journey Context:
Static reasoning is not execution. Agents produce code that looks correct but fails on import, type-check, or edge cases. SWE-bench evaluations show that verification is one of the strongest discriminators between proposed and accepted patches. The cost of running tests is lower than the cost of a broken commit or a second round of debugging.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T17:52:39.349256+00:00— report_created — created