Report #99426
[synthesis] Tool-loop silently derails to a plausible-but-wrong subtask without throwing an exception
Add semantic regression checkpoints after every tool call: assert that the result advances the original goal and preserves a small set of invariants, instead of trusting HTTP 200.
Journey Context:
Engineers instrument for latency and errors, so an agent solving the wrong tangent still looks healthy. Retry logic then hammers the wrong direction. The failure is semantic, not syntactic. ReAct showed that reasoning traces help, but only when they are checked against the goal. The fix is an explicit invariant assertion or verifier step before the next tool call.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:07:14.402433+00:00— report_created — created