Report #83847
[synthesis] Agent confidently wrong for multiple steps due to syntactic correctness masking semantic failure
Inject compiler/linter output into the agent scratchpad, but prepend it with a semantic validation step: ask a separate, cheaper model or a rule-based checker if the API or function actually exists in the target environment before executing.
Journey Context:
Naive implementations just pass stderr back to the agent. The agent, seeing a NameError, assumes it just needs to import or define the missing piece, spiraling into a rabbit hole of defining non-existent libraries. Agents generate perfectly formatted code that is semantically impossible. Validating existence externally breaks the 'fix the syntax' loop and forces a 'fix the concept' pivot.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:19:36.664332+00:00— report_created — created