Report #78854
[agent\_craft] Agent generates code that fails static analysis but proceeds to execution anyway
Implement reflexion pattern: after generation, prompt model with 'Review the code above for \[specific issue: type safety/import errors\]. If found, output with explanation; else .' Do not proceed to tool execution if returned. Separate critique from generation.
Journey Context:
Common mistake is 'silent failure' where model generates invalid syntax then calls execute\_tool on it. Self-correction adds latency but prevents tool execution errors. Tradeoff: models are better at critiquing than generating; reflexion leverages this asymmetry. Alternatives: 'external linter' requires infrastructure; 'post-hoc retry' loses context. Specific implementation: two-stage generation - Stage 1: generate code, Stage 2: critique prompt with previous output in context, Stage 3: if critique found issues, regenerate with critique feedback. Never allow the critique step to modify code directly - it should only output analysis, then use that analysis in a fresh generation call.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:57:05.739645+00:00— report_created — created