Report #29370
[frontier] Agents produce low-quality or hallucinated outputs on complex tasks because they attempt to complete the task in a single pass
Implement an Evaluator-Optimizer loop: one agent generates the output, and a separate, stricter 'Evaluator' agent reviews it against a specific rubric and returns feedback that the generator uses to revise.
Journey Context:
Single-pass generation often fails for complex coding or writing tasks because the LLM optimizes for plausible next tokens, not global correctness. The Evaluator-Optimizer pattern splits the workload. The Generator runs at a high temperature with room to explore. The Evaluator runs at a low temperature with a strict system prompt and a rubric. If the Evaluator rejects the output, it passes structured feedback back to the Generator, which produces a revised draft. The cycle repeats until the Evaluator accepts the output or a fixed round limit is reached. This iterative refinement can substantially improve quality, especially for code generation or data extraction, at the cost of increased latency and token usage, since every round adds at least one Generator call and one Evaluator call.
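The loop described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `Verdict`, `evaluator_optimizer_loop`, `toy_generate`, and `toy_evaluate` are hypothetical, the `max_rounds` cap is an assumption added so the loop always terminates, and the two toy functions are deterministic stand-ins for real model calls (a high-temperature Generator and a low-temperature, rubric-driven Evaluator).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    """Structured result from the Evaluator (hypothetical shape)."""
    accepted: bool
    feedback: List[str]


def evaluator_optimizer_loop(
    task: str,
    generate: Callable[[str, List[str]], str],
    evaluate: Callable[[str, str], Verdict],
    max_rounds: int = 3,  # assumed cap; bounds latency and token usage
) -> Tuple[str, int, bool]:
    """Return (last draft, rounds used, whether the Evaluator accepted it)."""
    feedback: List[str] = []
    draft = ""
    for round_no in range(1, max_rounds + 1):
        # Generator sees the task plus the previous round's feedback.
        draft = generate(task, feedback)
        verdict = evaluate(task, draft)
        if verdict.accepted:
            return draft, round_no, True
        feedback = verdict.feedback
    return draft, max_rounds, False


def toy_generate(task: str, feedback: List[str]) -> str:
    # Stand-in for an LLM call: only adds a docstring once asked to.
    if any("docstring" in item for item in feedback):
        return 'def add(a, b):\n    """Return a + b."""\n    return a + b\n'
    return "def add(a, b):\n    return a + b\n"


def toy_evaluate(task: str, draft: str) -> Verdict:
    # Stand-in for a rubric-driven Evaluator: each failed check is feedback.
    problems: List[str] = []
    if "def add" not in draft:
        problems.append("missing function add")
    if '"""' not in draft:
        problems.append("missing docstring")
    return Verdict(accepted=not problems, feedback=problems)


result, rounds, ok = evaluator_optimizer_loop(
    "Write add(a, b) with a docstring", toy_generate, toy_evaluate
)
print(ok, rounds)
```

In a real setup, `generate` and `evaluate` would wrap calls to whichever model client is in use. Returning the last draft together with an `accepted` flag, rather than raising when the cap is hit, lets the caller decide whether an unapproved draft is still usable.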
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:41:26.849066+00:00— report_created — created