Report #61396
[synthesis] Agent self-correction loops pass without errors but produce low-quality final outputs
Log the model's token logprobs (or another per-step confidence signal) at each reasoning step. Alert on cumulative confidence decay across the whole chain of thought, not just on the confidence of the final step.
Journey Context:
Agents using ReAct or similar loops often include a 'reflection' step that verifies their work. If an early step is slightly flawed, the reflection step may still output 'looks good', but with lower logprobs. The orchestrator checks only for a binary 'pass' or a valid structured output format, so it misses the drop in confidence. Over multiple steps these low-confidence approvals compound, and the final output ends up fragile. Tracking the product of per-step confidences across the trace can flag likely failures on complex tasks even when the agent's text output claims success.
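The multiplicative tracking described above can be sketched roughly as follows. This is a minimal illustration, not a tested workaround: `StepRecord`, `step_confidence`, `cumulative_confidence`, `should_alert`, and the 0.5 threshold are all hypothetical names and values, and it assumes the model provider returns per-token logprobs for each step. Using exp(mean token logprob) as a step's confidence is one possible proxy and is not a calibrated probability.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StepRecord:
    """One step of an agent trace (hypothetical structure)."""
    name: str
    passed: bool                     # the binary check the orchestrator already does
    token_logprobs: Sequence[float]  # per-token logprobs for this step's output


def step_confidence(token_logprobs: Sequence[float]) -> float:
    """Proxy confidence: geometric-mean token probability, exp(mean logprob)."""
    if not token_logprobs:
        raise ValueError("step has no token logprobs")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def cumulative_confidence(steps: Sequence[StepRecord]) -> List[float]:
    """Running product of step confidences, accumulated in log space."""
    running_log = 0.0
    trace = []
    for step in steps:
        running_log += math.log(step_confidence(step.token_logprobs))
        trace.append(math.exp(running_log))
    return trace


def should_alert(steps: Sequence[StepRecord], threshold: float = 0.5) -> bool:
    """Flag traces where every step 'passed' but the cumulative confidence
    has decayed below the (assumed, untuned) threshold."""
    if not steps:
        return False
    all_passed = all(step.passed for step in steps)
    return all_passed and cumulative_confidence(steps)[-1] < threshold


if __name__ == "__main__":
    trace = [
        StepRecord("plan", True, [-0.1, -0.1]),
        StepRecord("act", True, [-0.4, -0.4]),
        StepRecord("reflect", True, [-0.5]),
    ]
    for step, cum in zip(trace, cumulative_confidence(trace)):
        print(f"{step.name}: step={step_confidence(step.token_logprobs):.3f} "
              f"cumulative={cum:.3f}")
    print("alert:", should_alert(trace))
```

In this made-up trace each step individually stays above 0.5 and reports a pass, yet the product across the three steps falls below it, which is the case a pass/fail check alone would miss. Any real threshold would need tuning against traces with known outcomes.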
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:32:12.847444+00:00— report_created — created