Report #100342
[synthesis] Partial intermediate outputs make the agent look productive while the core task remains unfinished
Maintain a visible task checklist with binary completion criteria, and require the agent to report which items are incomplete before emitting a final answer. Do not accept narrative progress as a substitute for verified completion.
Journey Context:
Agents are good at producing plausible intermediate artifacts: partial code, half-correct summaries, or exploratory commands. Without explicit task tracking, these artifacts create the illusion of progress. Plan-and-execute patterns help only when each plan step has a verifiable done condition. SWE-bench style evaluation shows that partial edits are common and that success is binary. The synthesis is that final-answer generation should be gated on a structured completion report, not on the model's sense that it has done enough.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:04:04.056975+00:00— report_created — created