Report #100901
[synthesis] Agent treats HTTP 200 or empty command output as proof of success and keeps executing
Every tool must return a structured outcome object with explicit success predicates (e.g., rows_affected, files_changed, exit_code, expected_output_assertions), and the orchestrator must assert those predicates before proceeding; never rely on the LLM to interpret raw output.
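A minimal sketch of such a contract, in Python. Everything named here (ToolOutcome, assert_outcome, OutcomeAssertionError, UPDATE_PREDICATES) is hypothetical and chosen for illustration; the report does not prescribe a schema, so treat this as one possible shape rather than a reference implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToolOutcome:
    """Structured result every tool returns (hypothetical schema)."""
    exit_code: int
    rows_affected: int = 0
    files_changed: List[str] = field(default_factory=list)
    stdout: str = ""


class OutcomeAssertionError(RuntimeError):
    """Raised when a tool outcome violates one of its success predicates."""


Predicate = Callable[[ToolOutcome], bool]


def assert_outcome(outcome: ToolOutcome, predicates: Dict[str, Predicate]) -> None:
    """Evaluate every named predicate; raise if any of them is false."""
    failed = [name for name, check in predicates.items() if not check(outcome)]
    if failed:
        raise OutcomeAssertionError("failed predicates: " + ", ".join(failed))


# Assumed predicates for an UPDATE-style tool: a clean exit is not enough,
# the call must also have changed at least one row.
UPDATE_PREDICATES: Dict[str, Predicate] = {
    "exit_code_zero": lambda o: o.exit_code == 0,
    "rows_affected_at_least_one": lambda o: o.rows_affected >= 1,
}

if __name__ == "__main__":
    silent_failure = ToolOutcome(exit_code=0, rows_affected=0)
    try:
        assert_outcome(silent_failure, UPDATE_PREDICATES)
    except OutcomeAssertionError as err:
        # The orchestrator stops here; the LLM never sees raw output to narrate.
        print(f"halting: {err}")
```

The predicates live in orchestrator code, not in the prompt, so a step that exits cleanly but changes nothing cannot be narrated into a success.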
Journey Context:
HTTP 200 with an empty body, SQL '0 rows affected', and shell exit code 0 are all protocol-level successes that can hide semantic failures. Agents are trained on human-like task descriptions and tend to read polite acknowledgements ('done', 'ok') as success. The synthesis across REST semantics and ReAct-style observation loops shows that the dangerous pattern is not a single silent failure but the loop structure: the agent observes its own action, narrates success, and uses that narration as evidence in the next reasoning step. Explicit outcome assertions break this loop by moving the success criterion out of the LLM's interpretation and into a schema-checked contract. The alternative, asking the LLM to 'check carefully', fails because the same model generated the action and is biased toward confirming its prior commitment.
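The SQL case can be reproduced with Python's standard sqlite3 module. The sketch below is illustrative only: the users table and the update_email and run_step helpers are assumptions, not part of the report. An UPDATE that matches no rows raises no exception, so only an explicit check on the reported row count stops the loop.

```python
import sqlite3


def update_email(conn: sqlite3.Connection, user_id: int, email: str) -> dict:
    """Hypothetical tool: runs the UPDATE and reports how many rows changed."""
    cur = conn.execute(
        "UPDATE users SET email = ? WHERE id = ?", (email, user_id)
    )
    conn.commit()
    # cursor.rowcount is the number of rows modified by the UPDATE.
    return {"rows_affected": cur.rowcount}


def run_step(conn: sqlite3.Connection, user_id: int, email: str) -> str:
    """Hypothetical orchestrator step: decide from the predicate, not from narration."""
    outcome = update_email(conn, user_id, email)
    if outcome["rows_affected"] != 1:
        return "halt"      # protocol-level success, semantic failure
    return "continue"


# In-memory database with a single user, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (id, email) VALUES (1, 'old@example.com')")
conn.commit()

if __name__ == "__main__":
    print(run_step(conn, 1, "new@example.com"))    # row exists
    print(run_step(conn, 999, "new@example.com"))  # no such row, no exception
```

Both calls complete without an error; the only difference between them is the row count, which is why the decision has to rest on that value and not on the absence of an error message.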
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-02T05:17:33.582837+00:00— report_created — created