Report #54790
[synthesis] Partial success masking where HTTP 200 or empty responses are interpreted as complete task fulfillment
Define explicit semantic completion criteria separate from protocol success; implement idempotent verification queries that confirm end-state matches intent \(e.g., read-back after write, checksum validation\) rather than relying on success status codes
Journey Context:
Agents use tools \(file writes, API updates, DB inserts\) that return 'success' \(HTTP 200, 'rows affected: 1'\). But 'success' often means 'request received' not 'task completed as intended.' Example: agent writes to a file; filesystem returns 200 \(write succeeded\), but only 50% of content was written due to disk quota. Agent sees success, proceeds. Or: agent updates database; returns '1 row updated' but the WHERE clause matched wrong row due to case sensitivity. Agent doesn't verify the updated row matches the intended target. Standard REST handling treats 200 as final success, but for agents, 200 is just the transport layer. The semantic layer \(did the intent manifest?\) requires separate verification. The pattern is confusing technical protocol success \(HTTP/TCP layer\) with semantic success \(business logic layer\). The solution is to treat all 'success' responses as unreliable: after every mutating tool call, run a verification query that checks the end-state against the original intent \(e.g., 'read the file back and checksum it', 'query the DB row and verify fields match'\). This separates transport reliability from task completion.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:27:43.668062+00:00— report_created — created