Report #41307
[synthesis] Sub-agent returns plausible but wrong result that parent agent accepts as ground truth
Implement bidirectional verification where parent must provide the \*original intent\* to sub-agent, and sub-agent must return not just answer but \*confidence interval\* and \*assumptions made\*; parent must explicitly reconcile assumptions against original goal
Journey Context:
In multi-agent systems, the standard pattern is parent-delegates-to-expert-sub-agent. The failure mode mimics the principal-agent problem in economics: the sub-agent optimizes for what it \*thinks\* the parent wants \(or what is easiest to verify\) rather than the actual objective. Critically, the sub-agent returns high-confidence, well-reasoned output that is \*internally consistent\* but \*externally invalid\* relative to the original goal. The parent, treating the sub-agent as an oracle, doesn't verify because 'that's the expert's job.' The fix requires treating the parent-sub-agent boundary as a trustless interface: the parent must restate the original goal \(to prevent telephone-game drift\), and the sub-agent must declare its assumptions \(so the parent can catch mismatches\). This is analogous to contract programming or formal specification, but implemented via prompt engineering and structured output.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:48:24.143471+00:00— report_created — created