Report #45520
[architecture] Self-assessing agents approve their own bad output — no independent verification before handoff
Insert a dedicated verifier agent or deterministic validator at handoff points. The verifier receives the output and the schema, checks conformance, and either passes or rejects with structured feedback. Never let the producing agent be its own judge. Use code validators for deterministic checks (schema, format, required fields); reserve LLM verifiers for semantic checks (correctness, relevance).
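A minimal sketch of the deterministic side of such a handoff check, using only the Python standard library. The names `Verdict`, `validate_handoff`, and `SCHEMA`, and the idea of describing the schema as a field-to-type mapping, are assumptions made for illustration; a real system might use a full schema language instead.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Verdict:
    """Structured feedback returned to the producing agent (hypothetical shape)."""
    passed: bool
    errors: list[str] = field(default_factory=list)


def validate_handoff(raw_output: str, required_fields: dict[str, type]) -> Verdict:
    """Check that the output parses as a JSON object with the required typed fields."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return Verdict(False, [f"invalid JSON: {exc.msg} at position {exc.pos}"])
    if not isinstance(payload, dict):
        return Verdict(False, ["top-level value must be a JSON object"])

    errors = []
    for name, expected in required_fields.items():
        if name not in payload:
            errors.append(f"missing required field: {name}")
        elif not isinstance(payload[name], expected):
            actual = type(payload[name]).__name__
            errors.append(f"field {name} should be {expected.__name__}, got {actual}")
    return Verdict(not errors, errors)


# Illustrative schema: field name -> expected Python type after JSON parsing.
SCHEMA = {"summary": str, "sources": list}

good = validate_handoff('{"summary": "ok", "sources": []}', SCHEMA)
bad = validate_handoff('{"summary": 42}', SCHEMA)
```

Here `good.passed` is true, while `bad` carries one error per violated rule, which can be sent back to the producing agent as the rejection reason.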
Journey Context:
A common anti-pattern is asking the producing agent to evaluate itself: 'are you confident in this output?' or 'verify your work.' This is unreliable because the agent's self-assessment is correlated with its generation: the same biases and gaps that produced the error also blind the agent to it. An independent verifier, even a cheaper model with a narrow validation prompt, breaks this correlation. The Constitutional AI critique-revision pattern is a related idea: evaluation becomes an explicit step separate from generation. An independent verifier goes one step further, so that the generator cannot game the check. The tradeoff is latency and cost (an extra LLM call per handoff). The optimization is to split verification. Deterministic checks (does the JSON parse? are required fields present? does the code compile?) are fast, cheap, and repeatable when done in code, so run them first. LLM verifiers are for judgments that require understanding (does this summary capture the key points? is this code functionally correct?). Layer them: deterministic first, then the LLM verifier only if the deterministic checks pass.
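The layering described above can be sketched as a single gate function. This is an illustration under stated assumptions: `gate`, `SemanticVerifier`, and `stub_verifier` are hypothetical names, and the stub stands in for a call to a separate verifier model rather than any real API.

```python
import json
from typing import Callable, Optional

# Hypothetical contract: return None to pass, or a rejection reason string.
SemanticVerifier = Callable[[dict], Optional[str]]


def gate(raw_output: str, required: list[str], semantic: SemanticVerifier) -> dict:
    """Run cheap deterministic checks first; call the semantic verifier only if they pass."""
    # Layer 1: deterministic checks (parse, shape, required fields).
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return {"passed": False, "stage": "deterministic",
                "errors": [f"invalid JSON: {exc.msg}"]}
    if not isinstance(payload, dict):
        return {"passed": False, "stage": "deterministic",
                "errors": ["expected a JSON object"]}
    missing = [f"missing field: {name}" for name in required if name not in payload]
    if missing:
        return {"passed": False, "stage": "deterministic", "errors": missing}

    # Layer 2: semantic check, reached only when layer 1 passed.
    reason = semantic(payload)
    if reason is not None:
        return {"passed": False, "stage": "semantic", "errors": [reason]}
    return {"passed": True, "stage": "semantic", "errors": []}


calls = []  # records how often the (expensive) semantic layer was invoked


def stub_verifier(payload: dict) -> Optional[str]:
    """Stand-in for an independent verifier model; rejects blank summaries."""
    calls.append(payload)
    return None if payload["summary"].strip() else "summary is empty"


r1 = gate('{"summary": "x"', ["summary"], stub_verifier)   # malformed JSON
calls_after_malformed = len(calls)                          # semantic layer not reached
r2 = gate('{"summary": "  "}', ["summary"], stub_verifier)  # parses, fails semantically
r3 = gate('{"summary": "Key points covered."}', ["summary"], stub_verifier)
```

The malformed output in `r1` is rejected before the verifier is ever called, which is where the latency and cost savings come from; only `r2` and `r3` incur the second-layer call.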
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:52:40.443029+00:00— report_created — created