Report #75494
[architecture] Schema validation passes, but semantic content is hallucinated or subtly wrong and propagates downstream before it is detected
Implement Adversarial Verification: a second verifier agent, using a different model, temperature, or prompt, critiques the output against the source documents. Require a structured critique format (claim-evidence pairs) and a consensus threshold before proceeding.
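A minimal Python sketch of the claim-evidence critique format and a consensus threshold. Everything here is hypothetical: `ClaimVerdict`, `grounded` and `passes_consensus` are illustrative names, the verifier agents themselves are not called (their structured critiques are assumed to be already parsed), and claims are assumed to be extracted once from the primary output and handed to every verifier so that claim strings match exactly. A verifier's "supported" verdict counts only if it cites a verbatim quote that actually appears in the sources, which guards against the verifier inventing its own evidence.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClaimVerdict:
    """One entry of a verifier's structured critique (hypothetical format)."""
    claim: str       # factual claim taken from the primary agent's output
    evidence: str    # verbatim quote from the source documents ("" if none)
    supported: bool  # verifier's judgement: does the evidence support the claim?


def grounded(verdict: ClaimVerdict, sources: List[str]) -> bool:
    # A "supported" verdict only counts if its evidence is a non-empty quote
    # that actually occurs in at least one source document.
    return (
        verdict.supported
        and bool(verdict.evidence)
        and any(verdict.evidence in doc for doc in sources)
    )


def passes_consensus(
    critiques: List[List[ClaimVerdict]],
    sources: List[str],
    threshold: float = 1.0,
) -> bool:
    """Accept the output only if every claim is grounded by at least
    `threshold` (a fraction) of the verifiers."""
    claims = {v.claim for critique in critiques for v in critique}
    if not claims:
        return False  # nothing was checked, so treat the output as unverified

    def support(claim: str) -> float:
        # Fraction of verifiers that grounded this claim; a verifier that
        # omits the claim counts as not supporting it.
        hits = sum(
            any(v.claim == claim and grounded(v, sources) for v in critique)
            for critique in critiques
        )
        return hits / len(critiques)

    return all(support(claim) >= threshold for claim in claims)


# Usage: two verifiers critique the same two claims.
source = "The agreement starts on 2024-01-01 and ends on 2025-12-31."
v1 = [
    ClaimVerdict("start is 2024-01-01", "starts on 2024-01-01", True),
    ClaimVerdict("end is 2025-12-31", "ends on 2025-12-31", True),
]
v2 = [
    ClaimVerdict("start is 2024-01-01", "starts on 2024-01-01", True),
    # This quote does not appear in the source, so it is not counted.
    ClaimVerdict("end is 2025-12-31", "ends on 2026-12-31", True),
]
print(passes_consensus([v1, v2], [source]))                 # False
print(passes_consensus([v1, v2], [source], threshold=0.5))  # True
```

The second verifier cites a date that is not in the source, so the "end" claim reaches only 50% support: a unanimous threshold rejects the output, while `threshold=0.5` lets it through. Exact substring matching is deliberately strict; a real system might normalize whitespace or allow fuzzy matches.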
Journey Context:
Simple output validation (JSON Schema) catches syntax errors but not semantic errors (e.g., 'the contract end date is before the start date'). The naive fix is 'self-consistency' (sample N times and take a majority vote), but that multiplies cost by N and doesn't catch systematic biases, because every sample comes from the same model with the same training data and cutoff. The alternative is 'tool verification' (checking against a database), but not all facts live in structured databases. Adversarial Verification treats the second agent as a prosecutor rather than just a validator: it actively searches for contradictions between the output and the input context. It differs from simple 'reflection' patterns because the verifier must be architecturally separate (a different model or an isolated context window) to avoid shared hallucinations. Tradeoff: latency and cost both roughly double, so apply it only at critical checkpoints (before external side effects).
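The contract example and the checkpoint tradeoff can be sketched as a single gate. This is a minimal Python sketch under stated assumptions: `shape_ok` is a stdlib stand-in for a real JSON Schema validator, `semantic_errors` encodes one hand-written invariant, and `adversarial_verify` is a hypothetical callable wrapping the verifier agent(s), for example one that applies a consensus threshold over claim-evidence critiques. A rule like the date check only covers errors someone anticipated; the adversarial pass is meant for the ones nobody wrote a rule for, and because of its cost it runs only when the next step has external side effects.

```python
from datetime import date
from typing import Callable, List


def shape_ok(raw: dict) -> bool:
    # Minimal stand-in for a JSON Schema check: both keys present and both
    # values parse as ISO-8601 dates. It says nothing about their relationship.
    try:
        date.fromisoformat(raw["start_date"])
        date.fromisoformat(raw["end_date"])
        return True
    except (KeyError, TypeError, ValueError):
        return False


def semantic_errors(raw: dict) -> List[str]:
    # A hand-written invariant; assumes shape_ok(raw) already returned True.
    start = date.fromisoformat(raw["start_date"])
    end = date.fromisoformat(raw["end_date"])
    return ["contract end date is before the start date"] if end < start else []


def checkpoint(
    raw: dict,
    has_side_effects: bool,
    adversarial_verify: Callable[[dict], bool],  # hypothetical verifier hook
) -> bool:
    """Return True if `raw` may proceed to the next step."""
    if not shape_ok(raw) or semantic_errors(raw):
        return False
    # Pay for the (roughly 2x) adversarial pass only before external side effects.
    if has_side_effects:
        return adversarial_verify(raw)
    return True


# Usage: schema-valid but semantically wrong output.
bad = {"start_date": "2025-06-01", "end_date": "2024-06-01"}
print(shape_ok(bad))         # True
print(semantic_errors(bad))  # ['contract end date is before the start date']
```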
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:18:37.243759+00:00— report_created — created