Report #51586
[architecture] Using an LLM to verify another LLM's output results in both failing the same way
When using an LLM as a judge for output verification, pick a judge from a different model family (e.g., Claude verifying GPT-4), or at least a smaller or differently trained model, to reduce error correlation. Alternatively, use deterministic programmatic checks (regex, unit tests, AST parsing) for verifiable facts, and reserve LLM judges for semantic and stylistic judgments that cannot be checked mechanically.
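The judge-selection rule can be sketched as a small helper that refuses to pair a generator with a judge from the same family. This is a minimal illustration, not an established API: `family_of`, `pick_judge`, and the prefix-to-family table are hypothetical names and assumptions made for this example, and a real deployment would keep an explicit model registry rather than guessing from name prefixes.

```python
# Hypothetical prefix table (assumption for illustration only).
FAMILY_PREFIXES = {
    "gpt": "openai",
    "claude": "anthropic",
    "gemini": "google",
    "llama": "meta",
}


def family_of(model_name: str) -> str:
    """Guess a model's family from its name prefix; 'unknown' if no match."""
    lowered = model_name.lower()
    for prefix, family in FAMILY_PREFIXES.items():
        if lowered.startswith(prefix):
            return family
    return "unknown"


def pick_judge(generator: str, candidates: list[str]) -> str:
    """Return the first candidate whose known family differs from the generator's."""
    generator_family = family_of(generator)
    for candidate in candidates:
        candidate_family = family_of(candidate)
        # Skip unknown families: we cannot show they are uncorrelated.
        if candidate_family != "unknown" and candidate_family != generator_family:
            return candidate
    raise ValueError(f"no cross-family judge available for {generator!r}")


if __name__ == "__main__":
    print(pick_judge("gpt-4", ["gpt-4o-mini", "claude-3-opus"]))
```

Raising an error when no cross-family judge exists is a deliberate choice in this sketch: silently falling back to a same-family judge would reintroduce the correlated-error problem described in this report.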
Journey Context:
It is tempting to use a powerful LLM to check the output of another powerful LLM. However, models from the same family tend to share blind spots, failure modes, and RLHF biases, so their errors are correlated. If Agent A hallucinates a library API, Agent B (same family) will likely also believe the API exists. Using a judge from a different model family reduces this correlation. The tradeoff is increased infrastructure complexity and cost, but it is necessary for high-stakes verification. Programmatic checks should always be preferred for anything that can be expressed as a schema or logic rule.
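For the hallucinated-API case, a deterministic check needs no second model at all. The sketch below parses generated Python with the standard `ast` module and uses `importlib` to confirm that each `module.attribute` reference actually exists. The function name `find_missing_attributes` is hypothetical, and the check is deliberately narrow: it only handles plain `import x` / `import x as y` statements and one level of attribute access, and it imports the referenced modules in the current process, so it should only be run against packages you already trust.

```python
import ast
import importlib


def find_missing_attributes(source: str) -> list[str]:
    """Return 'module.attr' names referenced in `source` that do not exist."""
    tree = ast.parse(source)

    # Map each local name bound by an import statement to its module path.
    modules: dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    modules[alias.asname] = alias.name
                else:
                    # `import a.b` binds only the top-level name `a`.
                    top_level = alias.name.split(".")[0]
                    modules[top_level] = top_level

    missing: list[str] = []
    for node in ast.walk(tree):
        # Only inspect direct accesses such as `json.loads`.
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            module_name = modules.get(node.value.id)
            if module_name is None:
                continue
            try:
                module = importlib.import_module(module_name)
            except ImportError:
                missing.append(module_name)
                continue
            if not hasattr(module, node.attr):
                missing.append(f"{module_name}.{node.attr}")
    return missing


if __name__ == "__main__":
    generated = "import json\nx = json.loads('1')\ny = json.parse_fast('1')\n"
    print(find_missing_attributes(generated))
```

Here `json.parse_fast` stands in for an invented API: `json.loads` passes the check, while the made-up function is reported. A same-family LLM judge could plausibly accept both, which is why this class of fact is better verified programmatically.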
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:04:56.355343+00:00 - report_created - created