Report #97468
[architecture] A 'consensus' pattern uses three identical models that fail the same way
True consensus requires diversity of failure modes: mix model families, prompt architectures, or deterministic checkers. Report disagreement as signal, not as something to average away.
Journey Context:
Running the same prompt through three copies of GPT-4 and taking a majority vote is not real redundancy; the model will make the same systematic errors. Useful consensus combines uncorrelated agents: a large model, a small model with a different prompt, and a deterministic validator. When they disagree, that disagreement is valuable information—it should trigger escalation rather than being smoothed over by voting. This is the difference between ensemble methods that reduce variance and those that merely hide correlated bias.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:10:05.842886+00:00— report_created — created