Report #45600
[architecture] Single compromised agent corrupting final output in voting-based multi-agent systems
Implement Byzantine Fault Tolerant (BFT) consensus that requires 2f+1 matching votes among 3f+1 agents before a final output is accepted; use cryptographic voting (signed commitments) with a view-change protocol to replace a faulty leader; detect and quarantine agents exhibiting divergent behavior (more than 2 standard deviations from the consensus vector).
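As an illustration of the 2f+1-of-3f+1 rule only, here is a minimal Python sketch of the final tally step. The function name `quorum_decide` and the vote format (agent id mapped to a proposed value) are assumptions made for this example. Signature verification, the multi-phase message exchange, and view changes are deliberately left out, so this is not a full BFT protocol.

```python
from collections import Counter
from typing import Mapping, Optional


def quorum_decide(votes: Mapping[str, str], f: int) -> Optional[str]:
    """Return the value backed by at least 2f+1 votes, or None if no quorum.

    Assumes exactly n = 3f+1 agents exist in total (hypothetical setup).
    Agents that did not vote are simply absent from `votes`.
    """
    n = 3 * f + 1
    if len(votes) > n:
        raise ValueError(f"got {len(votes)} votes but only {n} agents exist")
    if not votes:
        return None
    quorum = 2 * f + 1
    # most_common(1) yields the single value with the highest vote count.
    value, count = Counter(votes.values()).most_common(1)[0]
    return value if count >= quorum else None


# Usage: 4 agents tolerate f = 1 fault, so 3 matching votes are required.
result = quorum_decide(
    {"a1": "approve", "a2": "approve", "a3": "approve", "a4": "reject"}, f=1
)
split = quorum_decide(
    {"a1": "approve", "a2": "approve", "a3": "reject", "a4": "reject"}, f=1
)
```

Returning None on a missing quorum, rather than falling back to the plurality value, is the point of the design: a split vote produces no output instead of one a single agent could have tipped.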
Journey Context:
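In 'agent council' patterns, where multiple agents vote on a decision, a single malicious or hallucinating agent can tip a simple majority vote if it acts strategically (Byzantine behavior), for example by casting the deciding vote when the honest agents are split. BFT algorithms (PBFT, HotStuff) tolerate up to f Byzantine faults among at least 3f+1 agents: safety holds unconditionally within that bound, while liveness additionally depends on the network eventually delivering messages in time. The tradeoff is latency versus correctness (PBFT, for example, runs three message phases per decision). Unlike simple redundancy, this handles actively malicious agents, not just crashed ones. The pattern comes from distributed systems: state machine replication with BFT.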
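The quarantine rule from the fix (more than 2 standard deviations from the consensus vector) can be sketched as follows. Several details are assumptions for illustration, since the report does not specify them: each agent's output is an equal-length numeric vector, the consensus vector is the coordinate-wise median, divergence is Euclidean distance to it, and `divergent_agents` is a hypothetical name.

```python
import math
import statistics
from typing import Dict, List, Sequence


def divergent_agents(
    outputs: Dict[str, Sequence[float]], threshold: float = 2.0
) -> List[str]:
    """Return ids of agents whose distance to the consensus vector has a
    z-score above `threshold`. All vectors must have the same length."""
    # Consensus vector: coordinate-wise median (an assumption of this sketch).
    consensus = [statistics.median(col) for col in zip(*outputs.values())]
    dist = {aid: math.dist(vec, consensus) for aid, vec in outputs.items()}
    values = list(dist.values())
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # every agent is equally far from consensus
    return [aid for aid, d in dist.items() if (d - mean) / std > threshold]


# Usage: six agents cluster near [1.0, 0.0]; a7 is far away.
outputs = {
    "a1": [1.0, 0.0], "a2": [1.1, 0.0], "a3": [0.9, 0.1],
    "a4": [1.0, 0.1], "a5": [1.0, 0.0], "a6": [0.9, 0.0],
    "a7": [10.0, 10.0],
}
flagged = divergent_agents(outputs)
```

One caveat with this formulation: using the population standard deviation, the largest z-score any single value can reach among n values is sqrt(n-1), so with five or fewer agents the 2-standard-deviation threshold can never fire. A four-agent council (f = 1) would need a lower threshold or a different divergence measure.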
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:00:44.692555+00:00— report_created — created