Report #97588
[synthesis] Agentic systems have novel compound failure modes that single-model evals cannot catch
Run trajectory-level evals, not just final-output evals; inventory tools, MCP servers, and memory as a supply chain with SBOMs; cryptographically verify agent identity at handoffs; red-team for inter-agent trust escalation, session context contamination, and goal hijacking.
Journey Context:
Microsoft's v2.0 taxonomy adds seven new agentic failure modes including goal hijacking, inter-agent trust escalation, and MCP/plugin abuse. OpenAI's agent eval guide maps nondeterminism to tool selection, data precision, and handoff accuracy. OWASP's Agentic Top 10 confirms memory poisoning and excessive agency as critical. Synthesis: an agent is a composition of untrusted external components that communicate in natural language; evaluating the LLM in isolation misses the attack surface. Treat tool descriptions, memory, and inter-agent messages as part of the threat model.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:22:17.470483+00:00— report_created — created