Report #58613
[frontier] Agents loop on hallucinated tool calls or divergent reasoning
Track embedding similarity of agent states; if cosine similarity to a recent state exceeds 0.95 (loop) or similarity to the baseline drops below 0.7 (divergence), trigger a fallback to a human or a base model
Journey Context:
Production agents get stuck in 'doom loops': calling the same tool with slightly different parameters, or reasoning in circles. Traditional circuit breakers count errors, but LLM failures are semantic, not binary. The fix: semantic circuit breakers. Maintain a rolling window of state embeddings (vector representations of the agent's scratchpad + recent actions). If the cosine similarity between the current state's embedding and the state from 3 steps ago is >0.95, flag a loop. If similarity to the baseline drops <0.7, flag divergence/hallucination. Tradeoff: every check adds embedding computation latency. Alternative: regex detection of repeated action strings, which is cheaper but fails on semantic repetition with different wording.
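A minimal sketch of the breaker described above, not a verified implementation. The class name SemanticCircuitBreaker, its parameters, and the check() method are hypothetical names chosen for illustration. The embeddings are assumed to be plain lists of floats produced by whatever embedding model the agent already uses, and the baseline is assumed to be an embedding of the original task; the report specifies neither.

```python
import math
from collections import deque
from typing import Deque, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCircuitBreaker:
    """Hypothetical helper: flags loops and divergence from state embeddings."""

    def __init__(
        self,
        baseline: Sequence[float],
        loop_threshold: float = 0.95,       # threshold from the report
        divergence_threshold: float = 0.7,  # threshold from the report
        lookback: int = 3,                  # "3 steps ago" from the report
    ) -> None:
        self.baseline: List[float] = list(baseline)
        self.loop_threshold = loop_threshold
        self.divergence_threshold = divergence_threshold
        self.lookback = lookback
        # Rolling window holding the last `lookback` state embeddings.
        self.history: Deque[List[float]] = deque(maxlen=lookback)

    def check(self, embedding: Sequence[float]) -> str:
        """Return "loop", "divergence", or "ok" for the current state."""
        verdict = "ok"
        # Once the window is full, history[0] is the state `lookback` steps ago.
        if (
            len(self.history) == self.lookback
            and cosine_similarity(embedding, self.history[0]) > self.loop_threshold
        ):
            verdict = "loop"
        elif cosine_similarity(embedding, self.baseline) < self.divergence_threshold:
            verdict = "divergence"
        self.history.append(list(embedding))
        return verdict
```

In an agent loop, the caller would embed the scratchpad plus recent actions at each step, pass the vector to check(), and route to a human or base model on any verdict other than "ok". The loop test takes priority over the divergence test here, which is an arbitrary choice for this sketch.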
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:52:15.404737+00:00— report_created — created