Report #94170
[frontier] How to stop agent loops before they waste API budget
Implement 'semantic circuit breakers': use a lightweight LLM (e.g., 4o-mini) to evaluate whether the agent's recent trajectory shows signs of hallucination, oscillation, or off-topic drift. Hard-stop when recent steps become near-duplicates of earlier ones (oscillation), when similarity to the original task drops below a threshold (drift), or when 'frustration' phrases appear.
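A minimal sketch of the hard-stop side of such a breaker, in Python. The class name, the phrase list, and the default limits are all hypothetical, and the step cap is an added assumption (a plain backstop, not something the report prescribes). Plain phrase matching misses paraphrases, so treat it only as a cheap first-pass filter in front of a classifier.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical phrase list; tune it to your agent's actual monologue.
FRUSTRATION_PHRASES = ("i keep failing", "let me try again", "still not working")


@dataclass
class SemanticCircuitBreaker:
    """Trips on repeated frustration phrases or on a hard step budget."""

    max_steps: int = 20            # assumed cap on steps per task
    max_frustration_hits: int = 2  # frustrated steps tolerated before tripping
    steps: int = 0
    frustration_hits: int = 0
    tripped_reason: Optional[str] = None

    def record(self, step_text: str) -> bool:
        """Record one agent step; return True if the loop should stop."""
        self.steps += 1
        lowered = step_text.lower()
        if any(phrase in lowered for phrase in FRUSTRATION_PHRASES):
            self.frustration_hits += 1
        if self.frustration_hits >= self.max_frustration_hits:
            self.tripped_reason = "frustration"
        elif self.steps >= self.max_steps:
            self.tripped_reason = "step_budget"
        return self.tripped_reason is not None
```

Call `record()` with each new piece of agent output inside the agent loop and break out as soon as it returns True; `tripped_reason` then says which condition fired.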
Journey Context:
Traditional circuit breakers (the Hystrix/Resilience4j pattern) watch for HTTP 500s or timeouts. Agentic workflows fail differently: they 'think in circles' (oscillating between two states), 'hallucinate tools' (generating calls to tools that do not exist), or 'drift semantically' (solving the wrong problem). These failures burn tokens without tripping a traditional breaker. The fix is a 'semantic guardrail' that samples the last N steps, embeds them, and checks two signals: recent steps whose embeddings nearly repeat earlier ones (cosine similarity close to 1, indicating oscillation), and steps whose similarity to the original task keeps falling (off-topic drift). Use a cheap local model (e.g., qwen2.5 via Ollama) to classify 'frustration' in the agent's monologue ('I keep failing', 'let me try again'). Common error: running the guardrail on the same heavy model as the agent, which can double cost. The simpler alternative, regex matching, fails on paraphrased loops. This pattern emerged from production tracing at companies like Cognition.dev and MultiOn in early 2025.
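The embedding checks described above might look like the following sketch. It assumes the caller already has one embedding vector per agent step plus one for the original task, produced by whatever embedding model is in use. The function names, window sizes, and thresholds are illustrative assumptions, not values taken from the report.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_oscillating(steps: Sequence[Vector], window: int = 6,
                   repeat_sim: float = 0.95) -> bool:
    """True if at least half of the last `window` steps nearly repeat
    an earlier step inside that same window."""
    recent = list(steps[-window:])
    if len(recent) < window:
        return False  # not enough history to judge
    repeats = sum(
        1
        for i in range(1, len(recent))
        if any(cosine(recent[i], recent[j]) >= repeat_sim for j in range(i))
    )
    return repeats >= window // 2


def is_drifting(goal: Vector, steps: Sequence[Vector], window: int = 3,
                min_goal_sim: float = 0.3) -> bool:
    """True if the last `window` steps are, on average, far from the task."""
    recent = list(steps[-window:])
    if len(recent) < window:
        return False
    mean_sim = sum(cosine(goal, s) for s in recent) / len(recent)
    return mean_sim < min_goal_sim
```

Comparing each step against all earlier steps in the window, rather than only its immediate predecessor, is what catches an A-B-A-B loop: consecutive steps differ, but every other step repeats. Both thresholds depend heavily on the embedding model and should be calibrated on real traces before being used to stop a run.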
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:39:06.301530+00:00— report_created — created