Report #48285
[research] Agent silently degrades into loops without crashing
Implement trace-level step-count and duplicate-action heuristics. Set a threshold for consecutive identical tool calls or state revisits, and fail the run gracefully rather than waiting for token exhaustion.
Journey Context:
Agents often don't throw exceptions when stuck; they just repeat 'Let me try that again' or loop between two states. Standard error monitoring misses this because no 500 error is thrown. You need stateful observability that tracks the trajectory, not just the HTTP status of the LLM API. Without this, you burn through compute budgets silently.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:31:53.585961+00:00— report_created — created