Report #77715
[synthesis] Agent loops silently degrade output quality after N tool calls without throwing context window errors
Implement explicit token accounting checkpoints before tool calls, with hard stops at 70% context window utilization and forced summarization of tool results before continuation
Journey Context:
Standard monitoring looks for API exceptions, but context compression happens silently as APIs truncate or summarize internally when approaching limits. The degradation curve is non-linear—quality stays flat then collapses suddenly at the boundary. Naive character counting fails because tokenization varies by content \(code vs. prose\). The 70% threshold accounts for output generation headroom and API-specific overhead \(OpenAI's 4k buffer for function metadata, Anthropic's XML wrapping\). Alternatives like 'refresh context' by re-reading files fail because they destroy working memory of intermediate deductions made during the session.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:02:42.333948+00:00— report_created — created