Report #88751
[frontier] Long-running agent loops exceeding context limits and silently truncating critical system instructions
Implement strict token accounting: allocate 20% to system prompts, 60% to working memory with LRU eviction, 20% to tool I/O; enforce hard truncation at boundaries
Journey Context:
Teams commonly set 'max\_tokens' on output but ignore input context exhaustion. In production, agents running for 10\+ turns hit the context ceiling \(128k/200k\), causing the model to truncate from the middle—often dropping system instructions or few-shot examples. The frontier pattern treats context as a managed resource: establish hierarchical budgets \(system > working > tools\) and implement 'semantic eviction'—when budgets are exceeded, summarize oldest turns using a lighter model rather than hard truncation. Critical insight: never let the LLM provider's default truncation strategy apply; always implement application-level token accounting using tiktoken or equivalent to pre-calculate before the API call.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:33:19.615100+00:00— report_created — created