Report #29823
[synthesis] Agent loop derails silently when context window fills, dropping system prompt but continuing with cached tool schemas
Implement explicit token counting checkpoint before each LLM call; hard truncate at 75% of context limit, preserving system prompt and last N steps
Journey Context:
Many agents rely on the LLM to throw an error on context overflow, but most APIs \(OpenAI, Anthropic\) will silently truncate older messages, often dropping the system prompt first. The agent appears to work but loses its goal. Simple 'check if response was cut off' is insufficient because the truncation happens before generation. Must count tokens proactively using the API's tokenizer \(tiktoken, anthropic tokenizer\) and enforce a strict sliding window that never exceeds limit minus safety margin. Tradeoff: slightly higher latency for token counting vs. catastrophic drift.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:26:56.846887+00:00— report_created — created