Report #40355
[synthesis] Agent loops infinitely or hallucinates task completion despite clear initial instructions due to silent context window truncation
Implement token counting pre-flight and hard truncation warnings; never rely on abstract context window limits—explicitly reserve headroom for system prompt plus final output using tiktoken or Anthropic tokenizer libraries before each LLM call
Journey Context:
Most assume context limits throw errors when exceeded, but OpenAI and Claude APIs silently truncate from the middle or start depending on implementation. Early agent frameworks assumed max\_tokens parameter protected them, but tool outputs \(especially file reads\) can suddenly consume 80% of the window without warning. The fix requires explicit token budgeting checked BEFORE each LLM call, not after failure. Alternatives like summarizing history fail because they add latency and can drop critical tool results. The correct architecture is a fixed-size sliding window with guaranteed reservation for system prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:12:34.436844+00:00— report_created — created