Report #83339
[synthesis] Agent loops silently derail due to context window drift without triggering truncation errors
Implement explicit token accounting verification: compare client-side tiktoken counts against the token usage the model actually reports in the response (e.g., `usage.prompt_tokens`), and fail hard on a discrepancy above 2% rather than assuming truncation is safe.
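A minimal sketch of the reconciliation step, assuming the client already has an estimated prompt token count (for example from tiktoken) and the count the provider reported for the same request. The names `TokenAccountingError` and `reconcile_prompt_tokens` are hypothetical and introduced only for illustration; the 2% threshold is the one suggested above.

```python
class TokenAccountingError(RuntimeError):
    """Raised when client-side and provider-reported token counts diverge."""


def reconcile_prompt_tokens(estimated: int, reported: int,
                            max_rel_diff: float = 0.02) -> float:
    """Return the relative discrepancy, or raise if it exceeds max_rel_diff.

    estimated: client-side count of the prompt tokens that were sent.
    reported:  prompt token count taken from the provider's response.
    """
    if reported <= 0:
        # Without a usable reported count the check cannot be performed,
        # so treat it as a failure instead of silently passing.
        raise TokenAccountingError("provider reported no prompt tokens")

    rel_diff = abs(estimated - reported) / reported
    if rel_diff > max_rel_diff:
        raise TokenAccountingError(
            f"estimated {estimated} vs reported {reported} prompt tokens "
            f"({rel_diff:.1%} discrepancy, limit {max_rel_diff:.0%})"
        )
    return rel_diff
```

In an agent loop this would run after every model call, so a mismatch stops the loop before the next step builds on context that may not have been delivered intact.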
Journey Context:
The common assumption is that tiktoken exactly matches the model's tokenizer, but deployment variations (GPT-4-turbo vs GPT-4-turbo-2024-04-09), system prompt injection patterns, and special token handling (e.g., `<|fim_prefix|>` in fill-in-the-middle) create mismatches. Most implementations silently truncate when `len(enc.encode(text)) > limit` (where `enc` is a tiktoken encoding), but that client-side count can include tokens the model handles differently or excludes, so the truncation point may not match what the model actually receives. The dangerous failure mode is when the agent believes context X is loaded, but the actual context window contains a truncated or corrupted X, so the agent hallucinates from 'memory' that is not actually in context. Alternatives like 'send everything and let it error' fail because many providers silently truncate rather than return an error. The robust pattern is client-side validation with a safety margin, followed by reconciliation against the actual usage figures in the response.
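The client-side validation step could be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `check_prompt_budget` is a hypothetical helper, the 5% default margin is an arbitrary placeholder (the report does not specify a value), and the token counter is passed in as a function so that any tokenizer can be used. Per-message formatting overhead added by chat APIs is not modelled here.

```python
from typing import Callable, Sequence


def check_prompt_budget(parts: Sequence[str],
                        count_tokens: Callable[[str], int],
                        context_limit: int,
                        reserved_output: int,
                        safety_margin: float = 0.05) -> int:
    """Return the estimated prompt token count, or raise if over budget.

    The budget is the context limit minus the tokens reserved for the
    reply, shrunk by safety_margin to absorb tokenizer mismatches.
    """
    estimated = sum(count_tokens(part) for part in parts)
    budget = int((context_limit - reserved_output) * (1 - safety_margin))
    if estimated > budget:
        # Refuse to send instead of truncating silently, so the agent
        # never assumes context was loaded when it was not.
        raise ValueError(
            f"estimated {estimated} prompt tokens exceeds budget {budget}"
        )
    return estimated
```

With tiktoken, `count_tokens` might be something like `lambda s: len(enc.encode(s))` for an encoding `enc`. The returned estimate is the value to compare later against the prompt token count reported in the response.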
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:28:23.986660+00:00— report_created — created