Report #47592
[research] Agent silently drops early instructions or state when context window fills up during long tasks
Monitor prompt_tokens telemetry relative to the model's context limit. Implement a trace-level eval that checks whether the agent still adheres to system prompt constraints (e.g., output format) in the final steps of long-context runs.
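A minimal sketch of the token-monitoring half, assuming each agent step logs the prompt_tokens value reported by the API. The names StepUsage, flag_context_pressure, and detect_sharp_drop are hypothetical, and the 0.8 and 0.5 thresholds are illustrative defaults rather than recommended values. The sharp-drop check is a heuristic only: a sudden fall in prompt_tokens between consecutive steps may indicate that history was truncated or summarized, but it can also have benign causes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepUsage:
    """Hypothetical per-step telemetry record."""
    step: int
    prompt_tokens: int


def context_utilization(prompt_tokens: int, context_limit: int) -> float:
    """Fraction of the context window consumed by the prompt."""
    if context_limit <= 0:
        raise ValueError("context_limit must be positive")
    return prompt_tokens / context_limit


def flag_context_pressure(
    steps: List[StepUsage], context_limit: int, warn_ratio: float = 0.8
) -> List[int]:
    """Return step numbers whose prompt reached warn_ratio of the limit."""
    return [
        s.step
        for s in steps
        if context_utilization(s.prompt_tokens, context_limit) >= warn_ratio
    ]


def detect_sharp_drop(steps: List[StepUsage], drop_ratio: float = 0.5) -> List[int]:
    """Return step numbers where prompt_tokens fell below drop_ratio of the
    previous step's value (assumed heuristic for truncation or summarization)."""
    suspects = []
    for prev, cur in zip(steps, steps[1:]):
        if prev.prompt_tokens > 0 and cur.prompt_tokens < prev.prompt_tokens * drop_ratio:
            suspects.append(cur.step)
    return suspects


if __name__ == "__main__":
    trace = [
        StepUsage(1, 1000),
        StepUsage(2, 7000),
        StepUsage(3, 9000),
        StepUsage(4, 3000),
    ]
    print(flag_context_pressure(trace, context_limit=10000))
    print(detect_sharp_drop(trace))
```

Steps flagged by either check are good candidates for the constraint-adherence eval, since they mark where early instructions are most likely to have been lost.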
Journey Context:
When a context window fills up, the API either truncates silently or throws an error that the agent catches and handles by summarizing its history, often losing the original system constraints in the process. The agent may then complete a sub-task successfully but output the result in the wrong format because the formatting instructions were truncated. Monitoring token counts and evaluating constraint adherence at the end of long runs catches this specific degradation.
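The end-of-run adherence check described above could be sketched as follows, assuming the system prompt requires every output to be a JSON object and that the trace is available as an ordered list of output strings. The function names, the JSON-object constraint, and the tail size are all illustrative assumptions, not part of the original report. The eval compares the pass rate of the final steps against the earlier ones and marks the run as degraded when the late rate is lower.

```python
import json
from typing import Callable, List, Optional


def is_json_object(output: str) -> bool:
    """Example constraint check: the output must parse as a JSON object."""
    try:
        return isinstance(json.loads(output), dict)
    except (json.JSONDecodeError, TypeError):
        return False


def pass_rate(outputs: List[str], check: Callable[[str], bool]) -> Optional[float]:
    """Fraction of outputs that satisfy the check, or None if there are none."""
    if not outputs:
        return None
    return sum(1 for o in outputs if check(o)) / len(outputs)


def eval_late_adherence(
    outputs: List[str], check: Callable[[str], bool], tail: int = 3
) -> dict:
    """Compare constraint adherence in the last `tail` steps with earlier steps."""
    if tail <= 0:
        raise ValueError("tail must be positive")
    early_rate = pass_rate(outputs[:-tail], check)
    late_rate = pass_rate(outputs[-tail:], check)
    degraded = (
        early_rate is not None
        and late_rate is not None
        and late_rate < early_rate
    )
    return {"early_rate": early_rate, "late_rate": late_rate, "degraded": degraded}


if __name__ == "__main__":
    # Hypothetical trace: early steps follow the format, final steps do not.
    trace_outputs = [
        '{"a": 1}',
        '{"b": 2}',
        '{"c": 3}',
        "done: c=3",
        "Result is 4",
    ]
    print(eval_late_adherence(trace_outputs, is_json_object, tail=2))
```

Comparing against the early rate, rather than requiring a perfect late rate, separates drift that appears as context fills up from an agent that never followed the format in the first place.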
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:21:47.904401+00:00— report_created — created