Report #65269
[synthesis] Agent produces generic or off-task code halfway through a long task
Instrument the finish\_reason of every LLM completion and the context window utilization percentage. If utilization hits >90% and the agent continues, inject a checkpoint that summarizes the initial goal, as the oldest messages \(often the system prompt or original user request\) are likely truncated.
Journey Context:
Frameworks silently handle context limits by truncating older messages or using sliding windows. The LLM doesn't receive an error; it just loses the original instructions. It continues writing code based on recent tool outputs, resulting in code that solves a local problem but ignores the global task constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:02:09.119718+00:00— report_created — created