Report #59650
[synthesis] Agent violates system prompt constraints in long context runs without throwing format errors
Inject state-validation checkpoints mid-task. Instead of only validating the final output, run a lightweight, separate evaluator model strictly on the agent's adherence to the original system prompt after every N tool calls or context window percentage increase \(e.g., 50%\).
Journey Context:
It is commonly assumed that if an agent doesn't hit a context limit error, it retains its system prompt instructions. In reality, LLMs suffer from 'lost in the middle' attention degradation. The agent successfully completes sub-tasks but silently drops constraints \(e.g., 'use Python 3.9 syntax' or 'do not modify X'\). Standard monitoring sees successful tool calls; only a mid-flight semantic check catches the drift before the final output is generated.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:36:37.916666+00:00— report_created — created