Report #41172
[synthesis] Agent ignores early instructions in long sessions despite staying under token limit
Instrument an 'instruction adherence score': run synthetic assertions against the early system prompt rules at every step, rather than tracking only token count or final output.
Journey Context:
Teams typically monitor token usage against the maximum context size. But LLM attention degrades non-linearly, so adherence does not fall in proportion to how full the context is: an agent at 80% of context capacity may effectively ignore system prompt constraints established in the first 10% of the context window. This looks like a 'bad agent' but is really context dilution. Token metrics show green while semantic adherence is red. Measure adherence to specific constraints at each step, not just overall context length.
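A minimal sketch of such a per-step adherence score, in Python. Everything named here is hypothetical and for illustration only: `RuleAssertion`, `AdherenceMonitor`, and the two example rules (an assumed "prefix every answer with RESULT:" rule and an assumed "never emit rm -rf" rule) are not from the report. The string checks stand in for whatever evaluator fits the real system prompt.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RuleAssertion:
    """One early system-prompt rule, expressed as a check on a step's output."""
    name: str
    check: Callable[[str], bool]  # True means the output complies


@dataclass
class StepRecord:
    step: int
    context_fraction: float  # tokens_used / max_tokens at this step
    score: float             # fraction of rules that passed
    failed: List[str]        # names of the rules that failed


@dataclass
class AdherenceMonitor:
    rules: List[RuleAssertion]
    threshold: float = 1.0   # alert when the score drops below this
    history: List[StepRecord] = field(default_factory=list)

    def record(self, step: int, output: str,
               tokens_used: int, max_tokens: int) -> StepRecord:
        """Run every assertion against one step's output and store the result."""
        failed = [r.name for r in self.rules if not r.check(output)]
        total = len(self.rules)
        score = 1.0 if total == 0 else (total - len(failed)) / total
        rec = StepRecord(step, tokens_used / max_tokens, score, failed)
        self.history.append(rec)
        return rec

    def alerts(self) -> List[StepRecord]:
        """Steps where adherence fell below the threshold, whatever the token usage."""
        return [r for r in self.history if r.score < self.threshold]


# Hypothetical rules, assumed to appear at the top of the system prompt.
EXAMPLE_RULES = [
    RuleAssertion("answer_prefix", lambda out: out.startswith("RESULT:")),
    RuleAssertion("no_destructive_shell",
                  lambda out: re.search(r"\brm\s+-rf\b", out) is None),
]

if __name__ == "__main__":
    monitor = AdherenceMonitor(rules=EXAMPLE_RULES)
    monitor.record(1, "RESULT: listed files", tokens_used=1_000, max_tokens=10_000)
    monitor.record(40, "Sure! Running rm -rf build/", tokens_used=8_000, max_tokens=10_000)
    for rec in monitor.alerts():
        print(rec.step, rec.context_fraction, rec.score, rec.failed)
```

Recording `context_fraction` next to the score makes it possible to see where in the window adherence starts to slip, even while the token metric alone still looks healthy.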
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:34:54.420028+00:00— report_created — created