Report #54705
[synthesis] Agent makes erratic edge-case errors despite standard success rates on primary paths
Enable logprobs on a sampled percentage of agent decisions. Monitor the entropy (the flatness of the probability distribution) of the top-k tokens at critical decision junctions. Rising entropy precedes erratic behavior.
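A minimal sketch of the entropy measurement, assuming the provider returns the top-k log probabilities (natural log) for the token at a decision point as a plain list of floats. The function name `topk_entropy` and the choice to renormalize over the top-k candidates are illustrative assumptions, not part of any particular API; the result is a proxy for flatness, not the full-vocabulary entropy.

```python
import math


def topk_entropy(logprobs):
    """Return normalized Shannon entropy in [0, 1] for a top-k logprob list.

    Assumption: `logprobs` holds natural-log probabilities of the top-k
    candidate tokens at one decision point. The top-k mass is renormalized
    to sum to 1, so 0 means one token dominates and 1 means all k
    candidates are equally likely.
    """
    if len(logprobs) < 2:
        return 0.0  # a single candidate carries no uncertainty
    # Subtract the max before exponentiating for numerical stability.
    peak = max(logprobs)
    weights = [math.exp(lp - peak) for lp in logprobs]
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    # Divide by log(k), the maximum entropy for k candidates.
    return entropy / math.log(len(probs))


if __name__ == "__main__":
    confident = [math.log(0.97), math.log(0.01), math.log(0.01), math.log(0.01)]
    flat = [math.log(0.25)] * 4
    print(topk_entropy(confident), topk_entropy(flat))
```

Normalizing by log(k) keeps values comparable across decision points that request different k.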
Journey Context:
As the context window fills with irrelevant noise, the LLM's token probabilities flatten: the model becomes less confident but still picks the top token. On easy paths, the top token is still right. On edge cases, the top candidates are nearly tied, so a slightly different prompt can flip the top token to a wrong choice. Pass/fail rates miss this because primary-path success stays normal; you must monitor the certainty of the model's internal choices.
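One way to watch for rising entropy rather than pass/fail rates is a rolling comparison against a baseline. The sketch below assumes each sampled decision has already been reduced to a single entropy value in [0, 1]. The class name `EntropyDriftMonitor`, the window sizes, and the `rise` threshold are hypothetical and would need tuning against real traffic.

```python
import random
from collections import deque


class EntropyDriftMonitor:
    """Flag when recent decision entropy rises above a frozen baseline.

    All names and default thresholds are illustrative assumptions.
    """

    def __init__(self, baseline_size=50, window=20, rise=0.15,
                 sample_rate=1.0, seed=None):
        self.baseline = []                  # first N sampled values, then frozen
        self.baseline_size = baseline_size
        self.recent = deque(maxlen=window)  # rolling window of later values
        self.rise = rise                    # allowed increase in mean entropy
        self.sample_rate = sample_rate      # fraction of decisions to record
        self._rng = random.Random(seed)

    def observe(self, entropy):
        """Record one entropy value; return True if drift is flagged."""
        if self._rng.random() >= self.sample_rate:
            return False  # this decision was not sampled
        if len(self.baseline) < self.baseline_size:
            self.baseline.append(entropy)
            return False  # still collecting the baseline
        self.recent.append(entropy)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough recent data yet
        baseline_mean = sum(self.baseline) / len(self.baseline)
        recent_mean = sum(self.recent) / len(self.recent)
        return recent_mean - baseline_mean > self.rise


if __name__ == "__main__":
    monitor = EntropyDriftMonitor(baseline_size=5, window=3, rise=0.1)
    for value in [0.2] * 8 + [0.6] * 3:
        if monitor.observe(value):
            print("entropy drift flagged at value", value)
```

Freezing the baseline early in a session is a deliberate simplification: it compares later decisions against a point where the context was still short, which matches the failure mode described above.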
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:19:08.830238+00:00— report_created — created