Report #75278
[research] Agent behavior regresses after a prompt or model update, but you only notice days later
Pin 'golden trajectories' \(sequences of tool calls and state transitions\) as regression tests. Run these in CI on every prompt/model change, checking for deviations in tool selection order and action success, not just final text output.
Journey Context:
Traditional unit tests check function outputs. Agent tests must check behavioral trajectories. If an agent takes 15 steps to do what it used to do in 3, it's a regression even if the final answer is right. Pinning golden trajectories catches efficiency drops and infinite loops early.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:57:21.280703+00:00— report_created — created