Report #77019
[research] Updating agent prompts breaks previously working tool call sequences
Maintain a golden trajectory regression suite of successful (prompt, tool_call_sequence, final_output) tuples and run it on every prompt change to detect behavioral drift.
Journey Context:
Agent behavior is highly sensitive to prompt changes. A minor wording tweak can cause the agent to choose a completely different tool path. Traditional unit tests check only final outputs, so they miss cases where the agent reaches an acceptable answer by a different route. Golden trajectory suites also check the path taken. Maintaining them is expensive (they break when APIs change), but they are the only reliable way to prevent prompt regressions in complex multi-step agents.
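A minimal sketch of such a suite, in Python, to make the idea concrete. Everything named here is hypothetical and not from the report: the `GoldenTrajectory` record, the `check_trajectories` helper, the stub agents, and the tool names `search_flights` and `book_flight`. It assumes the agent can be wrapped as a function that takes a prompt and returns the ordered tool names it called plus its final output.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class GoldenTrajectory:
    """One recorded successful run (hypothetical schema)."""
    prompt: str
    tool_call_sequence: Tuple[str, ...]
    final_output: str


# Assumed agent interface: prompt -> (ordered tool names, final output).
AgentFn = Callable[[str], Tuple[List[str], str]]


def check_trajectories(agent: AgentFn, goldens: List[GoldenTrajectory]) -> List[str]:
    """Replay each golden prompt and report drift; an empty list means none."""
    failures = []
    for golden in goldens:
        tools, output = agent(golden.prompt)
        if tuple(tools) != golden.tool_call_sequence:
            # The path changed, even if the final answer still looks right.
            failures.append(
                f"{golden.prompt!r}: tool path "
                f"{list(golden.tool_call_sequence)} -> {list(tools)}"
            )
        elif output != golden.final_output:
            failures.append(f"{golden.prompt!r}: final output changed")
    return failures


# Stub agents standing in for the agent before and after a prompt edit.
def agent_before(prompt: str) -> Tuple[List[str], str]:
    return ["search_flights", "book_flight"], "Booked."


def agent_after(prompt: str) -> Tuple[List[str], str]:
    # Same final output, but the search step is skipped.
    return ["book_flight"], "Booked."


GOLDENS = [
    GoldenTrajectory(
        prompt="Book a flight to Oslo",
        tool_call_sequence=("search_flights", "book_flight"),
        final_output="Booked.",
    )
]

if __name__ == "__main__":
    for line in check_trajectories(agent_after, GOLDENS):
        print("DRIFT:", line)
```

In this sketch `agent_after` returns the same final output as `agent_before`, so an output-only test would pass, while the trajectory check flags the missing `search_flights` call. Exact string equality on the final output is likely too strict for a non-deterministic model; a looser comparison may be needed there, and tool arguments could be added to the recorded sequence if the path alone is not specific enough.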
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:52:13.769838+00:00— report_created — created