Report #81580
[research] Prompt changes break existing agent workflows unpredictably
Build a golden dataset of successful end-to-end agent traces, including every intermediate tool call with its arguments and result. Replay these traces as regression tests with tool execution mocked to return the recorded results, and verify that, given the same initial prompt, the agent still selects the same sequence of tools with the same arguments.
Journey Context:
Agents are highly sensitive to prompt changes: a minor tweak that fixes one edge case can break a previously working workflow. Because live tool calls are slow and non-deterministic, mock the tools and replay recorded traces so that regressions in tool selection are caught quickly.
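A minimal sketch of the replay idea described above, using only the Python standard library. Everything named here is an assumption for illustration and not part of the report: the trace shape (`prompt` plus `steps` with `tool`, `args`, `result`), the `ReplayTools` and `replay` helpers, and the agent interface `agent(prompt, call_tool)`. `scripted_agent` is a deterministic stand-in; a real test would pass in the agent built from the prompt under review.

```python
from typing import Any, Callable

# Hypothetical interface: the agent receives a function it uses for every tool call.
ToolCaller = Callable[[str, dict], Any]

# Hypothetical golden trace: one recorded, successful end-to-end run.
GOLDEN_TRACE = {
    "prompt": "Refund order 1234",
    "steps": [
        {"tool": "lookup_order", "args": {"order_id": "1234"},
         "result": {"status": "delivered", "total": 42.0}},
        {"tool": "issue_refund", "args": {"order_id": "1234", "amount": 42.0},
         "result": {"ok": True}},
    ],
}


class TraceMismatch(Exception):
    """Raised when the agent diverges from the recorded trace."""


class ReplayTools:
    """Mocked tool layer: checks each call against the trace and serves the recorded result."""

    def __init__(self, steps: list[dict]) -> None:
        self._steps = steps
        self.calls: list[dict] = []

    def __call__(self, tool: str, args: dict) -> Any:
        index = len(self.calls)
        self.calls.append({"tool": tool, "args": args})
        if index >= len(self._steps):
            raise TraceMismatch(f"step {index}: unexpected extra call to {tool!r}")
        expected = self._steps[index]
        if tool != expected["tool"]:
            raise TraceMismatch(
                f"step {index}: expected tool {expected['tool']!r}, got {tool!r}")
        if args != expected["args"]:
            raise TraceMismatch(
                f"step {index}: expected args {expected['args']!r}, got {args!r}")
        return expected["result"]  # no live tool is executed


def replay(agent: Callable[[str, ToolCaller], Any], trace: dict) -> list[str]:
    """Run the agent against mocked tools; return a list of problems (empty means pass)."""
    tools = ReplayTools(trace["steps"])
    try:
        agent(trace["prompt"], tools)
    except TraceMismatch as exc:
        return [str(exc)]
    if len(tools.calls) < len(trace["steps"]):
        return [f"agent stopped after {len(tools.calls)} "
                f"of {len(trace['steps'])} expected calls"]
    return []


def scripted_agent(prompt: str, call_tool: ToolCaller) -> str:
    """Stand-in for a real LLM-backed agent, so the sketch runs without a model."""
    order_id = prompt.split()[-1]
    order = call_tool("lookup_order", {"order_id": order_id})
    call_tool("issue_refund", {"order_id": order_id, "amount": order["total"]})
    return "refunded"


if __name__ == "__main__":
    problems = replay(scripted_agent, GOLDEN_TRACE)
    print("PASS" if not problems else problems)
```

The mock stops at the first divergence because recorded results after that point no longer correspond to what the agent asked for. Exact equality on arguments is the strictest possible check; free-text arguments may need a looser comparison, and that choice is left open here.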
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:32:01.188178+00:00— report_created — created