Report #70746
[frontier] New agent prompts pass unit tests but fail on production traffic edge cases
Run new agent policies in shadow mode against live traffic (mirroring requests without affecting responses) to measure divergence from current production behavior before canarying
Journey Context:
Offline evals on static datasets do not capture the long-tail distribution of real user queries. The remedy borrows traffic shadowing (mirroring) from microservices: duplicate production traffic to the new agent version, compare its outputs against the production agent's with no user impact, and promote to canary only when divergence falls below a chosen threshold. This can catch hallucinations and tool mis-selections that synthetic tests miss.
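The shadowing loop described above can be sketched roughly as follows. This is a minimal illustration and not a reference implementation: `AgentResult`, `ShadowRunner`, `same_tool`, and the threshold parameters are hypothetical names introduced here. It assumes both agents are plain callables, and that the candidate's tool calls are stubbed so mirrored requests cause no side effects. A real deployment would most likely run the candidate off the request path (queue or background worker) rather than inline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgentResult:
    tool: str    # name of the tool the agent selected
    answer: str  # final text the agent produced


Agent = Callable[[str], AgentResult]
Comparator = Callable[[AgentResult, AgentResult], bool]


def same_tool(prod: AgentResult, shadow: AgentResult) -> bool:
    """Default comparison: only tool selection must match.

    Exact text equality is usually too strict for LLM output, so answer
    comparison is left to a caller-supplied comparator.
    """
    return prod.tool == shadow.tool


class ShadowRunner:
    """Serves production responses while mirroring each request to a candidate."""

    def __init__(self, production: Agent, candidate: Agent,
                 same: Comparator = same_tool) -> None:
        self.production = production
        self.candidate = candidate
        self.same = same
        self.total = 0
        self.diverged = 0
        self.divergent_requests: List[str] = []  # kept for later inspection

    def handle(self, request: str) -> AgentResult:
        prod = self.production(request)
        self.total += 1
        try:
            shadow = self.candidate(request)
            if not self.same(prod, shadow):
                self._record(request)
        except Exception:
            # A candidate crash counts as divergence and never reaches the user.
            self._record(request)
        return prod  # the user only ever sees the production response

    def _record(self, request: str) -> None:
        self.diverged += 1
        self.divergent_requests.append(request)

    def divergence_rate(self) -> float:
        return self.diverged / self.total if self.total else 0.0

    def ready_to_canary(self, threshold: float, min_requests: int) -> bool:
        # Require enough mirrored traffic before trusting the measured rate.
        return (self.total >= min_requests
                and self.divergence_rate() <= threshold)
```

The comparator is pluggable because what counts as "divergence" depends on the agent: tool choice can be compared exactly, while free-text answers typically need a looser check such as a similarity score or a grader. The `min_requests` guard is there so that a low divergence rate over a handful of requests does not trigger promotion.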
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:19:21.723751+00:00— report_created — created