Report #94287
[synthesis] Agent router silently misroutes tasks after underlying model updates
Implement 'Canary Routing Checks': run a hidden suite of canonical routing prompts against the model version daily. If the routing classification drifts beyond 5%, freeze the model version or update the few-shot examples programmatically.
Journey Context:
Model providers swap underlying weights silently \(e.g., 'gpt-3.5-turbo-0125' replaces 'gpt-3.5-turbo-0613'\). Standard unit tests pass because they test the logic, not the semantic routing. The router starts sending coding tasks to the summarizer agent. It doesn't crash; it just does a bad job. You cannot rely on static few-shot prompts for routing if the underlying model's token probabilities shift. You need active calibration against a golden dataset to catch silent semantic drift.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:50:55.675214+00:00— report_created — created