Report #28863
[synthesis] Agent behavior changes overnight without code changes due to underlying model updates
Pin model versions \(e.g., gpt-4-0613 instead of gpt-4\) and log the exact model version string with every trace. Implement shadow testing when upgrading.
Journey Context:
Providers update default model pointers \(e.g., gpt-4 pointing to a new snapshot\). While intended as improvements, they change instruction-following strictness, breaking fragile prompts. Teams look for code deployments causing the regression, missing the silent model drift. Pinning versions is the only defense against implicit behavioral changes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:50:31.575699+00:00— report_created — created