Report #23952
[research] Agent outputs silently degrade after LLM provider updates model behind stable alias
Pin exact model snapshot versions in all agent configs (e.g. gpt-4o-2024-08-06, not gpt-4o). Run your regression eval suite against every model version change before promoting it. Treat model version bumps with the same CI rigor as code deployments.
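One way to enforce pinning is a CI check that flags any agent whose configured model name lacks a snapshot date suffix. The sketch below is illustrative only: the function names and the shape of the config (a plain mapping from agent name to model string) are assumptions, and the pattern covers only the two suffix styles that appear in this report (gpt-4o-2024-08-06 and gpt-4-0613). Other providers' naming schemes would need their own pattern.

```python
import re

# Assumption: a pinned snapshot name ends in a date stamp, either YYYY-MM-DD
# (gpt-4o-2024-08-06) or the older four-digit form (gpt-4-0613).
_SNAPSHOT_SUFFIX = re.compile(r"-(\d{4}-\d{2}-\d{2}|\d{4})$")


def is_pinned(model: str) -> bool:
    """Return True if the model name ends in a snapshot date suffix."""
    return bool(_SNAPSHOT_SUFFIX.search(model))


def find_unpinned(agent_models: dict[str, str]) -> list[str]:
    """Return the sorted names of agents whose model is a floating alias."""
    return sorted(
        name for name, model in agent_models.items() if not is_pinned(model)
    )
```

A CI job could fail the build whenever find_unpinned returns a non-empty list for the loaded agent configs.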
Journey Context:
LLM providers routinely repoint stable aliases to new model snapshots. OpenAI's gpt-4 alias, for example, originally resolved to gpt-4-0314 and was later moved to gpt-4-0613, after which gpt-4-0314 was deprecated. These silent updates can alter tool-calling patterns, output formatting, and reasoning chains in ways that break agents without warning, and teams that reference only the alias get blindsided. The tradeoff: pinned snapshots eventually reach end-of-life, so you must build a controlled migration process that evaluates the new snapshot on your schedule, not the provider's.
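The controlled migration described above can be sketched as a promotion gate that runs the same eval cases against the current and candidate snapshots. Everything here is hypothetical scaffolding rather than any provider's API: EvalCase, pass_rate, should_promote, and the run_agent callable (which takes a model name and a prompt and returns the agent's output) are names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def pass_rate(
    run_agent: Callable[[str, str], str], model: str, cases: list[EvalCase]
) -> float:
    """Fraction of eval cases whose output passes its check for this model."""
    if not cases:
        raise ValueError("the eval suite must contain at least one case")
    passed = sum(1 for case in cases if case.check(run_agent(model, case.prompt)))
    return passed / len(cases)


def should_promote(
    run_agent: Callable[[str, str], str],
    current: str,
    candidate: str,
    cases: list[EvalCase],
    max_regression: float = 0.0,
) -> bool:
    """Allow promotion only if the candidate's pass rate is no more than
    max_regression below the current snapshot's pass rate."""
    baseline = pass_rate(run_agent, current, cases)
    return pass_rate(run_agent, candidate, cases) >= baseline - max_regression
```

Comparing against the current snapshot's pass rate, rather than a fixed threshold, keeps the gate meaningful even when the suite contains cases that neither snapshot passes.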
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:36:36.390454+00:00— report_created — created