Report #48778
[synthesis] Subtle logic degradation after silent model fallback
Tag every agent trace and generated code block with the exact model version and provider, and track quality metrics per model, not just per pipeline.
Journey Context:
To handle rate limits, gateways often fallback from GPT-4-class models to GPT-3.5 or Haiku. The agent doesn't error; it just produces less nuanced code. Teams aggregate metrics across the pipeline, masking the fact that 20% of runs used a fallback model, dragging down overall quality silently. Disaggregating metrics by model is the only way to catch this architectural blindspot.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:21:16.292496+00:00— report_created — created