Report #62077
[cost\_intel] Silent quality degradation patterns when downgrading to cheaper models
Monitor for 'confident hallucinations' on negation and temporal reasoning. Haiku and GPT-4o-mini exhibit 3-5x higher false-positive rates than Sonnet and GPT-4o on 'do not' instructions and 'before/after' sequences. Implement shadow testing on 5% of traffic to detect disagreement.
Journey Context:
Downgrading from Sonnet to Haiku, or from GPT-4o to GPT-4o-mini, often appears successful because the cheaper model still answers confidently. The degradation is silent: cheaper models perform significantly worse on 'toxic' linguistic patterns such as negation \('do not restart the service'\), temporal logic \('if A happened before B'\), and implicit constraints. In production, this shows up as 'confidently wrong' API calls that delete data, restart services incorrectly, or misroute requests.

Detection strategy: mirror 5% of traffic to the frontier model, treat its output as the gold standard, and compare it with the cheaper model's output. Measure not just accuracy but disagreement on high-confidence answers. If the two models disagree while both report confidence above 95%, the cheaper model is the one likely hallucinating, since the frontier model is the reference. Fallback trigger: if the disagreement rate exceeds 2% on critical paths, route 100% of traffic to the frontier model. This pattern is especially dangerous in safety-critical code generation \(infrastructure as code, medical logic\).
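The shadow-testing loop and fallback trigger described above could be sketched roughly as follows. This is a minimal illustration, not a verified implementation: `ShadowMonitor`, `ModelFn`, `normalize`, and the `min_samples` guard are hypothetical names introduced here, and the report does not say how a confidence score is obtained, so each model is assumed to be wrapped in a function that returns an `(answer, confidence)` pair. The disagreement rate is assumed to mean high-confidence disagreements divided by shadowed requests, and answers are compared by normalized exact match, which only suits structured outputs such as tool calls or labels.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Tuple

# Assumption: each model is wrapped so that it returns (answer, confidence in [0, 1]).
# How confidence is derived (self-report, logprobs, a verifier) is left open.
ModelFn = Callable[[str], Tuple[str, float]]


def normalize(answer: str) -> str:
    """Crude comparison key; real systems would compare parsed tool calls."""
    return answer.strip().lower()


@dataclass
class ShadowMonitor:
    cheap: ModelFn
    frontier: ModelFn
    shadow_rate: float = 0.05         # share of traffic mirrored to the frontier model
    confidence_floor: float = 0.95    # both models must exceed this to count
    disagreement_limit: float = 0.02  # fallback trigger from the report
    min_samples: int = 200            # hypothetical guard against tripping on tiny samples
    rng: random.Random = field(default_factory=random.Random)
    shadowed: int = 0
    disagreements: int = 0
    fallback: bool = False

    def disagreement_rate(self) -> float:
        return self.disagreements / self.shadowed if self.shadowed else 0.0

    def handle(self, prompt: str) -> str:
        # Once tripped, route 100% of traffic to the frontier model.
        if self.fallback:
            return self.frontier(prompt)[0]

        cheap_answer, cheap_conf = self.cheap(prompt)

        # Mirror a sample of requests to the frontier model for comparison.
        if self.rng.random() < self.shadow_rate:
            self.shadowed += 1
            gold_answer, gold_conf = self.frontier(prompt)
            both_confident = min(cheap_conf, gold_conf) > self.confidence_floor
            if both_confident and normalize(cheap_answer) != normalize(gold_answer):
                self.disagreements += 1
            if (self.shadowed >= self.min_samples
                    and self.disagreement_rate() > self.disagreement_limit):
                self.fallback = True

        return cheap_answer
```

In this sketch the shadowed request still returns the cheaper model's answer, so the comparison only informs monitoring. On critical paths it may be preferable to return the frontier answer whenever the two disagree, and to keep one monitor per path so that the 2% trigger applies per critical path rather than globally.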
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:41:01.027040+00:00— report_created — created