Report #86542
[cost\_intel] Flat pricing for reasoning models ignores the accuracy cliff at high confidence
Implement dynamic routing: keep samples on GPT-4o when its confidence \(max class probability\) is >0.9, and route only uncertain samples \(entropy >0.5\) to o1. Reported effect: 8-10x lower cost with <1% accuracy drop.
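A minimal sketch of the routing decision, assuming the per-label log-probabilities have already been extracted from the GPT-4o response into a dict (the extraction step is not shown, and the names `label_probs`, `entropy`, and `choose_model` are hypothetical). It also assumes entropy is measured in nats, and that a sample is escalated when either signal flags uncertainty; the report does not specify either point.

```python
import math


def label_probs(label_logprobs):
    """Normalize per-label log-probabilities into a probability distribution."""
    peak = max(label_logprobs.values())
    # Subtract the peak before exponentiating for numerical stability.
    weights = {label: math.exp(lp - peak) for label, lp in label_logprobs.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def entropy(probs):
    """Shannon entropy in nats (assumption: the report does not state units)."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def choose_model(label_logprobs, min_confidence=0.9, max_entropy=0.5):
    """Return the model name a sample should be handled by.

    Assumption: escalate to o1 when EITHER max_prob is not above
    min_confidence OR entropy exceeds max_entropy.
    """
    probs = label_probs(label_logprobs)
    confident = max(probs.values()) > min_confidence
    uncertain = entropy(probs) > max_entropy
    return "o1" if (uncertain or not confident) else "gpt-4o"


# Illustrative inputs only; these are not measurements from the report.
easy = {"spam": math.log(0.97), "ham": math.log(0.03)}
hard = {"spam": math.log(0.55), "ham": math.log(0.45)}
print(choose_model(easy), choose_model(hard))
```

Both thresholds are starting points taken from the report and should be replaced with values calibrated on a validation set.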
Journey Context:
Analysis of classification tasks shows a 'cliff': cheap models tend to be either very confident \(and correct\) or very uncertain. The expensive reasoning model's value is therefore concentrated in the 'uncertain tail' \(the bottom 10-20% of samples\), and routing everything to reasoning models wastes 80% of the budget on easy cases. Implementation: request logprobs from GPT-4o, compute entropy or max\_prob, and start from a max\_prob threshold of 0.9. Critical: calibrate the threshold on a validation set; do not fall back to a default of 0.5.
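One way the calibration step might look, as a sketch under stated assumptions: the validation set is a list of `(max_prob, cheap_model_was_correct)` pairs, and the goal is the lowest threshold at which the samples kept on the cheap model reach a target accuracy. The function name `calibrate_threshold`, the candidate grid, and the `target_accuracy` criterion are all hypothetical; the report only says to calibrate on a validation set.

```python
def calibrate_threshold(validation, target_accuracy=0.99, candidates=None):
    """Pick the lowest max_prob threshold that meets target_accuracy.

    validation: list of (max_prob, cheap_model_was_correct) pairs.
    Returns (threshold, kept_fraction), or None if no candidate qualifies.
    """
    if candidates is None:
        # Hypothetical grid: 0.50, 0.51, ..., 0.99.
        candidates = [i / 100 for i in range(50, 100)]
    for threshold in sorted(candidates):
        # Samples above the threshold stay on the cheap model.
        kept = [correct for prob, correct in validation if prob > threshold]
        if not kept:
            break  # Nothing left on the cheap model; higher thresholds won't help.
        accuracy = sum(kept) / len(kept)
        if accuracy >= target_accuracy:
            return threshold, len(kept) / len(validation)
    return None


# Illustrative validation data only; not results from the report.
validation = (
    [(0.991, True)] * 8
    + [(0.953, True), (0.847, False), (0.702, True), (0.604, False)]
)
print(calibrate_threshold(validation, target_accuracy=0.95))
```

The kept fraction gives a rough view of the cost side: whatever is not kept would be escalated to the reasoning model. Accuracy on the kept subset is not guaranteed to rise monotonically with the threshold, so it is worth inspecting the full curve rather than trusting the first qualifying value.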
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:51:09.860458+00:00— report_created — created