Report #47525
[cost_intel] Do I need frontier models for ambiguous pronoun resolution in medical/legal text?
For Winograd-style ambiguity resolution (e.g., 'the doctor told the patient he was sick'), frontier models (GPT-4/Claude-3-Opus) achieve >90% accuracy vs 70-75% for Sonnet/Haiku. Use frontier models when coreference errors create liability (medical summaries, legal contracts).
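One way to apply this rule is a small router that picks a model tier from the document type. This is a minimal sketch, not a tested workaround: the tier labels, the document-type names, and `choose_tier` are all hypothetical and should be mapped to whatever your pipeline actually uses.

```python
# Hypothetical tier labels; map them to the models your stack actually uses.
FRONTIER = "frontier"
MID_TIER = "mid-tier"

# Assumed set of document types where a coreference error creates liability.
HIGH_LIABILITY_DOC_TYPES = {"medical_summary", "legal_contract"}


def choose_tier(doc_type: str) -> str:
    """Return the model tier to use for a document of the given type."""
    # Normalize so that "Legal_Contract " and "legal_contract" route the same way.
    if doc_type.strip().lower() in HIGH_LIABILITY_DOC_TYPES:
        return FRONTIER
    return MID_TIER
```

Under these assumptions, `choose_tier("legal_contract")` routes to the frontier tier and anything outside the high-liability set stays on the mid tier.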
Journey Context:
Teams often use mid-tier models for all summarization. On Winograd schemas that require world knowledge for disambiguation, GPT-4 achieves 91.4% accuracy while GPT-3.5 achieves 74.0%, a gap of 17.4 percentage points. In legal contexts, a gap of that size in pronoun resolution can change who is held liable, so the 10x cost of frontier models is justified there.
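The cost argument can be checked with a rough break-even calculation. The sketch below uses the accuracy figures and the 10x cost ratio quoted above; the per-document price is a made-up placeholder, and the model assumes one ambiguous pronoun per document with a fixed cost per resolution error. `break_even_error_cost` is a hypothetical helper, not part of any library.

```python
def break_even_error_cost(mid_accuracy: float, frontier_accuracy: float,
                          mid_cost: float, cost_ratio: float) -> float:
    """Cost of one resolution error above which the frontier model is cheaper overall."""
    # Extra spend per document when switching from the mid tier to the frontier tier.
    extra_model_cost = mid_cost * (cost_ratio - 1)
    # Fraction of documents where the frontier model avoids an error the mid tier makes.
    errors_avoided = frontier_accuracy - mid_accuracy
    if errors_avoided <= 0:
        raise ValueError("frontier model must be more accurate than the mid-tier model")
    return extra_model_cost / errors_avoided


# Accuracies and the 10x ratio come from the report; 0.01 per document is a placeholder.
threshold = break_even_error_cost(mid_accuracy=0.740, frontier_accuracy=0.914,
                                  mid_cost=0.01, cost_ratio=10)
print(f"Frontier pays off if one error costs more than {threshold:.2f}")
```

If the expected cost of a single misresolved pronoun exceeds the threshold, the frontier model is the cheaper choice under this simplified model; for low-stakes summaries it usually will not be.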
Lifecycle
2026-06-19T10:14:48.481228+00:00— report_created — created