Report #43910
[cost_intel] Identifying task types where frontier models are irreplaceable due to reasoning depth
Reserve GPT-4/Claude-3-Opus for tasks that require constraint satisfaction over more than 3 steps under ambiguity (e.g., legal clause resolution, multi-document synthesis with contradictory sources); on these tasks, smaller models show an accuracy cliff of more than 40%.
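The routing rule above can be sketched as a simple threshold check. This is a minimal illustration under stated assumptions, not a tested policy: `TaskProfile`, `choose_tier`, and the tier labels are hypothetical names, and estimating `reasoning_steps` or `documents_to_reconcile` for a real task is left to the caller. Only the two thresholds (more than 3 steps under ambiguity, more than 3 documents with contradictions) come from this report.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map them to whatever model identifiers you actually deploy.
FRONTIER_TIER = "frontier"  # GPT-4 / Claude-3-Opus class
SMALL_TIER = "small"        # Haiku class

# Thresholds taken from this report.
MAX_SMALL_MODEL_STEPS = 3  # beyond this, ambiguous constraint satisfaction needs a frontier model
MAX_SMALL_MODEL_DOCS = 3   # beyond this, contradiction resolution needs a frontier model


@dataclass
class TaskProfile:
    """Hypothetical task description; estimating these fields is up to the caller."""
    reasoning_steps: int         # estimated constraint-satisfaction steps
    ambiguous: bool              # conflicting information without explicit signals
    documents_to_reconcile: int  # documents compared to resolve contradictions


def choose_tier(task: TaskProfile) -> str:
    """Route to the frontier tier only when the task is past the reported cliff."""
    deep_and_ambiguous = task.ambiguous and task.reasoning_steps > MAX_SMALL_MODEL_STEPS
    many_conflicting_docs = task.documents_to_reconcile > MAX_SMALL_MODEL_DOCS
    if deep_and_ambiguous or many_conflicting_docs:
        return FRONTIER_TIER
    return SMALL_TIER
```

Keeping the thresholds as named constants makes it easy to move them once your own validation set shows where the cliff falls for your task.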
Journey Context:
Cheaper models handle single-document summarization or extraction, but fail when the task requires reconciling conflicting information that carries no explicit signals. The accuracy cliff appears suddenly: Haiku works at 2-step reasoning, but at 4 steps it drops to random-level performance. Use a validation set with known ambiguity to detect where the cliff falls for your task. If your task requires comparing more than 3 documents to resolve contradictions, the frontier-model cost is non-negotiable.
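A minimal sketch of the validation-set check, assuming each validation item is labeled with the number of reasoning steps it needs and scored correct or incorrect for the cheaper model. `find_cliff` is a hypothetical helper; reading the ">40% accuracy cliff" as an absolute drop of 0.40, and treating accuracy within 0.05 of chance as "random performance", are both assumptions rather than part of the original report.

```python
from typing import Mapping, Optional, Sequence


def find_cliff(
    results_by_steps: Mapping[int, Sequence[bool]],
    chance_level: float,
    drop_threshold: float = 0.40,
    chance_margin: float = 0.05,
) -> Optional[int]:
    """Return the smallest reasoning depth at which the cheaper model falls off the cliff.

    results_by_steps maps a step count to per-item correctness (True = correct) for
    validation items with known ambiguity. A depth counts as past the cliff if its
    accuracy is more than `drop_threshold` (absolute) below the best accuracy at any
    shallower depth, or within `chance_margin` of `chance_level`. Returns None if the
    validation set shows no cliff.
    """
    best_so_far: Optional[float] = None
    for steps in sorted(results_by_steps):
        outcomes = results_by_steps[steps]
        if not outcomes:
            continue  # no validation items at this depth, so nothing to judge
        acc = sum(outcomes) / len(outcomes)
        near_chance = acc <= chance_level + chance_margin
        big_drop = best_so_far is not None and best_so_far - acc > drop_threshold
        if near_chance or big_drop:
            return steps
        best_so_far = acc if best_so_far is None else max(best_so_far, acc)
    return None
```

If the returned depth is at or below the depth your task needs, route that task to the frontier model. Set `chance_level` from the answer format, e.g. 0.25 for four-way multiple choice, or close to 0 for free-form answers.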
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:10:29.644982+00:00 - report_created - created