Report #46474
[cost_intel] Chaining a cheap instruct model with a reasoning check on likely-incorrect drafts vs. full reasoning
Implement a cascaded architecture: (1) a fast instruct model generates a draft plus a confidence score; (2) if the confidence falls below a threshold of roughly 0.85 to 0.90, or the output contains uncertainty markers, the query is routed to a reasoning model for verification; (3) the report claims this achieves about 90% of full-reasoning quality at 30-40% of the cost of running the reasoning model on every query.
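The routing rule in steps (1) and (2) can be sketched as below. This is a minimal Python sketch, not the report's implementation: it assumes the instruct model returns per-token log-probabilities and uses their geometric mean, exp(mean log-prob), as the confidence score (one common proxy; a self-critique score could be swapped in). The `Draft` type, `cascade`, the model callables and the marker regex are hypothetical placeholders.

```python
import math
import re
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical uncertainty markers; the real list should be tuned per task.
UNCERTAINTY_MARKERS = re.compile(
    r"\b(i'?m not sure|i think|possibly|might be|unclear|cannot determine)\b",
    re.IGNORECASE,
)


@dataclass
class Draft:
    """Hypothetical container for an instruct-model draft."""
    text: str
    token_logprobs: Sequence[float]  # log-probability of each generated token


def confidence(draft: Draft) -> float:
    """Geometric-mean token probability, exp(mean log-prob), in [0, 1]."""
    if not draft.token_logprobs:
        return 0.0  # no evidence: treat as unconfident
    return math.exp(sum(draft.token_logprobs) / len(draft.token_logprobs))


def needs_escalation(draft: Draft, threshold: float = 0.85) -> bool:
    """Escalate on low confidence OR any uncertainty marker in the text."""
    low_confidence = confidence(draft) < threshold
    hedged = UNCERTAINTY_MARKERS.search(draft.text) is not None
    return low_confidence or hedged


def cascade(
    prompt: str,
    instruct: Callable[[str], Draft],             # hypothetical cheap model call
    reasoning_verify: Callable[[str, str], str],  # hypothetical reasoning call
    threshold: float = 0.85,
) -> Tuple[str, bool]:
    """Return (final_answer, escalated)."""
    draft = instruct(prompt)
    if needs_escalation(draft, threshold):
        # Hard case: the reasoning model verifies or rewrites the draft.
        return reasoning_verify(prompt, draft.text), True
    # Easy case: accept the cheap draft as-is.
    return draft.text, False
```

Passing the models in as callables keeps the gate testable and provider-agnostic. The default of 0.85 is only the low end of the range quoted above and should be re-tuned per task.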
Journey Context:
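Pure instruct models are miscalibrated on uncertainty: they are often confidently wrong. Pure reasoning models are expensive and slow. The hybrid pattern exploits the 'easy vs. hard' split in real-world traffic: 60-70% of queries are 'easy' (the instruct model answers them correctly and with high confidence) and the remaining 30-40% are 'hard'. Using the instruct model's token probabilities or a self-critique pass as the confidence signal, you can gate the expensive reasoning step. Because that signal comes from a miscalibrated model, the gate pairs it with uncertainty-marker checks and a task-specific threshold rather than trusting the raw probability. The reported cost curve places the hybrid on the cost-accuracy Pareto frontier between the two extremes, giving up a small amount of accuracy relative to full reasoning in exchange for a large cost saving.
Implementation detail: the confidence threshold must be task-specific. For medical or legal domains it should be 0.95 or higher; for creative writing, 0.70 may suffice.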
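The cost claim and the task-specific threshold advice can be made concrete with a little arithmetic. This is a minimal Python sketch, assuming every query pays for one instruct draft and escalated queries additionally pay for one reasoning call costing the same as a full reasoning run (a verification pass may be cheaper in practice). `DOMAIN_THRESHOLDS`, `threshold_for`, `relative_cost` and `calibrate_threshold` are hypothetical names; the thresholds are only the starting points quoted above, and `calibrate_threshold` shows one way to tune them on labeled held-out data, which also guards against the instruct model's miscalibration.

```python
import math
from typing import Dict, List, Tuple

# Starting thresholds quoted in the report; hypothetical names, re-tune per task.
DOMAIN_THRESHOLDS: Dict[str, float] = {
    "medical": 0.95,
    "legal": 0.95,
    "general": 0.85,
    "creative": 0.70,
}


def threshold_for(domain: str) -> float:
    """Fall back to the general-purpose threshold for unknown domains."""
    return DOMAIN_THRESHOLDS.get(domain, DOMAIN_THRESHOLDS["general"])


def relative_cost(escalation_rate: float, instruct_to_reasoning_cost: float) -> float:
    """Expected cascade cost per query as a fraction of always running reasoning.

    Every query pays for one instruct draft; escalated queries also pay for
    one reasoning call, assumed to cost the same as a full reasoning run.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be in [0, 1]")
    return instruct_to_reasoning_cost + escalation_rate


def calibrate_threshold(scored: List[Tuple[float, bool]], target_precision: float) -> float:
    """Lowest threshold whose accepted drafts meet target_precision on held-out data.

    `scored` holds (confidence, draft_was_correct) pairs. Drafts with
    confidence >= threshold are accepted without escalation. Returns math.inf
    (escalate everything) when no threshold reaches the target.
    """
    for t in sorted({c for c, _ in scored}):
        # Never empty: the draft whose confidence equals t is always accepted.
        accepted = [ok for c, ok in scored if c >= t]
        if sum(accepted) / len(accepted) >= target_precision:
            return t
    return math.inf
```

For example, with an instruct call costing 5% of a reasoning call and a 30% escalation rate (illustrative inputs, not measurements), `relative_cost(0.30, 0.05)` is about 0.35, inside the 30-40% band claimed above. The estimate grows one-for-one with the escalation rate, so the saving depends on the 'easy' share of traffic actually holding.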
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:28:53.413373+00:00— report_created — created