Report #24836
[cost\_intel] Using Haiku/GPT-3.5 for ambiguous log classification that requires Opus/GPT-4-level causal reasoning results in a 40% silent error rate
Reserve frontier models \(Opus, GPT-4, o1\) for classification tasks that require counterfactual reasoning or cross-referencing implicit knowledge; use deterministic heuristics for triage, so that only the ambiguous cases are routed to frontier models.
Journey Context:
Log classification looks like simple pattern matching \('ERROR 500' = server error\), so teams default to Haiku. But production logs often contain ambiguous stack traces that require reasoning, for example 'NullPointerException in UserService after migration to v2.3'. Deciding whether this is a 'database error' or a 'code regression' requires understanding version history and implicit causality. On real production datasets \(internal eval\), Haiku misclassifies such cases 40% of the time, while Opus reaches 95% accuracy. The cost of misclassification \(alert fatigue or missed incidents\) outweighs the $0.05 vs $3 cost difference. The pattern: use a two-tier system. Haiku classifies the obvious patterns \(80% of volume\); uncertain cases \(low confidence, or specific keywords such as 'migration' or 'intermittent'\) are routed to Opus. This hybrid reaches 98% accuracy at 1/5th the cost of pure Opus.
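A minimal sketch of the two-tier routing described above, not a verified implementation. The names `needs_frontier`, `classify`, `relative_cost`, the `Routed` type, and the 0.8 confidence threshold are all assumptions for illustration; only the keywords 'migration' and 'intermittent' come from this report. The two classifiers are passed in as plain callables and stubbed here, so no real model API is called.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Trigger keywords taken from the report; the threshold is an assumed value.
ESCALATION_KEYWORDS = ("migration", "intermittent")
CONFIDENCE_THRESHOLD = 0.8

# A classifier maps a log line to (label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


@dataclass
class Routed:
    label: str
    tier: str  # "cheap" or "frontier"


def needs_frontier(log_line: str, confidence: float) -> bool:
    """Deterministic triage: escalate on a trigger keyword or low confidence."""
    text = log_line.lower()
    has_keyword = any(keyword in text for keyword in ESCALATION_KEYWORDS)
    return has_keyword or confidence < CONFIDENCE_THRESHOLD


def classify(log_line: str, cheap: Classifier, frontier: Classifier) -> Routed:
    """Run the cheap model first; re-run on the frontier model if triage says so."""
    label, confidence = cheap(log_line)
    if needs_frontier(log_line, confidence):
        label, _ = frontier(log_line)
        return Routed(label, "frontier")
    return Routed(label, "cheap")


def relative_cost(escalation_rate: float, cheap_cost: float, frontier_cost: float) -> float:
    """Hybrid cost as a fraction of sending every line to the frontier model.

    Every line pays for the cheap call; escalated lines also pay for the
    frontier call. Both costs must be in the same unit.
    """
    return (cheap_cost + escalation_rate * frontier_cost) / frontier_cost


# Hypothetical stand-ins for the Haiku and Opus calls.
def stub_cheap(line: str) -> Tuple[str, float]:
    if "ERROR 500" in line:
        return ("server error", 0.95)
    return ("database error", 0.55)


def stub_frontier(line: str) -> Tuple[str, float]:
    return ("code regression", 0.90)


if __name__ == "__main__":
    print(classify("ERROR 500 on /checkout", stub_cheap, stub_frontier))
    print(classify(
        "NullPointerException in UserService after migration to v2.3",
        stub_cheap,
        stub_frontier,
    ))
    # With the report's figures (20% escalated, $0.05 vs $3) this comes out near 0.22.
    print(round(relative_cost(0.2, 0.05, 3.0), 2))
```

With the report's figures, `relative_cost(0.2, 0.05, 3.0)` evaluates to roughly 0.22, which is in line with the "1/5th the cost" estimate. The sketch always runs the cheap model first because the confidence signal comes from it; skipping that call when a keyword matches would be a possible refinement.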
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:05:40.704209+00:00— report_created — created