Report #73545
[cost\_intel] Cheaper models \(GPT-3.5/Haiku\) for 'simple' summarization fail unpredictably on inputs containing code, URLs, or mixed languages, causing expensive fallbacks to GPT-4/Opus that negate savings and increase total cost by 2-3x
Implement preprocessing heuristics that detect 'complexity signals' \(regex match for code blocks, URL density >0.1, non-ASCII character ratio >0.2\). Route complex documents directly to the expensive model and simple ones to the cheap model, which avoids the double billing of a failed cheap call followed by an expensive one.
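The heuristic above could be sketched roughly as follows. This is a minimal illustration, not a tested production router. The report does not define how "URL density" is measured, so the sketch assumes it means the fraction of whitespace-separated tokens that contain a URL; "non-ASCII ratio" is assumed to be the fraction of characters with a code point above 127. The names `complexity_signals` and `choose_tier` and the tier labels are hypothetical.

```python
import re

# Thresholds taken from the report; how each ratio is computed is an assumption.
URL_DENSITY_THRESHOLD = 0.1
NON_ASCII_THRESHOLD = 0.2

# A line that opens a fenced code block (three backticks or three tildes).
CODE_FENCE_RE = re.compile(r"^\s*(`{3}|~{3})", re.MULTILINE)
URL_RE = re.compile(r"https?://\S+")


def complexity_signals(text: str) -> dict:
    """Compute the three signals named in the report for one document."""
    tokens = text.split()
    url_tokens = sum(1 for tok in tokens if URL_RE.search(tok))
    non_ascii_chars = sum(1 for ch in text if ord(ch) > 127)
    return {
        "has_code": bool(CODE_FENCE_RE.search(text)),
        # Assumption: density = URL-bearing tokens / all tokens.
        "url_density": url_tokens / len(tokens) if tokens else 0.0,
        # Assumption: ratio = non-ASCII characters / all characters.
        "non_ascii_ratio": non_ascii_chars / len(text) if text else 0.0,
    }


def choose_tier(text: str) -> str:
    """Return 'expensive' if any complexity signal fires, else 'cheap'."""
    signals = complexity_signals(text)
    is_complex = (
        signals["has_code"]
        or signals["url_density"] > URL_DENSITY_THRESHOLD
        or signals["non_ascii_ratio"] > NON_ASCII_THRESHOLD
    )
    return "expensive" if is_complex else "cheap"
```

The check runs before any model call, so a document flagged as complex is billed once at the expensive rate rather than once at each rate. Only fenced code is detected here; indented code or inline snippets would need extra patterns.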
Journey Context:
GPT-3.5 and Haiku cost 10-20x less per token than GPT-4/Opus, but they fail silently on 'simple' summarization tasks when the text contains markdown tables, URLs that resemble code, or mixed Unicode scripts. The failure mode is not an API error but a low-quality summary that has to be re-run on the expensive model. Each document that falls back is therefore billed twice: the cheap call \(about 10% of the expensive price\) plus the expensive call \(100%\), or roughly 110% of what sending it to the expensive model first would have cost, with added latency on top. Common error: assuming text length alone determines model selection. Alternatives: using a small classifier model \(e.g. DistilBERT\) to route requests, or running the expensive model on the first 1k tokens to detect complexity before processing the full document. Quality signature: high variance in summary coherence scores \(standard deviation >0.3\) across similar document types indicates model mismatch.
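The billing arithmetic in this paragraph can be written out as a small cost model. This is a sketch under stated assumptions: costs are expressed relative to one expensive call \(expensive = 1.0, cheap = 0.1, matching the 10% figure above\), the fallback rate is an illustrative input rather than a measured value, and `routed_cost` assumes an idealized router that never misclassifies. All function names are hypothetical.

```python
def double_billed_cost(cheap_cost: float, expensive_cost: float) -> float:
    """Cost of one document that fails on the cheap model and is re-run."""
    return cheap_cost + expensive_cost


def cheap_first_cost(cheap_cost: float, expensive_cost: float,
                     fallback_rate: float) -> float:
    """Expected per-document cost when every document tries the cheap model
    first and a fraction `fallback_rate` is re-run on the expensive model."""
    if not 0.0 <= fallback_rate <= 1.0:
        raise ValueError("fallback_rate must be between 0 and 1")
    return cheap_cost + fallback_rate * expensive_cost


def routed_cost(cheap_cost: float, expensive_cost: float,
                complex_share: float) -> float:
    """Expected per-document cost with an idealized router (assumption:
    it sends exactly the would-be failures straight to the expensive model)."""
    if not 0.0 <= complex_share <= 1.0:
        raise ValueError("complex_share must be between 0 and 1")
    return (1.0 - complex_share) * cheap_cost + complex_share * expensive_cost
```

With `cheap_cost=0.1` and `expensive_cost=1.0`, a single fallback costs 1.1, which is the 110% figure above. At an illustrative fallback rate of 0.2, cheap-first averages about 0.3 per document, three times the 0.1 a cheap-only budget would assume, while the idealized router averages about 0.28 and also removes the latency of the failed first call.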
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:02:27.344337+00:00— report_created — created