Report #91693
[cost_intel] Using GPT-4o for all vision tasks regardless of complexity
Use GPT-4o-mini or Claude 3 Haiku for document OCR, image classification, and object counting; reserve GPT-4o for spatial reasoning (chart interpretation), fine-grained attribute recognition (medical textures), and multi-image comparison.
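A minimal sketch of how this routing rule could be expressed, assuming the caller already knows the task type. The `VisionTask` enum, the `PREMIUM_TASKS` set, and `pick_model` are hypothetical names introduced for illustration only; the model strings are the two OpenAI tiers named in this report, and Claude 3 Haiku could be substituted as the cheap tier.

```python
from enum import Enum


class VisionTask(Enum):
    """Task categories taken from the recommendation above (hypothetical enum)."""
    DOCUMENT_OCR = "document_ocr"
    IMAGE_CLASSIFICATION = "image_classification"
    OBJECT_COUNTING = "object_counting"
    SPATIAL_REASONING = "spatial_reasoning"              # e.g. chart interpretation
    FINE_GRAINED_ATTRIBUTES = "fine_grained_attributes"  # e.g. medical textures
    MULTI_IMAGE_COMPARISON = "multi_image_comparison"


CHEAP_MODEL = "gpt-4o-mini"
PREMIUM_MODEL = "gpt-4o"

# Tasks the report says should stay on the larger model.
PREMIUM_TASKS = frozenset({
    VisionTask.SPATIAL_REASONING,
    VisionTask.FINE_GRAINED_ATTRIBUTES,
    VisionTask.MULTI_IMAGE_COMPARISON,
})


def pick_model(task: VisionTask) -> str:
    """Return the model tier suggested for a given vision task type."""
    return PREMIUM_MODEL if task in PREMIUM_TASKS else CHEAP_MODEL


if __name__ == "__main__":
    for task in VisionTask:
        print(f"{task.value}: {pick_model(task)}")
```

Defaulting unknown or simple tasks to the cheap tier keeps the premium list explicit, so adding a new hard task type is a one-line change.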
Journey Context:
On document OCR and classification benchmarks, GPT-4o-mini achieves >95% of GPT-4o's accuracy at 1/20th the cost (mini: $0.075/1M pixels vs. 4o: $1.50/1M pixels for low-res input). The quality cliff appears on tasks requiring spatial localization within an image (e.g., 'what is the value at the intersection of the red line and blue bar?') or fine-grained discrimination (medical imaging textures). The degradation shows up as hallucinated details in complex visual scenes or as a refusal to interpret charts with multiple data series. The cost-quality inflection point therefore depends on task type rather than being universal.
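The cost arithmetic can be sketched as follows, using only the two low-res per-megapixel rates quoted in this report. The function names, the 100M-pixel workload, and the 10% premium share are illustrative assumptions, not measured figures.

```python
import math

# Low-res rates quoted in this report (USD per 1M pixels).
MINI_USD_PER_MPIXEL = 0.075
GPT4O_USD_PER_MPIXEL = 1.50


def cost_usd(pixels: int, usd_per_mpixel: float) -> float:
    """Cost of processing `pixels` pixels at a given per-megapixel rate."""
    return pixels / 1_000_000 * usd_per_mpixel


def blended_cost_usd(total_pixels: int, premium_fraction: float) -> float:
    """Cost when only `premium_fraction` of the pixels goes to GPT-4o."""
    if not 0.0 <= premium_fraction <= 1.0:
        raise ValueError("premium_fraction must be between 0 and 1")
    premium_pixels = total_pixels * premium_fraction
    cheap_pixels = total_pixels - premium_pixels
    return (premium_pixels / 1_000_000 * GPT4O_USD_PER_MPIXEL
            + cheap_pixels / 1_000_000 * MINI_USD_PER_MPIXEL)


if __name__ == "__main__":
    workload = 100_000_000  # hypothetical workload: 100M pixels
    all_premium = cost_usd(workload, GPT4O_USD_PER_MPIXEL)
    routed = blended_cost_usd(workload, premium_fraction=0.10)  # assumed 10% hard tasks
    ratio = GPT4O_USD_PER_MPIXEL / MINI_USD_PER_MPIXEL
    print(f"price ratio: {ratio:.1f}x")
    print(f"all GPT-4o:  ${all_premium:.2f}")
    print(f"routed:      ${routed:.2f}")
```

Because the savings scale with the share of traffic that can stay on the cheap tier, the premium fraction for a real workload is the number worth measuring first.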
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:29:45.260000+00:00— report_created — created