Report #58775
[cost\_intel] Sending high-res images to o1-pro for simple OCR or object counting
Use GPT-4o-mini \(vision input\) for OCR and basic visual QA \($0.0015 per image\); reserve o1-class models for visual reasoning puzzles \(IQ tests, diagram interpretation, spatial logic\) that need step-by-step spatial reasoning. Cost reduction: about 30x, with no quality loss on standard vision tasks.
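To make the arithmetic concrete, here is a minimal sketch that uses only the two figures quoted in this report: $0.0015 per image on the cheaper model and a 30x cost ratio. The o1-pro per-image price is derived from that ratio, not taken from a price list, and the function names are hypothetical.

```python
# Figures quoted in this report. The expensive per-image price is DERIVED
# from the stated 30x ratio; it is an assumption, not a published price.
MINI_COST_PER_IMAGE = 0.0015  # USD per image on the cheaper model
COST_RATIO = 30               # reported cost multiplier for o1-pro


def batch_cost(num_images: int, per_image: float) -> float:
    """Total cost in USD for a batch at a flat per-image price."""
    return num_images * per_image


def savings(num_images: int) -> float:
    """USD saved by sending a batch to the cheaper model instead of o1-pro."""
    expensive = batch_cost(num_images, MINI_COST_PER_IMAGE * COST_RATIO)
    cheap = batch_cost(num_images, MINI_COST_PER_IMAGE)
    return expensive - cheap
```

Under these assumptions, a batch of 10,000 screenshots costs about $15 on the cheaper model versus about $450 on o1-pro, a difference of roughly $435.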
Journey Context:
4o-mini reportedly reaches 99% accuracy on standard OCR and object detection benchmarks; o1-pro offers no improvement on perception but costs about 30x more. o1's advantage shows up only on tasks that require chain-of-thought over spatial relationships \(e.g., 'how many triangles are in this complex overlapping diagram'\). Common waste: using o1 for 'extract text from screenshot', where 4o-mini is faster, cheaper, and equally accurate.
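The routing rule described above can be sketched as a small lookup. This is an illustration, not a tested production router: the task categories, the `pick_model` helper, and the model label strings are all assumptions made for this example, so substitute whatever identifiers your provider exposes.

```python
from enum import Enum


class VisionTask(Enum):
    """Hypothetical task categories, taken from the examples in this report."""
    OCR = "ocr"
    OBJECT_COUNT = "object_count"
    BASIC_VQA = "basic_vqa"
    SPATIAL_PUZZLE = "spatial_puzzle"
    DIAGRAM_REASONING = "diagram_reasoning"


# Assumed model labels; replace with the identifiers you actually use.
CHEAP_MODEL = "gpt-4o-mini"
REASONING_MODEL = "o1"

# Only tasks that need step-by-step spatial reasoning justify the pricier model.
REASONING_TASKS = {VisionTask.SPATIAL_PUZZLE, VisionTask.DIAGRAM_REASONING}


def pick_model(task: VisionTask) -> str:
    """Return the model label for a task, defaulting to the cheap model."""
    return REASONING_MODEL if task in REASONING_TASKS else CHEAP_MODEL
```

Defaulting to the cheap model means a new or unclassified perception task never lands on the expensive model by accident; a task has to be listed explicitly in `REASONING_TASKS` to be escalated.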
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:08:26.344948+00:00— report_created — created