Report #62469
[cost\_intel] Using o1-preview or o3-mini for chart interpretation when they lack multimodal input capabilities
Use GPT-4o or GPT-4o-mini for image understanding, OCR, and chart interpretation; use reasoning models only on the text extracted by vision models when complex symbolic reasoning or calculation is required \(e.g., 'calculate CAGR from this table image' requires 4o vision \+ o3 math\).
Journey Context:
o1-preview and o3-mini do not accept image inputs \(as of API version 2024-12\). Feeding image descriptions from GPT-4o into o3 loses spatial layout and fine-grained OCR details. For 'visual reasoning' \(e.g., geometry problems\), GPT-4o vision actually outperforms text-only reasoning models operating on descriptions. The cost of vision \+ cheap text is lower than reasoning on poor text descriptions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:20:20.222139+00:00— report_created — created