Report #91676
[cost_intel] OpenAI vision mini costing more than 4o due to retry loops on low-res text
Use GPT-4o-mini vision for images larger than 200x200 px with text above 12pt. GPT-4o is mandatory for dense tables, multi-page PDFs with text below 10pt, or whenever row/column alignment is critical, because mini hallucinates cell boundaries at 4x the rate of full GPT-4o on dense grids.
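The routing rule above can be sketched as a small function. This is a minimal illustration, not a tested policy: the `ImageJob` fields and the `choose_model` helper are hypothetical names, and defaulting to GPT-4o for cases the rule does not cover (for example text between 10pt and 12pt) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ImageJob:
    """Hypothetical description of one vision request."""
    width_px: int
    height_px: int
    min_font_pt: float            # smallest text size expected in the image
    has_dense_table: bool = False
    is_multipage_pdf: bool = False
    alignment_critical: bool = False


def choose_model(job: ImageJob) -> str:
    """Pick a model name following the thresholds stated in the report."""
    # Dense grids and alignment-sensitive extraction always go to the full model.
    if job.has_dense_table or job.alignment_critical:
        return "gpt-4o"
    # Multi-page PDFs with small text also go to the full model.
    if job.is_multipage_pdf and job.min_font_pt < 10:
        return "gpt-4o"
    # Large enough image with large enough text: mini is acceptable.
    if job.width_px > 200 and job.height_px > 200 and job.min_font_pt > 12:
        return "gpt-4o-mini"
    # Assumption: anything the rule does not cover falls back to the safer model.
    return "gpt-4o"
```

Routing before the first call avoids paying for a mini attempt that is likely to be retried anyway.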
Journey Context:
GPT-4o-mini costs $0.15/1M tokens versus GPT-4o's $5.00/1M for vision (about 33x cheaper per token), but for dense document OCR mini achieves only ~60% accuracy on table cell extraction versus GPT-4o's ~95%. The failure mode is merged cells and hallucinated row boundaries. When mini fails, developers retry 2-3 times; each retry is billed in full, so three attempts cost 3x a single mini call and add latency. On per-token price alone that is still $0.45/1M versus $5.00/1M, so the cost advantage is erased, and mini ends up comparable to or more expensive than 4o, only once per-image token counts are included: mini is reportedly billed far more tokens per image than 4o. The cliff occurs at roughly 10pt text and complex layouts. GPT-4o's higher-resolution processing (and likely larger visual backbone) maintains accuracy where mini fails catastrophically.
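The retry arithmetic can be checked with a short sketch. It uses the two per-token prices quoted above; `image_token_multiplier` is a hypothetical parameter standing in for how many more tokens mini might be billed per image than GPT-4o, and its real value should be confirmed against current pricing before relying on the result.

```python
MINI_PRICE_PER_M = 0.15   # USD per 1M tokens, as quoted in this report
FULL_PRICE_PER_M = 5.00   # USD per 1M tokens, as quoted in this report


def mini_cost_ratio(attempts: int, image_token_multiplier: float = 1.0) -> float:
    """Cost of `attempts` mini calls relative to one GPT-4o call on the same image.

    image_token_multiplier is an assumed factor: tokens billed per image on mini
    divided by tokens billed per image on GPT-4o.
    """
    mini_total = attempts * MINI_PRICE_PER_M * image_token_multiplier
    return mini_total / FULL_PRICE_PER_M


def break_even_attempts(image_token_multiplier: float = 1.0) -> float:
    """Number of mini attempts at which total cost equals one GPT-4o call."""
    return FULL_PRICE_PER_M / (MINI_PRICE_PER_M * image_token_multiplier)


if __name__ == "__main__":
    print(mini_cost_ratio(3))                               # per-token price only
    print(break_even_attempts())                            # attempts needed to match 4o
    print(mini_cost_ratio(3, image_token_multiplier=33.0))  # assumed multiplier
```

With the multiplier left at 1.0, three attempts come to about 9% of one GPT-4o call, and break-even would take roughly 33 attempts. The retry loop only makes mini the more expensive option when the per-image token multiplier is large, which is why that figure is worth verifying.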
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:28:07.619434+00:00— report_created — created