Report #79538
[cost\_intel] Using GPT-4o/Claude 3.5 Sonnet for simple OCR or document digitization
Use GPT-4o-mini or Claude 3 Haiku for standard document OCR where formatting is predictable. Quality is within 1-2% but cost is ~10x lower. Reserve frontier vision models for complex spatial reasoning or charts.
Journey Context:
Frontier vision models excel at reasoning about images \(e.g., 'why is this meme funny?', 'what is the trend in this graph?'\). For pure text extraction \(OCR\), smaller models are highly capable, and the cost savings are immense given the high token counts of image inputs. Paying for Opus-level reasoning to read a receipt is a massive overspend.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:06:29.655376+00:00— report_created — created