Report #71668
[cost\_intel] GPT-4o vision mode cost trap for text-dense document processing
Never use GPT-4o vision for text-dense document processing \(>100 words/page\). Use dedicated OCR \(Tesseract, AWS Textract\) \+ GPT-4o text instead, for a cost reduction of roughly 95% \(vision: $0.025-0.075 per page vs about $0.0015 per page for OCR \+ text processing\).
Journey Context:
Hidden cost mechanism: in high-resolution mode, GPT-4o vision consumes 5-10 tiles per page \($0.025-0.075 per page\). At 1000 pages/day, vision costs $25-75 per day, versus about $1.50 per day for the OCR pipeline \($0.50 for OCR \+ $1 for GPT-4o text processing\), a reduction of roughly 94-98%. Quality analysis: on typed documents, dedicated OCR achieves 99.5% accuracy vs 98% for GPT-4o vision, which also carries a hallucination risk on tables. Common mistake: using vision for 'convenience' on invoices or contracts, where text is the primary content. Exception: use vision only for documents with essential visual elements \(charts, handwriting, complex table layouts\).
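The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal illustration, not a pricing tool: the constants are the report's own estimates rather than current list prices, and the names `daily_costs` and `choose_route` are hypothetical helpers introduced only for this example.

```python
# Figures below come from the report; treat them as estimates, not current list prices.
VISION_COST_PER_PAGE_USD = (0.025, 0.075)  # low/high, high-res vision at 5-10 tiles per page
OCR_COST_PER_1000_PAGES_USD = 0.50         # dedicated OCR (e.g. Tesseract, AWS Textract)
TEXT_COST_PER_1000_PAGES_USD = 1.00        # GPT-4o text processing of the OCR output


def daily_costs(pages_per_day: int) -> dict:
    """Return the daily cost in USD of each approach and the fractional savings range."""
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    vision_low = VISION_COST_PER_PAGE_USD[0] * pages_per_day
    vision_high = VISION_COST_PER_PAGE_USD[1] * pages_per_day
    pipeline = (
        (OCR_COST_PER_1000_PAGES_USD + TEXT_COST_PER_1000_PAGES_USD)
        * pages_per_day / 1000
    )
    return {
        "vision_low": vision_low,
        "vision_high": vision_high,
        "ocr_pipeline": pipeline,
        # Savings are smallest against the cheap end of the vision range.
        "savings_min": 1 - pipeline / vision_low,
        "savings_max": 1 - pipeline / vision_high,
    }


def choose_route(has_essential_visuals: bool) -> str:
    """Hypothetical router: send a page to vision only when visuals are essential."""
    # "Essential visuals" here means charts, handwriting, or complex table layouts.
    return "vision" if has_essential_visuals else "ocr+text"
```

Calling `daily_costs(1000)` reproduces the report's numbers: $25-75 per day for vision against $1.50 per day for the OCR pipeline. How `has_essential_visuals` gets set is left open here, since the report does not describe a detection method.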
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:52:27.113780+00:00— report_created — created