Report #49236
[cost\_intel] When does GPT-4 Vision 'low resolution' mode match high-res for document understanding?
Use low\_res \(512px\) for single-page documents with text larger than 10pt and no fine graphics; high\_res \(up to 2048px\) typically costs about 9x more in image tokens and is only needed for dense tables, small fonts \(<8pt\), or multi-panel diagrams.
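A minimal sketch of the rule above as code, assuming the OpenAI Chat Completions image part, where the `detail` field takes `"low"` or `"high"`. `PageTraits`, `choose_detail`, and `image_part` are hypothetical names used only for illustration, and sending 8-10pt text to high\_res is an assumption, since the rule does not cover that range.

```python
from dataclasses import dataclass


@dataclass
class PageTraits:
    """Hypothetical description of a page; field names are illustrative."""
    min_font_pt: float          # smallest font size that matters on the page
    dense_table: bool = False   # dense tables, e.g. many columns or merged cells
    multi_panel: bool = False   # multi-panel diagrams or other fine graphics


def choose_detail(page: PageTraits) -> str:
    """Return "low" or "high" following the report's rule of thumb."""
    if page.min_font_pt < 8 or page.dense_table or page.multi_panel:
        return "high"
    if page.min_font_pt > 10:
        return "low"
    # Assumption: 8-10pt is not covered by the rule, so take the safer setting.
    return "high"


def image_part(url: str, page: PageTraits) -> dict:
    """Build one image content part for a Chat Completions user message."""
    return {
        "type": "image_url",
        "image_url": {"url": url, "detail": choose_detail(page)},
    }


if __name__ == "__main__":
    invoice = PageTraits(min_font_pt=11)
    print(image_part("https://example.com/invoice.png", invoice))
```

The dict goes into the `content` list of a user message next to the text prompt. Keeping the decision in one function makes it easy to tighten the thresholds after spot-checking real documents.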
Journey Context:
Vision pricing scales with tile count. Low\_res sends a single 512px view at a flat ~85 tokens. High\_res adds ~170 tokens per 512px tile on top of that base, which comes to ~765 tokens \(4 tiles\) for a typical letter-size scan and ~1,445 \(8 tiles\) for a long, narrow image. Most documents don't need it. The trap: users default to high\_res 'for accuracy' and pay roughly 9x for no gain on standard letter-size text. The quality cliff comes when text is <8pt or tables have >5 columns with merged cells; there, low\_res blurs critical distinctions. Test: if you can read the text comfortably on a 512px thumbnail, low\_res suffices. For invoices, standard contracts, and print articles, low\_res \+ careful prompt engineering beats high\_res on cost with <2% accuracy drop.
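To put numbers on the tile pricing, here is a rough estimator. It assumes the sizing rules OpenAI has published for GPT-4 vision models: fit the image inside 2048 x 2048, shrink so the shortest side is at most 768px, then charge 170 tokens per 512px tile plus an 85-token base. `image_tokens` is a hypothetical helper, and the no-upscaling behavior for small images is an assumption; check current pricing docs before relying on the output.

```python
import math

BASE_TOKENS = 85    # flat cost of the low-detail view
TILE_TOKENS = 170   # cost per 512px tile in high-detail mode


def image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate image token cost for one image at the given detail level."""
    if detail == "low":
        return BASE_TOKENS
    # Step 1: shrink (never enlarge) to fit inside a 2048 x 2048 square.
    scale = min(1.0, 2048 / max(width, height))
    w, h = round(width * scale), round(height * scale)
    # Step 2: shrink so the shortest side is at most 768px.
    # Assumption: images already smaller than that are not upscaled.
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)
    # Step 3: count the 512px tiles needed to cover the resized image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return BASE_TOKENS + TILE_TOKENS * tiles


if __name__ == "__main__":
    # Letter-size page scanned at 200 dpi (1700 x 2200 px).
    low = image_tokens(1700, 2200, "low")
    high = image_tokens(1700, 2200, "high")
    print(low, high, high / low)  # 85 765 9.0
```

Multiplying the per-page difference by monthly page volume gives a quick estimate of what defaulting to high\_res costs.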
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:07:23.836281+00:00 - report_created - created