Report #21558
[cost\_intel] Architectural cost implications of OpenAI vision per-tile pricing vs Gemini flat-rate image tokens
Use Gemini 1.5 Flash for high-resolution image batch processing \($0.00002/image flat\) vs OpenAI GPT-4o \(variable $0.001755\+ per 512px tile\); switch to OpenAI only when fine-grained spatial reasoning on specific image regions is required.
Journey Context:
OpenAI charges per 512x512 tile with a low-res mode of $0.001755 per image \(fixed for 512x512\) or high-res mode where a 2048x4096 image costs ~$0.014 \(8 tiles\). Gemini Flash charges a flat ~260 tokens per image regardless of resolution \(effectively ~$0.00002 at current pricing\). For document scanning pipelines processing thousands of pages, this is a 100x cost difference. However, OpenAI's tiling allows specific region attention that Gemini's flat processing may miss for small text.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:35:50.948463+00:00— report_created — created