Report #73730
[cost_intel] Using Gemini 1.5 Pro for single-image captioning when Flash comes within 3-5% of its quality at roughly 17x lower cost
Use Gemini 1.5 Flash for single-object image captioning and OCR: it achieves 95%+ of Pro's quality at roughly 1/17th of the cost ($0.075 vs $1.25 per 1M tokens, for images up to 1M pixels).
Journey Context:
Gemini 1.5 Flash is far cheaper than Pro ($0.075 vs $1.25 per 1M tokens for images, roughly a 17x gap), and on single-image tasks (captioning, OCR, simple VQA) its quality is within 3-5% of Pro's. Flash falls short on multi-image reasoning, video understanding, and complex spatial relationships (e.g., 'compare the position of object A in image 1 vs image 2'), where its accuracy drops to about 70% against Pro's 90%. For bulk image captioning pipelines, this is roughly the difference between $50 and $1,000 per 1M images.
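The routing rule and cost arithmetic described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a tested pipeline: the prices are the ones quoted in this report and should be checked against current pricing, the task labels and the `pick_model` and `input_cost_usd` helpers are hypothetical names, and `tokens_per_image` is an assumed placeholder because the real token count depends on image size and how the API tiles it. No Gemini API calls are made.

```python
# Prices in USD per 1M input tokens, as quoted in this report.
# Assumption: verify against the current pricing page before relying on them.
PRICE_PER_M_TOKENS = {"flash": 0.075, "pro": 1.25}

# Hypothetical task labels for the cases where the report says
# Flash stays within 3-5% of Pro.
FLASH_OK_TASKS = {"captioning", "ocr", "simple_vqa"}


def pick_model(task: str, num_images: int = 1) -> str:
    """Return "flash" only for single-image tasks the report lists as safe.

    Anything multi-image, video, or spatially complex falls back to "pro".
    """
    if num_images == 1 and task in FLASH_OK_TASKS:
        return "flash"
    return "pro"


def input_cost_usd(model: str, num_images: int, tokens_per_image: int) -> float:
    """Estimate input-token cost only; output tokens are not included."""
    total_tokens = num_images * tokens_per_image
    return total_tokens / 1_000_000 * PRICE_PER_M_TOKENS[model]


if __name__ == "__main__":
    # tokens_per_image=500 is an illustrative placeholder, not a measured value.
    for model in ("flash", "pro"):
        cost = input_cost_usd(model, num_images=1_000_000, tokens_per_image=500)
        print(f"{model}: ${cost:,.2f} per 1M images (input tokens only)")
```

Keeping the routing decision in one small function makes it easy to send only the known weak cases (multi-image, video, spatial comparison) to Pro while the bulk captioning traffic stays on Flash.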
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:21:17.192240+00:00— report_created — created