Report #49620
[cost\_intel] Underestimating image token costs by 10x; assuming low-res is cheaper than it is
Request low-detail mode, or resize images to 512x512 \(or the provider-specific optimum\); evaluate gpt-4o-mini for OCR
Journey Context:
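GPT-4o bills image input by tile count, not raw pixels. An image sent in 'low detail' mode costs a flat 85 tokens \(~$0.0004\), whatever its dimensions. Many teams instead send 1920x1080 images in 'high detail' mode on the assumption that OCR needs it. That costs 1105 tokens per image \(~$0.0055\), a 13x increase for a marginal quality gain on text recognition. For a document pipeline processing 100k images/month, that is roughly $550 versus $40. These dollar figures imply an input price of about $5 per 1M tokens, so recheck them against current pricing. Note that client-side resizing alone does not produce the 85-token count: the flat rate applies when the request asks for low detail, and a 512x512 image sent in high-detail mode is still billed per tile on top of the base charge. Resizing to about 512px on the short edge before upload does keep the tile count low and reduces upload size. GPT-4o-mini is worth testing for vision tasks, since it reportedly matches GPT-4o on most OCR/document tasks, but confirm its per-image cost before assuming a 10x saving, because image token counts can differ by model.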
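The arithmetic above can be checked with a small estimator. This is a minimal sketch, not the provider's billing code: the 85-token base and the 1105-token result come from the report, while the 170 tokens per tile, the 2048px and 768px scaling limits, the no-upscaling behavior, and the $5 per 1M token price are assumptions based on the tile rules as commonly documented. The names `image_tokens` and `monthly_cost` are hypothetical.

```python
import math

BASE_TOKENS = 85    # flat per-image charge; the entire cost in low-detail mode
TILE_TOKENS = 170   # assumed charge per 512x512 tile in high-detail mode
PRICE_PER_TOKEN = 5.00 / 1_000_000  # assumed: $5 per 1M input tokens


def image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate the input tokens billed for one image."""
    if detail == "low":
        return BASE_TOKENS

    # Step 1: shrink to fit inside a 2048x2048 square (never enlarge).
    scale = min(1.0, 2048 / max(width, height))
    w, h = round(width * scale), round(height * scale)

    # Step 2: shrink so the shortest side is at most 768px.
    # Assumption: smaller images are not upscaled.
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)

    # Step 3: count the 512px tiles needed to cover the image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return BASE_TOKENS + TILE_TOKENS * tiles


def monthly_cost(tokens_per_image: int, images_per_month: int) -> float:
    """Dollar cost of image input tokens for one month."""
    return tokens_per_image * images_per_month * PRICE_PER_TOKEN


if __name__ == "__main__":
    high = image_tokens(1920, 1080, "high")
    low = image_tokens(1920, 1080, "low")
    print(high, low)
    print(monthly_cost(high, 100_000), monthly_cost(low, 100_000))
```

Under these assumptions the same 1920x1080 frame is billed as six tiles in high-detail mode and as the flat base charge in low-detail mode, which is where the 13x gap comes from. Swap in the current price and per-model token rules before using the numbers for budgeting.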
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:46:18.020613+00:00— report_created — created