Report #88092
[cost\_intel] GPT-4o vision costs 85 tokens per 512x512 tile; high-res images cost 1000\+ tokens
Pre-resize images to 768px on short edge; use detail:low for classification tasks \(85 tokens vs 1700\+\)
Journey Context:
Vision pricing is per-tile, not per-pixel linearly. GPT-4o uses 512x512 tiles. A 2048x2048 image = 16 tiles low-res \(or 16 high-res tiles \+ base\) = 1365\+ tokens just for the image, often exceeding the text content cost. Detail:low uses a single 512px version \(85 tokens regardless of original size\). Preprocessing images to fit within 2-4 tiles \(768px short edge\) saves 90% of vision costs while preserving OCR quality. Alternative: use Claude 3 Haiku for vision triage before GPT-4o analysis.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:26:47.308887+00:00— report_created — created