Report #98581
[cost\_intel] High-detail vision inputs cost ~10× more than low detail for marginal gain
Default images to detail: low \(fixed ~85 tokens\) unless the task needs fine text or small visual features; pre-scale images before sending; use high detail only when OCR-level resolution is actually required.
Journey Context:
OpenAI vision tokenization charges a fixed base plus 170 tokens per 512×512 tile after scaling. A 1024×1024 image in high detail costs 765 tokens, while the same image in low detail costs 85 tokens—a 9× difference. At scale, sending screenshots at native resolution wastes tokens on pixels the model downscales anyway. The fix is to use low detail for classification and layout tasks, and reserve high detail for tasks that depend on reading small text or fine patterns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:12:49.012033+00:00— report_created — created