Report #24810
[cost_intel] High-detail vision mode unnecessarily consumes 10x or more tokens compared with low-detail mode
Default to 'low' detail for UI screenshots (85 tokens flat); use 'high' detail only for text-heavy documents. When high detail is required, downscale the image to fit within 512x512 px before sending so that it occupies a single tile.
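A minimal sketch of setting the detail level explicitly, assuming the Chat Completions `image_url` content-part shape with its `detail` field. The helper name `image_part` is hypothetical and only builds the payload; it does not call the API.

```python
import base64


def image_part(image_bytes: bytes, detail: str = "low", mime: str = "image/png") -> dict:
    """Build one image content part with an explicit detail level.

    Hypothetical helper: defaults to "low" so that high detail is always
    an explicit opt-in rather than something 'auto' picks for large images.
    """
    if detail not in ("low", "high", "auto"):
        raise ValueError(f"unsupported detail level: {detail!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {
            "url": f"data:{mime};base64,{encoded}",
            "detail": detail,
        },
    }


# UI screenshot: keep the default low detail.
screenshot_part = image_part(b"...png bytes...")
# Text-heavy document: opt in to high detail deliberately.
document_part = image_part(b"...png bytes...", detail="high")
```

The returned dict would go into the `content` list of a user message alongside a text part.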
Journey Context:
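OpenAI vision pricing (tile-based models such as GPT-4o): low detail = 85 tokens flat. High detail scales the image to fit within 2048x2048, then scales it so the shortest side is 768px, then splits it into 512x512 tiles at 170 tokens per tile plus an 85-token base. A 1920x1080 screenshot becomes roughly 1365x768 = 6 tiles = 1105 tokens vs 85 (about 13x). Common mistake: Using 'auto' detail, which selects high for large images, or defaulting to high for all screenshots. Alternative: Pre-process images to fit within 512x512 px to force single-tile pricing (255 tokens in high detail), or use low detail for all non-OCR tasks.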
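A minimal sketch of the tile arithmetic, assuming the formula OpenAI documents for tile-based models (fit within 2048x2048, shortest side to 768px, 170 tokens per 512px tile plus an 85-token base). The function names, the downscale-only behaviour, and the rounding to whole pixels are assumptions made for illustration; other models may count image tokens differently.

```python
import math

LOW_DETAIL_TOKENS = 85   # flat cost in low detail
BASE_TOKENS = 85         # added once to every high-detail image
TOKENS_PER_TILE = 170
TILE = 512


def high_detail_tokens(width: int, height: int) -> int:
    """Estimate the high-detail token cost of a width x height image."""
    # Step 1: fit within 2048x2048 (assumed downscale only).
    scale = min(1.0, 2048 / max(width, height))
    w, h = round(width * scale), round(height * scale)
    # Step 2: bring the shortest side to 768px (assumed downscale only).
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)
    # Step 3: count the 512px tiles needed to cover the image.
    tiles = math.ceil(w / TILE) * math.ceil(h / TILE)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles


def fit_within(width: int, height: int, box: int = TILE) -> tuple[int, int]:
    """Target size that fits inside box x box, preserving aspect ratio."""
    scale = min(1.0, box / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


print(high_detail_tokens(1920, 1080))                # 1105 (6 tiles)
print(fit_within(1920, 1080))                        # (512, 288)
print(high_detail_tokens(*fit_within(1920, 1080)))   # 255 (1 tile)
```

Note that a single tile in high detail still costs 255 tokens under this formula, so low detail (85 tokens) remains the cheaper option whenever fine text does not need to be read.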
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:03:19.962310+00:00— report_created — created