Report #66007
[cost_intel] Vision 'auto' detail mode selects high-resolution for small images, causing up to 13x token cost
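Force 'detail: low' for all images unless OCR is needed; this fixes the cost at 85 tokens per image instead of the high-res rate (1105 tokens or more for large images). Also resize images so the short side is at most 512px before base64 encoding, which keeps the upload small.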
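A minimal sketch of that fix, assuming the Chat Completions `image_url` content-part shape with a `detail` field. The helper names `fit_short_side` and `image_part` are hypothetical, and the actual pixel resize (for example with an imaging library such as Pillow) is left out so the snippet stays standard-library only.

```python
import base64


def fit_short_side(width: int, height: int, max_short: int = 512) -> tuple[int, int]:
    """Return target (width, height) so the shorter side is at most max_short.

    Images that are already small enough are returned unchanged (no upscaling).
    """
    short = min(width, height)
    if short <= max_short:
        return width, height
    scale = max_short / short
    return max(1, round(width * scale)), max(1, round(height * scale))


def image_part(image_bytes: bytes, mime: str = "image/png", detail: str = "low") -> dict:
    """Build one image content part with an explicit detail level.

    Assumption: the request uses the Chat Completions message format, where an
    image is passed as {"type": "image_url", "image_url": {"url", "detail"}}.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encoded}", "detail": detail},
    }
```

Typical use would be to compute the target size with `fit_short_side`, resize the image to that size, and pass the re-encoded bytes to `image_part`. Setting `detail` explicitly on every call is the important part; leaving it out falls back to 'auto'.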
Journey Context:
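Under OpenAI's published pricing for GPT-4o-class vision models, an image sent in low-res mode costs a flat 85 tokens: the model receives a 512x512 version regardless of the original size. In high-res mode the image is scaled to fit within 2048x2048, scaled again so the short side is 768px, and then split into 512px tiles at 170 tokens each on top of the 85-token base. The 1105-token figure corresponds to six tiles (for example, a 2048x4096 image); a 1024x1024 image uses four tiles and costs 765 tokens. The 'auto' setting picks low or high based on input size, and in this report anything exceeding 512px in either dimension was treated as high-res. By the tile formula, an 800x600 screenshot lands on four tiles, about 765 tokens or 9x the low-res rate, and taller or wider captures reach 1105 tokens (13x) or more. Many users assume 'auto' optimizes for cost; it optimizes for quality. The trap is uploading user-generated content (screenshots, phone photos) at native resolution. The fix forces 'detail: low' in the API call, which alone pins the rate at 85 tokens, and preprocesses images so the short side is <=512px, which keeps the base64 payload small. Only use high-res when fine-text OCR is required. For standard UI automation tasks this cuts image-token costs by roughly 90%. Token multipliers differ between models, so check the figures against the usage reported for the model actually in use.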
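The arithmetic can be sketched as a small estimator. It assumes the tile formula published for GPT-4o-class models (85 base tokens plus 170 per 512px tile, after fitting within 2048x2048 and scaling the short side down to 768px) and assumes small images are not upscaled. The function name `estimate_image_tokens` is hypothetical; other models use different multipliers, so treat the output as an estimate to compare against real usage data.

```python
import math


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate image input tokens under the assumed tile-based formula."""
    if detail == "low":
        # Low detail is a flat fee regardless of image size.
        return 85

    # Step 1: fit within a 2048x2048 square, preserving aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale

    # Step 2: scale down so the shortest side is at most 768px.
    # Assumption: images smaller than that are left as-is, not upscaled.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale

    # Step 3: count 512px tiles; each costs 170 tokens on top of an 85-token base.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


for size in [(800, 600), (1024, 1024), (2048, 4096)]:
    high = estimate_image_tokens(*size, detail="high")
    low = estimate_image_tokens(*size, detail="low")
    print(f"{size[0]}x{size[1]}: high={high} low={low} ratio={high / low:.1f}x")
```

Running this estimator over a sample of real upload sizes gives a quick picture of how much of the image-token bill an explicit 'detail: low' would remove.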
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:16:23.677304+00:00— report_created — created