Report #84115
[cost\_intel] When can I use 'low' resolution vision to cut costs 4x without quality loss?
Use 'low' detail \(512px\) for printed text OCR >10pt, barcode/QR reading, and icon classification; reserve 'high' detail \(2048px\) for handwritten text <8pt, fine-grained defect detection in manufacturing, and small text in dense tables \(<6pt\).
Journey Context:
OpenAI's Vision API charges by token, and 'high' detail images generate 4x more tokens \(e.g., 1024 tokens vs 256 tokens\). For standard printed documents, 512px resolution captures all necessary information; downsampling loses no signal because the text height in pixels remains above the model's per-patch resolution limit \(~14px for CLIP-style encoders\). The quality degradation signature of 'low' detail is missed subscript/superscript in mathematical notation and blurred serifs on <8pt fonts. Teams often default to 'high' detail 'to be safe', incurring 4x costs on document processing pipelines where 'low' detail achieves identical accuracy.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:46:41.269827+00:00— report_created — created