Report #90461
[cost_intel] Using GPT-4o for user-facing chat when 4o-mini provides identical perceived quality at roughly 17x lower cost and lower latency
Default to GPT-4o-mini for conversational UI; reserve GPT-4o for tasks that require reasoning, creativity, or complex instruction following, where users notice the quality difference.
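A minimal sketch of that default-and-escalate rule, assuming the application already labels each request with a task type. The `TaskType` enum, the `ESCALATE` set, and `pick_model` are hypothetical names for illustration; how a request gets classified is out of scope here.

```python
from enum import Enum


class TaskType(Enum):
    # Hypothetical task labels; adapt to whatever your app already tracks.
    CASUAL_CHAT = "casual_chat"
    REASONING = "reasoning"
    CREATIVE = "creative"
    COMPLEX_INSTRUCTIONS = "complex_instructions"


DEFAULT_MODEL = "gpt-4o-mini"   # cheap, fast default for conversational UI
ESCALATION_MODEL = "gpt-4o"     # reserved for tasks where users notice quality

# Assumption: these are the task types that warrant the larger model.
ESCALATE = {
    TaskType.REASONING,
    TaskType.CREATIVE,
    TaskType.COMPLEX_INSTRUCTIONS,
}


def pick_model(task_type: TaskType) -> str:
    """Return the model name to use for a request of the given task type."""
    return ESCALATION_MODEL if task_type in ESCALATE else DEFAULT_MODEL
```

Keeping the escalation set in one place makes it easy to adjust as error-rate data comes in.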
Journey Context:
4o-mini streams tokens 2x faster than 4o (lower TTFB and higher TPS). In blind tests, users cannot distinguish mini from 4o on casual chat. The cost delta is $0.15 vs $2.50 per million input tokens, roughly a 17x difference. The cliff: when the conversation requires multi-step reasoning (math, debugging, analysis), mini hallucinates or loops where 4o succeeds. Instrument your logs to track error rates by task type, so you can see which tasks need to be routed to 4o.
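The cost arithmetic and the per-task-type instrumentation could be sketched as follows. The prices are the input-token figures quoted in this report; `input_cost_usd` and `ErrorRateTracker` are hypothetical helpers, and what counts as an "error" (hallucination flag, retry, user thumbs-down) is an assumption left to the caller.

```python
from collections import defaultdict
from typing import Optional

# Input-token prices from this report, in USD per million tokens.
PRICE_PER_M_INPUT = {"gpt-4o-mini": 0.15, "gpt-4o": 2.50}


def input_cost_usd(model: str, input_tokens: int) -> float:
    """Input-token cost in USD for one model at the prices above."""
    return PRICE_PER_M_INPUT[model] * input_tokens / 1_000_000


class ErrorRateTracker:
    """Counts requests and errors per (model, task_type) pair."""

    def __init__(self) -> None:
        self._total = defaultdict(int)
        self._errors = defaultdict(int)

    def record(self, model: str, task_type: str, is_error: bool) -> None:
        key = (model, task_type)
        self._total[key] += 1
        if is_error:
            self._errors[key] += 1

    def error_rate(self, model: str, task_type: str) -> Optional[float]:
        """Fraction of recorded requests that errored, or None if no data."""
        key = (model, task_type)
        total = self._total.get(key, 0)
        if total == 0:
            return None
        return self._errors.get(key, 0) / total


# Price ratio implied by the quoted figures: 2.50 / 0.15, about 16.7x.
price_ratio = PRICE_PER_M_INPUT["gpt-4o"] / PRICE_PER_M_INPUT["gpt-4o-mini"]
```

Comparing `error_rate("gpt-4o-mini", t)` against `error_rate("gpt-4o", t)` for each task type `t` is one way to locate the cliff described above before deciding what to escalate.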
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:25:57.126732+00:00— report_created — created