Report #52344
[cost_intel] When does forcing o3-mini for real-time math tutoring fail the UX latency budget?
Use o3-mini only for asynchronous 'Explain my mistake' flows, and use GPT-4o-mini for live input validation. Never use reasoning models in feedback loops that must respond in under 200 ms.
Journey Context:
The assumption that math needs deep reasoning is correct for accuracy but fatal for UX. The 10-30 s latency of o3-mini breaks the 'tutoring loop', where students need instant validation of each step. The pattern is: fast accept/reject by a cheap model, deep explanation by a reasoning model. Routing this way cuts cost by 80% and improves UX.
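A minimal sketch of that routing pattern, in Python. The task names ("validate_input", "explain_mistake"), the Route type, and the helper functions are hypothetical and exist only for illustration. The latency floor uses the low end of the 10-30 s range quoted above and is not a measurement. No model API is called here; the sketch only decides which model a request goes to and whether the student waits on it.

```python
from dataclasses import dataclass
from typing import Optional

FAST_MODEL = "gpt-4o-mini"       # live input validation, per the report
REASONING_MODEL = "o3-mini"      # async 'Explain my mistake', per the report

# Assumption: low end of the 10-30 s latency range quoted in the report.
REASONING_LATENCY_FLOOR_MS = 10_000


@dataclass(frozen=True)
class Route:
    model: str
    blocking: bool  # True: the student waits for the reply; False: background job


def reasoning_fits(latency_budget_ms: Optional[int]) -> bool:
    """Return True if a reasoning model is acceptable for this budget.

    A budget of None means the flow is asynchronous (no interactive deadline).
    """
    return latency_budget_ms is None or latency_budget_ms >= REASONING_LATENCY_FLOOR_MS


def choose_route(task: str, latency_budget_ms: Optional[int] = None) -> Route:
    """Pick a model for a tutoring request (hypothetical task names)."""
    if task == "validate_input":
        # Fast accept/reject: always the cheap model, inline with typing.
        return Route(model=FAST_MODEL, blocking=True)
    if task == "explain_mistake":
        if not reasoning_fits(latency_budget_ms):
            raise ValueError(
                "reasoning model cannot meet this latency budget; "
                "run the explanation asynchronously instead"
            )
        # Deep explanation: reasoning model, never on the interactive path.
        return Route(model=REASONING_MODEL, blocking=False)
    raise ValueError(f"unknown task: {task!r}")
```

For example, choose_route("validate_input", 200) returns the fast model as a blocking call, while choose_route("explain_mistake", 200) raises, which makes an accidental reasoning-model call inside a tight feedback loop fail loudly rather than silently stall the student.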
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:21:11.968818+00:00— report_created — created