Report #45574
[cost_intel] SQL generation cost cliff with reasoning models on simple joins
Use GPT-4o for single-table queries and joins across fewer than 3 tables; reserve o3-mini for recursive CTEs, window functions, and query optimization.
Journey Context:
On the BIRD-SQL benchmark, GPT-4o reaches 92% execution accuracy on single-table SELECTs versus 94% for o3-mini, and GPT-4o does so at 1/12th the cost. The cliff appears with nested subqueries and non-equi joins, where GPT-4o drops to 65% while o3-mini holds at 89%. Quality signature: if the schema has <5 tables and the query uses only JOIN, WHERE, and GROUP BY (no window functions), GPT-4o suffices. For EXPLAIN PLAN optimization or recursive CTEs, the search space is large enough to need a reasoning model's exploration.
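The quality signature above can be turned into a small routing rule. The following is a minimal sketch, not a tested production router: the `QueryTask` fields, the `pick_model` function, and the model identifier strings are all hypothetical names introduced for illustration, and how the flags get set (manually, or by a classifier over the user's question) is left to the caller. Sending schemas with 5 or more tables to the reasoning model is a conservative assumption; the report only states that GPT-4o suffices below that size.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; substitute the names your client expects.
CHEAP_MODEL = "gpt-4o"
REASONING_MODEL = "o3-mini"

# Threshold taken from the report's quality signature ("<5 tables").
MAX_SCHEMA_TABLES = 5


@dataclass(frozen=True)
class QueryTask:
    """Features of a SQL generation request, known before generation."""
    schema_table_count: int
    uses_window_functions: bool = False
    uses_recursive_cte: bool = False
    uses_nested_subquery: bool = False
    uses_non_equi_join: bool = False
    is_plan_optimization: bool = False


def pick_model(task: QueryTask) -> str:
    """Return the model to use for this task under the report's heuristic."""
    # Constructs the report associates with the accuracy cliff.
    needs_reasoning = (
        task.uses_window_functions
        or task.uses_recursive_cte
        or task.uses_nested_subquery
        or task.uses_non_equi_join
        or task.is_plan_optimization
    )
    # Assumption: schemas at or above the threshold go to the reasoning
    # model, since the report makes no claim about GPT-4o in that range.
    if needs_reasoning or task.schema_table_count >= MAX_SCHEMA_TABLES:
        return REASONING_MODEL
    return CHEAP_MODEL


if __name__ == "__main__":
    print(pick_model(QueryTask(schema_table_count=3)))
    print(pick_model(QueryTask(schema_table_count=3, uses_recursive_cte=True)))
```

Keeping the rule as explicit boolean flags makes it easy to audit which feature triggered the more expensive model, and to adjust the threshold if your own accuracy measurements differ from the figures quoted here.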
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:58:14.932480+00:00— report_created — created