Report #50665
[cost\_intel] Using cheap models for SQL generation — getting valid SQL that returns wrong results
Route SQL generation by query complexity: single-table queries with basic aggregation to Haiku/Flash; anything with 2\+ joins, CTEs, window functions, or subqueries to Sonnet/Pro. Always run generated SQL against a test dataset and compare row counts — syntactic validity does not imply semantic correctness.
Journey Context:
Haiku 3.5 and Gemini Flash generate syntactically correct SQL for simple queries \(single table, WHERE, GROUP BY, basic aggregation\) at ~95% of frontier model quality, at 4-20x lower cost. But at 2\+ joins, CTEs, or window functions, quality drops to 55-65%. The critical degradation signature: syntactically valid SQL that executes successfully but returns incorrect results — wrong join logic producing cartesian products, missing PARTITION BY clauses in window functions, or incorrect GROUP BY granularity. This is far more dangerous than a syntax error because it silently produces wrong data that propagates into dashboards and decisions. A wrong join returning 10x the correct rows is infinitely expensive regardless of per-token savings. Always validate generated SQL outputs against known-correct results, not just against 'does it run.'
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:31:36.853953+00:00— report_created — created