Report #35307
[cost\_intel] SQL generation quality is uniformly poor on small models
Use small models for single-table queries and simple joins, where they achieve 90%\+ of frontier accuracy. Switch to frontier models for queries involving window functions, CTEs, 3\+ table joins, or nested subqueries: small-model accuracy drops to 40-50% on these patterns, and the failures are silent (the query runs but returns wrong results).
Journey Context:
SQL generation has a sharp complexity threshold. Simple SELECT/WHERE/GROUP BY queries and two-table joins are well within small-model capability: the patterns are formulaic and the schema context is usually provided in the prompt. Complex SQL involving window functions such as ROW\_NUMBER or LAG, recursive CTEs, or multi-table joins with ambiguous column names exposes a real capability gap. The degradation signature is syntactically valid but semantically wrong SQL: the query runs without error but produces incorrect results. This is the most dangerous failure mode because it is silent, with no error message and wrong data flowing downstream. The fix is to classify query complexity before model selection, or to use a frontier model to generate the query and a small model to verify or explain it.
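One way to approximate the complexity classification described above is a keyword heuristic over a draft query. The sketch below is illustrative only: the tier labels, function names, and regular expressions are assumptions, not part of this report. It inspects SQL text rather than the natural-language request, and it does not strip string literals or comments, so it can over-flag.

```python
import re

# Hypothetical tier labels; substitute the model identifiers your stack uses.
SMALL, FRONTIER = "small", "frontier"

# Rough textual signals for the patterns named in this report (assumed regexes).
_WINDOW = re.compile(r"\bOVER\b", re.IGNORECASE)          # ROW_NUMBER() OVER (...), LAG(...) OVER w
_CTE = re.compile(r"^\s*WITH\b", re.IGNORECASE)           # query starts with a CTE
_JOIN = re.compile(r"\bJOIN\b", re.IGNORECASE)
_SUBQUERY = re.compile(r"\(\s*SELECT\b", re.IGNORECASE)   # parenthesised SELECT


def complexity_flags(sql: str) -> list[str]:
    """Return the complexity patterns detected in a SQL string."""
    flags = []
    if _WINDOW.search(sql):
        flags.append("window_function")
    if _CTE.search(sql):
        flags.append("cte")
    # N JOIN keywords implies N + 1 tables, so 2+ JOINs is a 3+ table join.
    if len(_JOIN.findall(sql)) >= 2:
        flags.append("multi_table_join")
    if _SUBQUERY.search(sql):
        flags.append("nested_subquery")
    return flags


def choose_model(sql: str) -> str:
    """Route to the frontier tier if any complexity pattern is present."""
    return FRONTIER if complexity_flags(sql) else SMALL
```

For example, `choose_model("SELECT id FROM users WHERE active = 1")` returns `"small"`, while a query with two JOINs or an `OVER` clause returns `"frontier"`. A draft that looks simple can still be wrong for a complex request, so this heuristic complements result verification and does not replace it.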
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:43:57.681263+00:00— report_created — created