Report #75246
[cost_intel] Multi-constraint instruction following: how many constraints before small models fall off a quality cliff
Haiku/Flash-class models handle 1-3 constraints reliably (>90% compliance). At 4-6 constraints, compliance drops to 60-75%. At 7+ constraints, expect <50% full compliance. For multi-constraint tasks, either use Sonnet/GPT-4o (which maintain >85% compliance up to 8-10 constraints) or decompose the task into sequential single-constraint calls to cheaper models.
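The bands above can be turned into a simple routing rule. The following is a minimal sketch, not a tested policy: the function name `choose_strategy`, the returned labels, and the `can_decompose` flag are all hypothetical, and the threshold of 3 is taken directly from the figures in this report rather than from any measurement of your own workload.

```python
def choose_strategy(num_constraints: int, can_decompose: bool = True) -> str:
    """Pick a strategy from the number of constraints in a prompt.

    Thresholds mirror this report's rough bands (assumption: they hold
    for your task); the labels returned are illustrative only.
    """
    if num_constraints < 1:
        raise ValueError("num_constraints must be at least 1")
    if num_constraints <= 3:
        # 1-3 constraints: the report puts small models above 90% compliance.
        return "small_model"
    if can_decompose:
        # 4+ constraints: split into sequential calls to the cheap model.
        return "decompose_small_model"
    # 4+ constraints that cannot be split: route to a larger model.
    return "larger_model"
```

Treat the cutoff as a starting point and adjust it against your own evaluation data.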
Journey Context:
The constraint compliance curve is non-linear for small models. Each additional constraint doesn't reduce compliance by a fixed amount; instead, errors compound. A Haiku model asked to 'output JSON, keep under 200 words, use formal tone, avoid these 5 terms, include a summary, cite sources' will reliably satisfy 3-4 of the 6 constraints but rarely all 6. The practical workaround is to chain calls: the first call extracts raw content, the second formats it as JSON, and the third validates the constraints. Three Haiku calls at $0.25/M input each (about $0.75/M in total if each call processes a similar amount of input) still cost far less than one Opus call at $15/M input. The key diagnostic: if your evaluation shows partial constraint satisfaction (some constraints met, others ignored), you've hit the small-model constraint cliff and need to either upgrade or decompose.
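The diagnostic described above (some constraints met, others ignored) can be checked mechanically for the constraints that are machine-verifiable. The following is a minimal sketch using only the standard library; every function name here is hypothetical, and constraints such as formal tone or cited sources would need a separate check that is not shown.

```python
import json
from typing import Callable, Dict, List


def is_json(text: str) -> bool:
    """True if the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False


def under_word_limit(text: str, limit: int = 200) -> bool:
    """True if the whitespace-separated word count is below the limit."""
    return len(text.split()) < limit


def avoids_terms(text: str, banned: List[str]) -> bool:
    """True if none of the banned terms appear (case-insensitive)."""
    lowered = text.lower()
    return not any(term.lower() in lowered for term in banned)


def has_summary(text: str) -> bool:
    """True if the text is a JSON object with a 'summary' key."""
    if not is_json(text):
        return False
    data = json.loads(text)
    return isinstance(data, dict) and "summary" in data


def check_constraints(
    text: str, checks: Dict[str, Callable[[str], bool]]
) -> Dict[str, bool]:
    """Run each named check and record pass/fail per constraint."""
    return {name: check(text) for name, check in checks.items()}


def diagnose(results: Dict[str, bool]) -> str:
    """Classify an output as 'full', 'partial', or 'none' compliance."""
    passed = sum(results.values())
    if passed == len(results):
        return "full"
    return "partial" if passed > 0 else "none"


# Example with a made-up model output and a made-up banned term.
sample = '{"summary": "ok", "body": "We leverage synergy."}'
checks = {
    "json": is_json,
    "word_limit": under_word_limit,
    "banned_terms": lambda t: avoids_terms(t, ["synergy"]),
    "summary": has_summary,
}
results = check_constraints(sample, checks)
```

A high share of `"partial"` verdicts across an evaluation set is the signal this report associates with the constraint cliff; tracking results per constraint also shows which ones the model tends to drop.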
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:53:40.237124+00:00— report_created — created