Report #47752
[counterintuitive] If the model follows simple instructions well, it will follow complex compound instructions reliably
Decompose complex instructions into sequential, single-constraint steps. Instead of 'Write a summary that is exactly 3 paragraphs, uses no jargon, includes a quote, and ends with a question,' break it into separate passes or validate each constraint independently. Use structured output schemas to enforce format constraints programmatically rather than relying on the model to satisfy all constraints simultaneously.
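A minimal sketch of validating each constraint independently, using the example instruction above. The function names, the `JARGON` word list, and the `CHECKS` registry are hypothetical and chosen only for illustration; the quote and jargon checks are rough heuristics, not a definitive implementation.

```python
import re

# Hypothetical jargon list (an assumption); replace with terms for your domain.
JARGON = {"synergy", "leverage", "paradigm"}


def check_paragraph_count(text: str, expected: int = 3) -> bool:
    """True if the text has exactly `expected` blank-line-separated paragraphs."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    return len(paragraphs) == expected


def check_no_jargon(text: str) -> bool:
    """True if none of the words in JARGON appear in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return words.isdisjoint(JARGON)


def check_has_quote(text: str) -> bool:
    """True if the text contains a passage in straight or curly double quotes."""
    return re.search(r'["“][^"“”]+["”]', text) is not None


def check_ends_with_question(text: str) -> bool:
    """True if the last non-whitespace character is a question mark."""
    return text.rstrip().endswith("?")


# One entry per constraint, so every failure is reported by name.
CHECKS = {
    "three_paragraphs": check_paragraph_count,
    "no_jargon": check_no_jargon,
    "has_quote": check_has_quote,
    "ends_with_question": check_ends_with_question,
}


def failed_constraints(text: str) -> list[str]:
    """Return the names of all constraints the text violates."""
    return [name for name, check in CHECKS.items() if not check(text)]
```

A caller could feed only the failed constraint names back to the model in a follow-up pass, so each repair step carries one or two constraints rather than all four.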
Journey Context:
Developers test models with simple instructions ('summarize this'), see good compliance, and assume compound instructions will work similarly. But instruction-following accuracy degrades significantly as the number of simultaneous constraints increases. Each additional constraint competes for attention and increases the probability that at least one will be dropped. This is the same multiplicative error problem applied to constraint satisfaction: if failures are roughly independent, a model that follows each of 5 constraints 95% of the time follows all 5 simultaneously only ~77% of the time (0.95^5 ≈ 0.774). The IFEval benchmark illustrates this: models that score well on single-constraint instructions fail compound ones at much higher rates. The solution is architectural: enforce constraints programmatically where possible, and decompose complex instructions into sequential steps where each step has fewer simultaneous constraints.
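The compounding arithmetic can be sketched as follows. This assumes each constraint is satisfied independently with the same probability, which is a simplification; `joint_compliance` is a hypothetical helper name.

```python
def joint_compliance(per_constraint_rate: float, n_constraints: int) -> float:
    """Probability that all constraints hold at once, assuming independence."""
    if not 0.0 <= per_constraint_rate <= 1.0:
        raise ValueError("per_constraint_rate must be between 0 and 1")
    if n_constraints < 0:
        raise ValueError("n_constraints must be non-negative")
    return per_constraint_rate ** n_constraints


if __name__ == "__main__":
    # Show how joint compliance falls as constraints are added at a 95% per-constraint rate.
    for n in (1, 2, 3, 5, 10):
        print(f"{n:>2} constraints: {joint_compliance(0.95, n):.1%}")
```

Real constraints are rarely fully independent, so the figure is best read as a rough estimate of the trend rather than a prediction for any particular model.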
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:37:52.898837+00:00— report_created — created