Report #26356
[counterintuitive] Model generates wrong number of items when asked for 'exactly N' examples, or produces numbered lists with skipped or duplicated numbers
Don't trust the model to generate exactly N items or maintain correct numbering. If you need exactly N items, generate more than needed and truncate programmatically, or use code to count and validate. For numbered lists, generate the content first and then programmatically number it. If precision matters, always verify counts with code after generation.
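The "generate more than needed and truncate" advice above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `exactly_n` is hypothetical, and it assumes the model returns one item per line, optionally prefixed with its own (possibly wrong) numbers or bullets.

```python
import re

# Matches a leading list marker such as "7. ", "7) ", "- " or "* ".
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+")


def exactly_n(raw_output: str, n: int) -> list[str]:
    """Return exactly n items from raw model output, renumbered from 1.

    Hypothetical helper: assumes one item per line. The model's own
    numbering is discarded, so skipped or duplicated numbers do not matter.
    """
    items = []
    for line in raw_output.splitlines():
        text = _MARKER.sub("", line).strip()
        if text:  # ignore blank lines
            items.append(text)

    if len(items) < n:
        # Too few items: the caller should regenerate or request more.
        raise ValueError(f"expected at least {n} items, got {len(items)}")

    # Truncate any surplus, then number in code rather than trusting the model.
    return [f"{i}. {text}" for i, text in enumerate(items[:n], start=1)]
```

Asking the model for a few more items than required and passing the result through a helper like this makes the final count a property of the code, not of the generation.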
Journey Context:
This looks like a simple instruction-following issue but has deeper architectural roots. The model generates tokens left to right with no explicit counter. When it emits '7. ...', it is not incrementing a stored value; it is predicting the most likely next token given the preceding context. For short lists (3-5 items), the pattern is well learned and usually works. For longer lists, the model can lose track: it might skip from 7 to 9, repeat item 5, or stop at 8 when asked for 10. Few-shot examples help statistically but don't guarantee correctness, because the problem is architectural: there's no register being incremented. The model also can't reliably count items it has already generated. Asking 'how many items did you list?' often yields an incorrect answer for the same reason it struggles to count characters: the model doesn't have a precise representation of what it just produced. The fix: treat enumeration as a computation problem, not a generation problem. Generate content, then count and number programmatically.
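To verify counts with code instead of asking the model how many items it listed, a check along these lines could work. It is a sketch under stated assumptions: `numbering_problems` is a hypothetical name, and it only recognizes markers of the form `N.` or `N)` at the start of a line.

```python
import re

# A line-leading number followed by "." or ")" and a space or tab.
_NUMBERED = re.compile(r"^[ \t]*(\d+)[.)][ \t]", flags=re.MULTILINE)


def numbering_problems(raw_output: str) -> dict:
    """Report how many numbered items exist and which numbers are wrong.

    Hypothetical helper: returns the item count, the numbers missing from
    the range 1..max, and the numbers that appear more than once.
    """
    numbers = [int(m.group(1)) for m in _NUMBERED.finditer(raw_output)]

    seen = set()
    duplicated = []
    for num in numbers:
        if num in seen and num not in duplicated:
            duplicated.append(num)
        seen.add(num)

    expected = set(range(1, max(numbers) + 1)) if numbers else set()
    skipped = sorted(expected - seen)

    return {"count": len(numbers), "skipped": skipped, "duplicated": duplicated}
```

A caller can compare `count` against the requested N and treat any non-empty `skipped` or `duplicated` list as a signal to renumber or regenerate.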
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:38:24.235806+00:00— report_created — created