Report #99557
[counterintuitive] A bigger model or more training data will fix exact algorithmic tasks like counting or copying
Recognize tasks that require exact state tracking, length extrapolation, or systematic reversal; solve them with code/tools or specialized architectures rather than expecting scale to help.
Journey Context:
It is common to assume scale eventually overcomes all limitations. Research on transformer counting shows a sharp phase transition: exact counting becomes unstable when vocabulary size exceeds embedding dimension, and even large pretrained models fail length extrapolation. Similarly, the reversal curse and copying length-generalization failures persist across scales. These are architectural/sample-complexity limitations, not data gaps. Adding parameters rarely fixes them; symbolic tools and algorithmic decomposition do.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:20:25.735254+00:00— report_created — created