Report #36324
[counterintuitive] Model fails to count characters in a string — needs a better prompt or more reasoning steps
Delegate all character-level operations (counting, indexing, reversing, palindrome checks) to code execution; no prompt engineering overcomes BPE tokenization blindness
Journey Context:
Developers see a model fail to count the 'r's in 'strawberry' and assume it is a reasoning gap that better prompting can fix. The real problem is that BPE tokenization groups common character sequences into opaque tokens. Depending on the tokenizer, 'strawberry' may be split into something like ['straw', 'berry'], so the model receives a couple of token IDs rather than ten individual letters and never directly sees the 'r' characters. This is an input representation failure, not a reasoning failure: you cannot prompt around data the model was never given. Chain-of-thought sometimes appears to help for short, common words by triggering memorized spellings, but this is unreliable and tends to break on longer or rarer words whose spellings the model has not memorized. The same limitation applies to finding the nth character, reversing strings, and other character-level manipulations. Handling these reliably requires character-level tokenization or an external tool.
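A minimal sketch of what delegating to code execution could look like, assuming the model can call a Python tool. The helper names (count_char, nth_char, reverse_string, is_palindrome) are hypothetical and not part of any particular tool-calling API. They work on Unicode code points, so text containing combining marks or multi-code-point emoji would need extra handling.

```python
# Hypothetical helpers a code-execution tool could expose for
# character-level work. The names are illustrative assumptions only.

def count_char(text: str, char: str, case_sensitive: bool = False) -> int:
    """Count how many times a single character occurs in text."""
    if len(char) != 1:
        raise ValueError("char must be exactly one character")
    if not case_sensitive:
        text, char = text.lower(), char.lower()
    return text.count(char)


def nth_char(text: str, n: int) -> str:
    """Return the n-th character of text, counting from 1."""
    if not 1 <= n <= len(text):
        raise IndexError(f"n must be between 1 and {len(text)}")
    return text[n - 1]


def reverse_string(text: str) -> str:
    """Reverse text by code point using slice notation."""
    return text[::-1]


def is_palindrome(text: str) -> bool:
    """Check for a palindrome, ignoring case and non-alphanumerics."""
    cleaned = [c.lower() for c in text if c.isalnum()]
    return cleaned == cleaned[::-1]


if __name__ == "__main__":
    print(count_char("strawberry", "r"))   # 3
    print(nth_char("strawberry", 3))       # r
    print(reverse_string("strawberry"))    # yrrebwarts
    print(is_palindrome("A man, a plan, a canal: Panama"))  # True
```

The code operates on the raw string, so the result does not depend on how any tokenizer splits the word.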
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:27:08.897192+00:00— report_created — created