Report #58220
[counterintuitive] Why can't the model count letters in a word correctly, no matter how I prompt it?
Never rely on the model to count characters; delegate to a code interpreter or external tool that operates on raw strings
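A minimal sketch of what "delegate to a tool" could look like, assuming the tool is a small Python helper run by a code interpreter. The names `count_letter` and `letter_histogram` are hypothetical and chosen only for illustration; they are not part of any model API.

```python
from collections import Counter


def count_letter(word: str, letter: str, case_sensitive: bool = False) -> int:
    """Count occurrences of one character in `word`, working on the raw string."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    if not case_sensitive:
        # Simple lowercase folding; adequate for ASCII words, an assumption of this sketch.
        word, letter = word.lower(), letter.lower()
    return word.count(letter)


def letter_histogram(word: str) -> Counter:
    """Return a case-insensitive count of every character in `word`."""
    return Counter(word.lower())


r_in_strawberry = count_letter("Strawberry", "r")
s_in_mississippi = letter_histogram("Mississippi")["s"]
```

Because the helper iterates over the string itself, its answer does not depend on how a tokenizer would have split the word.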
Journey Context:
The widespread belief is that better prompting (e.g., 'count each letter step by step') will fix character counting. It won't, at least not reliably. Most LLMs split input into subword tokens, typically via byte-pair encoding (BPE), so the model receives token IDs rather than individual characters. 'Strawberry' is commonly cited as tokenizing to ['str', 'aw', 'berry'] in GPT-4 (the exact split depends on casing and leading whitespace), and the model has no direct view of the characters inside each token. Chain-of-thought sometimes appears to work on short, common words by pattern-matching memorized answers from training data, but it fails unpredictably on novel or rare words. This is an architectural consequence of tokenization, not a prompt engineering problem: no prompt can give the model direct access to the characters hidden inside its tokens.
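The gap between what the model receives and what a tool receives can be illustrated with a toy example. This is only a sketch: the pieces follow the split quoted above, while the vocabulary `TOY_VOCAB`, its ID values, and the function `toy_encode` are made up for illustration and do not correspond to any real tokenizer.

```python
# Hypothetical vocabulary: real tokenizers have very large vocabularies and different IDs.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}


def toy_encode(pieces):
    """Map subword pieces to integer IDs, as a stand-in for a real tokenizer."""
    return [TOY_VOCAB[piece] for piece in pieces]


pieces = ["str", "aw", "berry"]

# What the model is given: a short list of opaque integers.
token_ids = toy_encode(pieces)

# What a string-level tool is given: the characters themselves.
raw_string = "".join(pieces)

# Counting on the raw string is a direct scan over characters.
r_count = raw_string.count("r")

print(token_ids)   # three integers, none of which spells out a letter
print(raw_string, r_count)
```

Nothing in the integer list states how many times "r" occurs; the model would have to have learned the spelling behind each ID, whereas the tool simply reads the characters.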
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:12:50.780591+00:00— report_created — created