Report #44298
[counterintuitive] Why can't the model count characters in a word or reverse a string despite step-by-step prompting?
Use code execution \(a Python interpreter or similar tool\) for any character-level or string-manipulation task. Do not rely on prompting strategies for character counting, string reversal, palindrome detection, or substring indexing: the model operates on tokens and has no direct view of the individual characters.
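A minimal sketch of what the model can hand to a code-execution tool instead of answering from memory. The helper names \(count_char, reverse_string, is_palindrome, find_all\) are hypothetical and chosen for illustration; only Python built-ins are used.

```python
def count_char(text: str, char: str) -> int:
    """Count case-sensitive occurrences of `char` in `text`."""
    return text.count(char)


def reverse_string(text: str) -> str:
    """Return `text` reversed by code point (not by grapheme cluster)."""
    return text[::-1]


def is_palindrome(text: str) -> bool:
    """Check for a palindrome, ignoring case and non-alphanumeric characters."""
    normalized = [c.lower() for c in text if c.isalnum()]
    return normalized == normalized[::-1]


def find_all(text: str, sub: str) -> list:
    """Return the start index of every occurrence of `sub`, overlaps included."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    indices = []
    start = text.find(sub)
    while start != -1:
        indices.append(start)
        start = text.find(sub, start + 1)
    return indices


if __name__ == "__main__":
    word = "strawberry"
    print(len(word))              # total characters
    print(count_char(word, "r"))  # occurrences of "r"
    print(reverse_string(word))   # reversed string
    print(find_all(word, "r"))    # zero-based positions of "r"
```

Because the interpreter works on the actual string rather than on tokens, the results do not depend on which tokenizer the provider uses.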
Journey Context:
Developers assume character counting is a simple reasoning task and try increasingly elaborate prompts \(chain-of-thought, role-playing, step-by-step decomposition\). But BPE tokenization means a word like 'strawberry' may reach the model as subword tokens such as \['str', 'aw', 'berry'\] \(the exact split depends on the tokenizer\), so the model never directly sees the 10 individual characters. Prompting cannot restore character boundaries that were discarded at the tokenization layer. This is why a model can write correct Python to count characters yet miscount when asked directly. The limitation comes from the tokenizer, not the weights. Different tokenizers \(GPT-4 vs Claude vs Gemini\) split text differently, so even the failure modes are inconsistent across providers. The counterintuitive part: the model can explain exactly how to count characters but cannot reliably perform the count itself, because the explanation is ordinary token prediction, while the count requires a character-level representation that the model is never given.
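To make the tokenization point concrete, here is a toy illustration. It is not real BPE: the vocabulary, the token IDs, and the greedy longest-match splitter \(toy_tokenize\) are all assumptions made up for this sketch, and real tokenizers will split 'strawberry' differently. The point is only that what comes out is a short list of opaque integers with no per-character structure.

```python
# Hypothetical vocabulary and IDs, invented for this illustration only.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}


def toy_tokenize(text, vocab):
    """Greedy longest-match split; a stand-in for BPE, not an implementation of it."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest remaining piece first, then shrink.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[pos:]!r}")
    return ids


token_ids = toy_tokenize("strawberry", TOY_VOCAB)
print(token_ids)                      # [101, 102, 103]
print(len(token_ids), "tokens vs", len("strawberry"), "characters")
```

Nothing in the ID 103 says that the underlying piece contains two 'r' characters; that fact is only available to code that operates on the original string.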
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:49:25.677829+00:00— report_created — created