Report #81758
[counterintuitive] Why can't an LLM reliably count characters in a string or reverse a word?
Delegate all character-level operations (counting, reversing, substring checks) to a code execution tool like Python; never rely on the model's direct text output for these tasks, regardless of how you prompt it.
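As a minimal sketch of what that delegation might look like, the three operations named above each map to a one-line Python expression. The helper names `count_char`, `reverse_text`, and `contains` are hypothetical, chosen for illustration; how the code reaches an execution tool depends on your setup and is not shown here.

```python
def count_char(text: str, ch: str) -> int:
    """Count non-overlapping, case-sensitive occurrences of ch in text."""
    return text.count(ch)


def reverse_text(text: str) -> str:
    """Reverse text by Unicode code point using slice notation."""
    return text[::-1]


def contains(text: str, fragment: str) -> bool:
    """Return True if fragment occurs anywhere in text."""
    return fragment in text


word = "strawberry"
print(count_char(word, "r"))     # 3
print(reverse_text(word))        # yrrebwarts
print(contains(word, "berry"))   # True
```

One caveat on the sketch: slicing reverses code points, so text containing combining marks or multi-code-point emoji would need grapheme-aware handling, which is outside what this example covers.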
Journey Context:
The widespread belief is that character-counting failures are a reasoning gap that better prompts or larger models will close. In reality, BPE tokenization means the model's input representation does not expose individual characters: 'strawberry' may arrive as a single token or a few multi-character tokens, not as [s,t,r,a,w,b,e,r,r,y]. The model cannot reliably count what it cannot see, and no prompt, however clever, can add information that the input representation lacks. This is a representation-level limitation, not a reasoning-level one. Larger models fail for the same reason: the tokenizer sits between the text and the model, and it replaces characters with opaque token IDs before the model ever sees them.
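The point about representation can be illustrated with a toy tokenizer. This is only a sketch under loud assumptions: `TOY_VOCAB` and its IDs are invented, and the function uses greedy longest-match lookup rather than real BPE merge rules, so the actual split of 'strawberry' in any given model's tokenizer may differ. What it shows is the shape of the input a model receives: a short list of integers with no per-character structure.

```python
# Hypothetical toy vocabulary. Real BPE vocabularies are learned from data
# and are far larger; these entries and IDs are invented.
TOY_VOCAB = {
    "straw": 101, "berry": 102,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}


def toy_tokenize(text: str) -> list[int]:
    """Greedy longest-match tokenization over TOY_VOCAB (not real BPE)."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, then shrink it.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i = end
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


print(toy_tokenize("strawberry"))  # [101, 102]
print(toy_tokenize("star"))        # [1, 2, 4, 3]
```

In this toy setting the model-side view of 'strawberry' is `[101, 102]`. Nothing in those two integers says that three of the ten underlying characters are 'r', which is why the count has to come from code that operates on the original string.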
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:49:21.829238+00:00— report_created — created