Report #60669
[counterintuitive] LLM cannot count characters or reverse strings despite correct prompting
Delegate all character-level operations (counting, reversing, finding position, spell-checking) to code execution or external tools; never rely on the model itself regardless of how you prompt it
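A minimal sketch of what delegating these operations can look like, assuming the model is allowed to call small deterministic helpers instead of answering from its own representation. The function names, the CHAR_TOOLS registry, and run_tool are hypothetical and exist only for illustration; they are not part of any specific tool-calling API.

```python
from typing import Callable, Dict, List


def count_char(text: str, char: str) -> int:
    """Count case-sensitive occurrences of a single character."""
    if len(char) != 1:
        raise ValueError("char must be exactly one character")
    return text.count(char)


def reverse_text(text: str) -> str:
    """Reverse by Unicode code point.

    Note: this does not keep multi-code-point graphemes (for example,
    some emoji or combining accents) together.
    """
    return text[::-1]


def char_positions(text: str, char: str) -> List[int]:
    """Return the 0-based indexes at which char occurs in text."""
    if len(char) != 1:
        raise ValueError("char must be exactly one character")
    return [i for i, c in enumerate(text) if c == char]


# Hypothetical registry: maps a tool name the model may request
# to the deterministic function that actually does the work.
CHAR_TOOLS: Dict[str, Callable] = {
    "count_char": count_char,
    "reverse_text": reverse_text,
    "char_positions": char_positions,
}


def run_tool(name: str, **kwargs):
    """Dispatch a tool request by name; unknown names raise KeyError."""
    return CHAR_TOOLS[name](**kwargs)


print(run_tool("count_char", text="strawberry", char="r"))      # 3
print(run_tool("reverse_text", text="strawberry"))              # yrrebwarts
print(run_tool("char_positions", text="strawberry", char="r"))  # [2, 7, 8]
```

The helpers operate on the raw string, so the result does not depend on how the text was tokenized.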
Journey Context:
The widespread belief is that character-level task failures are reasoning errors fixable with better instructions, more examples, or chain-of-thought. In reality, BPE tokenization means the model's input representation merges characters into opaque subword tokens. The word 'strawberry' becomes a short sequence of subword tokens such as [straw][berry] (the exact split depends on the tokenizer), each passed to the model as a single integer ID, so the three 'r' characters are not directly visible to the model. This is an input encoding problem, not a reasoning deficit. Even asking the model to 'spell it out letter by letter first' does not reliably help, because spelling itself depends on character-level access that the input representation does not provide. No amount of prompt engineering creates sensory access that does not exist in the input encoding; the fix is either a character-level tokenizer or tool use for character operations.
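The encoding problem can be illustrated with a toy tokenizer. This is a sketch under stated assumptions: it uses greedy longest-match lookup rather than real BPE merges, and the vocabulary and ID numbers in TOY_VOCAB are invented for the example. Real tokenizers have far larger learned vocabularies and may split 'strawberry' differently.

```python
from typing import List

# Hypothetical vocabulary: two multi-character pieces plus single letters.
# The IDs are arbitrary and do not match any real tokenizer.
TOY_VOCAB = {
    "straw": 101, "berry": 102,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}


def toy_tokenize(text: str) -> List[int]:
    """Greedy longest-match tokenization over TOY_VOCAB.

    A simplification of subword tokenization, not real BPE.
    """
    ids: List[int] = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece first, then shorter ones.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i = end
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[i]!r}")
    return ids


word = "strawberry"
print(toy_tokenize(word))   # [101, 102]
print(word.count("r"))      # 3
```

The model receives only the two IDs. Nothing in [101, 102] encodes that three 'r' characters are present, whereas code operating on the original string counts them directly.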
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:19:24.347181+00:00— report_created — created