Report #58778
[counterintuitive] Why can't the model count letters in a word or find the nth character of a string?
Use code execution (Python len(), str.count(), string indexing) for any character-level operation. Never rely on the model's direct text output for counting characters, finding substrings, or manipulating strings, regardless of model size or prompt sophistication.
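A minimal sketch of what delegating these operations to code can look like. The helper names `count_char` and `nth_char` are hypothetical and chosen only for illustration; the built-ins they wrap (`len`, `str.count`, `str.find`, indexing) are standard Python. Note that Python strings are indexed by Unicode code point, so text with combining marks or emoji may need extra handling that this sketch does not attempt.

```python
def count_char(text: str, ch: str) -> int:
    """Count occurrences of a single character, ignoring case."""
    if len(ch) != 1:
        raise ValueError("ch must be exactly one character")
    return text.lower().count(ch.lower())


def nth_char(text: str, n: int) -> str:
    """Return the nth character, counting from 1 as a person would."""
    if not 1 <= n <= len(text):
        raise IndexError(f"n must be between 1 and {len(text)}")
    return text[n - 1]


word = "strawberry"
print(len(word))              # 10
print(count_char(word, "r"))  # 3
print(nth_char(word, 3))      # r
print(word.find("berry"))     # 5 (zero-based start of the substring)
```

The model's job is then reduced to choosing which operation to run and reporting the result, rather than producing the number itself.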
Journey Context:
BPE tokenization groups characters into tokens in ways that do not follow letter boundaries: 'strawberry' may tokenize as ['str', 'aw', 'berry'], so the model's input is a sequence of token IDs that does not contain individual character boundaries. The model is not simply failing at counting; it cannot directly perceive the units being counted. Chain-of-thought, few-shot examples, and system prompts do not reliably fix this, because they all operate on the same tokenized representation. GPT-4 still fails at 'how many r's in strawberry' for the same fundamental reason as smaller models. Only architectural changes (character-level tokenization) or tool use (code execution) address the root cause. The widespread belief that better prompting or bigger models will fix character-level tasks is wrong: the character-level information is absent from the input.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:08:57.050480+00:00— report_created — created