Report #89953
[counterintuitive] Why can't the model count characters or find the nth letter in a string, despite detailed instructions?
Use code execution or an external tool for any character-level operation. Do not attempt to solve this with prompting: no prompt reliably recovers character-level information that tokenization hides from the model.
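As a minimal sketch of what offloading to code can look like, the helpers below do the character-level work in plain Python. The function names (`count_char`, `nth_char`, `reverse_text`, `find_substring`) are hypothetical and not part of any particular tool-calling API. They operate on Unicode code points, which is an assumption that holds for plain ASCII text but not for every script or emoji sequence.

```python
def count_char(text: str, ch: str) -> int:
    """Count non-overlapping occurrences of ch in text."""
    return text.count(ch)


def nth_char(text: str, n: int) -> str:
    """Return the n-th character of text, counting from 1."""
    if not 1 <= n <= len(text):
        raise IndexError(f"position {n} is outside 1..{len(text)}")
    return text[n - 1]


def reverse_text(text: str) -> str:
    """Reverse text by code point (not by grapheme cluster)."""
    return text[::-1]


def find_substring(text: str, sub: str) -> int:
    """Return the 1-based position of the first match, or 0 if absent."""
    return text.find(sub) + 1


if __name__ == "__main__":
    word = "unbelievable"
    print(len(word), nth_char(word, 5), find_substring(word, "able"))
```

In a tool-calling setup, the model would pass the raw string to functions like these and report their results instead of estimating the answer itself.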
Journey Context:
Developers assume character counting is trivial and try increasingly elaborate prompts. But LLMs process BPE tokens, not characters. The word 'unbelievable' might be tokenized as ['un', 'believ', 'able'], so the model never sees its individual characters directly. It can identify the first letter (often a single-character token) but cannot reliably count letters or find the 5th character, because that information is not explicit in the token sequence the model actually processes. This applies to string reversal, substring position finding, and character counting alike. Chain-of-thought, system prompts, and few-shot examples do not reliably reconstruct character boundaries from tokenized input. The only reliable fix is offloading to code.
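The toy sketch below illustrates the point using the example split from this report. The vocabulary and its integer IDs are invented for illustration; real BPE vocabularies, splits, and IDs differ between models. It shows that the model-side input is a short list of integers, and that character-level answers only become directly available once the IDs are decoded back into a string in code.

```python
# Hypothetical vocabulary: the pieces come from the example above,
# the integer IDs are made up for illustration.
TOY_VOCAB = {"un": 101, "believ": 202, "able": 303}
TOY_INVERSE = {token_id: piece for piece, token_id in TOY_VOCAB.items()}


def toy_encode(pieces: list[str]) -> list[int]:
    """Map already-split pieces to their toy token IDs."""
    return [TOY_VOCAB[piece] for piece in pieces]


def toy_decode(token_ids: list[int]) -> str:
    """Map toy token IDs back to the original string."""
    return "".join(TOY_INVERSE[token_id] for token_id in token_ids)


if __name__ == "__main__":
    ids = toy_encode(["un", "believ", "able"])
    print(ids)            # three integers, no letters in sight
    text = toy_decode(ids)
    print(len(text))      # character count is trivial once decoded in code
    print(text[4])        # 5th character, 0-based index 4
```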
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:34:47.883604+00:00— report_created — created