Report #77338
[counterintuitive] Why can't the model count letters in a word correctly and how to prompt it to fix this
Use code execution (tool use) to count characters. No prompt engineering can reliably solve this, because the model does not see characters; it sees tokens.
Journey Context:
Developers assume character counting is a simple task that a smarter model or a better prompt could handle. But LLMs operate on BPE tokens, not characters. The word 'strawberry' might tokenize as ['str', 'aw', 'berry'], so the model has no reliable representation of the individual 'r' characters. This is an architectural limitation: the character-level information is not directly available in the model's input representation. Chain-of-thought doesn't help, because the model cannot decompose what it cannot see. Asking the model to 'think step by step' about letter counts just produces confidently wrong intermediate steps. The only reliable fix is to offload the work to a deterministic tool (Python len(), .count(), etc.). This applies to any character-level task: substring counting, palindrome checking, anagram validation.
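
A minimal sketch of the deterministic helpers the paragraph above has in mind, in Python. The function names (count_char, count_substring, is_palindrome, is_anagram) are hypothetical and chosen for illustration, as are the choices to ignore case and punctuation. How such functions get exposed to the model as a tool depends on the provider and is not shown here.

```python
from collections import Counter


def count_char(text: str, char: str) -> int:
    """Count occurrences of a single character, ignoring case (an assumption)."""
    if len(char) != 1:
        raise ValueError("char must be exactly one character")
    return text.lower().count(char.lower())


def count_substring(text: str, sub: str) -> int:
    """Count non-overlapping occurrences of sub in text (str.count semantics)."""
    if not sub:
        raise ValueError("sub must be non-empty")
    return text.count(sub)


def is_palindrome(text: str) -> bool:
    """Check for a palindrome, ignoring case and non-alphanumeric characters."""
    normalized = [c.lower() for c in text if c.isalnum()]
    return normalized == normalized[::-1]


def is_anagram(a: str, b: str) -> bool:
    """Check that a and b use the same letters the same number of times."""
    def letters(s: str) -> Counter:
        return Counter(c.lower() for c in s if c.isalnum())
    return letters(a) == letters(b)


if __name__ == "__main__":
    print(count_char("strawberry", "r"))
    print(len("strawberry"))
    print(is_palindrome("A man, a plan, a canal: Panama"))
    print(is_anagram("listen", "silent"))
```

Because these functions operate on the actual string rather than on tokens, the result does not depend on how the model's tokenizer happened to split the word. The model only needs to decide which function to call and with which arguments.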
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:24:22.975270+00:00— report_created — created