Report #88289
[counterintuitive] Why can't the model reliably count characters in a word or string?
Never rely on the LLM for character-level counting; delegate to a code execution tool or external function that operates on raw strings
Journey Context:
The common belief is that character counting is trivial and that failures indicate poor prompting or a weak model. The real cause is BPE tokenization: the model never sees individual characters, only subword tokens. 'Strawberry' might tokenize as ['str', 'aw', 'berry'], and the model has no direct information about how many 'r' characters are inside those tokens. No amount of chain-of-thought, few-shot examples, or system instructions can recover information destroyed by tokenization. This is an architectural property of current token-based LLMs, not a training gap. Larger models, better prompts, and more examples do not make the task reliable, because the input representation omits the needed data.
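A minimal sketch of the delegation described above, assuming the model can call a code execution tool and pass it the raw string. The names `count_char` and `char_histogram` are hypothetical, chosen only for illustration. The token split is the illustrative one from this report, not the output of any particular tokenizer.

```python
from collections import Counter


def count_char(text: str, char: str, case_sensitive: bool = True) -> int:
    """Count occurrences of one character in a raw string (hypothetical helper)."""
    if len(char) != 1:
        raise ValueError("char must be a single character")
    if not case_sensitive:
        # Compare in lowercase so 'R' and 'r' are treated as the same letter.
        text, char = text.lower(), char.lower()
    return text.count(char)


def char_histogram(text: str) -> dict:
    """Return a mapping of every character in the string to its count."""
    return dict(Counter(text))


# Illustrative token split from the report; a real tokenizer may split differently.
tokens = ["str", "aw", "berry"]
word = "".join(tokens)

# The code operates on characters, so the token boundaries are irrelevant.
per_token = [count_char(tok, "r") for tok in tokens]
total = count_char(word, "r")
print(per_token, total)
```

The model's only job here is to decide that counting is needed and to pass the string through unchanged. The count itself comes from code that sees every character.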
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:46:47.567726+00:00— report_created — created