Report #46993
[counterintuitive] Model fails to count characters or letters in a word despite chain-of-thought prompting
Never ask an LLM to count characters directly. Delegate all character-level operations (counting, indexing, substring by position, length) to a code interpreter or post-processing script.
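A minimal sketch of what such a post-processing script might look like, in Python. The helper names (`count_char`, `char_at`, `substring_by_position`) and the 1-based position convention are assumptions made for illustration; they are not part of the report. The helpers operate on Unicode code points, which may differ from user-perceived characters for some scripts and emoji.

```python
def count_char(text: str, target: str, case_sensitive: bool = False) -> int:
    """Count how many times a single character occurs in text."""
    if len(target) != 1:
        raise ValueError("target must be exactly one character")
    if case_sensitive:
        return sum(1 for ch in text if ch == target)
    # Compare character by character so case folding never changes positions.
    return sum(1 for ch in text if ch.lower() == target.lower())


def char_at(text: str, position: int) -> str:
    """Return the character at a 1-based position (hypothetical convention)."""
    if not 1 <= position <= len(text):
        raise IndexError(f"position {position} is outside 1..{len(text)}")
    return text[position - 1]


def substring_by_position(text: str, start: int, end: int) -> str:
    """Return characters from start to end, both 1-based and inclusive."""
    if not 1 <= start <= end <= len(text):
        raise IndexError(f"range {start}..{end} is outside 1..{len(text)}")
    return text[start - 1:end]


if __name__ == "__main__":
    word = "Strawberry"
    print(len(word))                         # length
    print(count_char(word, "r"))             # counting
    print(char_at(word, 1))                  # indexing
    print(substring_by_position(word, 4, 5)) # substring by position
```

In a tool-calling setup, the model would only need to pass the string and the requested operation to code like this and report the returned value, rather than produce the answer itself.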
Journey Context:
The widespread belief is that character counting is a simple reasoning task the model just needs to 'think harder' about via chain-of-thought or better instructions. In reality, BPE tokenization means the model's input representation has no character-level granularity. 'Strawberry' tokenizes as roughly ['str', 'aw', 'berry'], so the model receives 3 tokens, not 10 characters. Chain-of-thought can sometimes approximate counting for short, familiar words by relying on memorized letter patterns, but this breaks unpredictably on edge cases and longer strings. This is an information-theoretic wall: you cannot prompt-engineer around missing input data. The model never received character-level information, so no amount of reasoning can recover it reliably. The fix is architectural (character-level or byte-level models) or tool-based (delegate to code that iterates over characters).
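The gap between the two views can be illustrated with a small Python sketch. The token split below is the approximate one quoted in this report, hard-coded for illustration; it is not the output of a real tokenizer, and actual splits vary by model and vocabulary.

```python
word = "strawberry"

# Approximate split from the report, NOT real tokenizer output.
approx_tokens = ["str", "aw", "berry"]

# The pieces must reassemble into the original word for the comparison to be fair.
assert "".join(approx_tokens) == word

token_view_length = len(approx_tokens)  # what the model is given: opaque units
char_view_length = len(word)            # what code iterating over characters sees

# Code can look inside each piece; the model only receives an ID per piece.
r_per_token = [piece.count("r") for piece in approx_tokens]
total_r = sum(r_per_token)

print(f"token view: {token_view_length} units")
print(f"character view: {char_view_length} characters")
print(f"'r' per token: {r_per_token}, total: {total_r}")
```

The per-token counts are trivial for code to compute, but they are exactly the information that is not present in the token sequence itself, which is why delegating to code is the more dependable route.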
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:21:07.235577+00:00— report_created — created