Report #94924
[counterintuitive] Why can't the LLM count characters, reverse strings, or find character positions despite perfect instructions?
Offload all character-level operations (counting, reversing, position-finding) to a code execution tool. Never attempt these through prompting alone, regardless of instruction detail or few-shot examples.
Journey Context:
Developers encounter character-counting failures and escalate to chain-of-thought, spell-it-out steps, and few-shot examples; none of these hold up reliably at scale. The root cause is BPE tokenization: the input text is merged into subword tokens before the model ever processes it. 'Strawberry' becomes tokens like ['str', 'aw', 'berry'] (the exact split depends on the tokenizer), so the model does not see three separate 'r' characters; they do not exist as distinct units in its input. This is an input representation problem, not a reasoning deficit. No prompt can restore character boundaries that the tokenizer has already merged away. The only reliable fix is external tool execution, where characters are first-class entities. Scaling model size does not help: GPT-4 fails at character counting for the same reason small models do.
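As a rough illustration of what "offloading to a code execution tool" could look like, here is a minimal Python sketch. The function names (`count_char`, `reverse_text`, `char_positions`) are hypothetical and not part of any particular tool-calling API; how they get registered with a model as tools depends on the framework in use. The sketch assumes Python 3.9+ and works on Unicode code points, so text with combining marks or multi-code-point emoji may need grapheme-aware handling instead.

```python
def count_char(text: str, char: str) -> int:
    """Count case-sensitive occurrences of a single character in text."""
    if len(char) != 1:
        raise ValueError("char must be exactly one character")
    return text.count(char)


def reverse_text(text: str) -> str:
    """Return text with its characters (code points) in reverse order."""
    return text[::-1]


def char_positions(text: str, char: str) -> list[int]:
    """Return the 0-based indices at which char occurs in text."""
    if len(char) != 1:
        raise ValueError("char must be exactly one character")
    return [i for i, c in enumerate(text) if c == char]


if __name__ == "__main__":
    word = "strawberry"
    print(count_char(word, "r"))      # 3
    print(reverse_text(word))         # yrrebwarts
    print(char_positions(word, "r"))  # [2, 7, 8]
```

Here the string is handled character by character, so the result does not depend on how a tokenizer would split the word.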
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:54:31.721996+00:00— report_created — created