Report #95129
[counterintuitive] LLM fails to count characters or reverse strings — the prompt must be improved
Offload all character-level string operations (counting, reversing, substring by index) to a code execution tool. No prompt engineering can fix this because the model's input representation (BPE tokens) discards character boundaries.
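As a minimal sketch of what such a tool could run, the helpers below do the character-level work in plain Python instead of asking the model to do it. The function names (`count_char`, `reverse_text`, `nth_char`, `substring`) are hypothetical and not part of any particular tool-calling API. They operate on Unicode code points, which is an assumption that may not match user-perceived characters for text with combining marks or emoji.

```python
def count_char(text: str, ch: str) -> int:
    """Count case-sensitive occurrences of a single character in text."""
    if len(ch) != 1:
        raise ValueError("ch must be exactly one character")
    return text.count(ch)


def reverse_text(text: str) -> str:
    """Reverse text by code point (not by grapheme cluster)."""
    return text[::-1]


def nth_char(text: str, n: int) -> str:
    """Return the nth character, using 1-based positions."""
    if not 1 <= n <= len(text):
        raise IndexError("n is outside the string")
    return text[n - 1]


def substring(text: str, start: int, end: int) -> str:
    """Return text[start:end] using 0-based, end-exclusive indices."""
    return text[start:end]


if __name__ == "__main__":
    word = "strawberry"
    print(count_char(word, "r"))
    print(reverse_text(word))
    print(nth_char(word, 3))
    print(substring(word, 0, 5))
```

The model's job then shrinks to choosing the operation and passing the raw string through unchanged; the result comes from the interpreter.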
Journey Context:
Developers see a model fail "how many r's are in strawberry" and assume it is a reasoning gap that better prompting can close. In reality, BPE tokenization means the model sees "strawberry" as tokens like ['str', 'aw', 'berry'] (the exact split depends on the tokenizer), so individual characters and their counts are not available in the input. This is an encoding-level information loss, not a reasoning failure. Chain-of-thought, few-shot examples, and instruction refinement all fail because the model cannot reason about information that was discarded before it ever saw the text. The only fixes are architectural (character-level tokenization, which has severe tradeoffs in efficiency and language coverage) or external (code execution). This applies to any character-indexed operation: substring extraction by position, string reversal, or finding the nth character.
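The gap between the two views can be illustrated with a toy example. This is a sketch under stated assumptions, not a real tokenizer: the vocabulary `TOY_VOCAB` and its integer IDs are invented for illustration, and the three-piece split is simply the one quoted in this report. The point it shows is that the model-side input is a list of opaque IDs, while the character count only appears once the text is decoded back to a string in code.

```python
# Hypothetical vocabulary: the pieces come from the example split above,
# the integer IDs are made up for this illustration.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}
ID_TO_PIECE = {token_id: piece for piece, token_id in TOY_VOCAB.items()}


def toy_encode(pieces: list[str]) -> list[int]:
    """Map already-split pieces to their toy IDs (no real BPE merging here)."""
    return [TOY_VOCAB[piece] for piece in pieces]


def toy_decode(token_ids: list[int]) -> str:
    """Rebuild the original string from toy IDs."""
    return "".join(ID_TO_PIECE[token_id] for token_id in token_ids)


def count_char_in_ids(token_ids: list[int], ch: str) -> int:
    """Count a character by decoding to text first, then counting in code."""
    return toy_decode(token_ids).count(ch)


if __name__ == "__main__":
    ids = toy_encode(["str", "aw", "berry"])
    print(ids)                          # what the model-side input looks like
    print(toy_decode(ids))              # what the developer typed
    print(count_char_in_ids(ids, "r"))  # the count, computed on the decoded text
```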
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:15:10.906880+00:00— report_created — created