Report #50386
[counterintuitive] LLM fails to count characters or reverse strings despite explicit instructions
Route all character-level, byte-level, and precise string operations through a code execution tool (Python). Never rely on the model's direct text generation for counting, reversing, substring indexing, or any operation requiring character-level precision.
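A minimal sketch of what such a tool could expose, assuming the host application lets the model call named functions with keyword arguments. The names `STRING_TOOLS` and `run_string_tool` are hypothetical and not part of any specific framework; only built-in Python string operations are used.

```python
def count_char(text: str, char: str) -> int:
    """Count non-overlapping, case-sensitive occurrences of `char` in `text`."""
    return text.count(char)


def reverse_text(text: str) -> str:
    """Reverse by Unicode code point.

    Note: combining sequences and multi-code-point emoji are not kept intact.
    """
    return text[::-1]


def char_at(text: str, index: int) -> str:
    """Return the character at a zero-based index (raises IndexError if out of range)."""
    return text[index]


def byte_length(text: str, encoding: str = "utf-8") -> int:
    """Return the number of bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


# Hypothetical dispatch table: tool name requested by the model -> exact implementation.
STRING_TOOLS = {
    "count_char": count_char,
    "reverse_text": reverse_text,
    "char_at": char_at,
    "byte_length": byte_length,
}


def run_string_tool(name: str, **arguments):
    """Execute one named string operation and return its exact result."""
    if name not in STRING_TOOLS:
        raise ValueError(f"unknown string tool: {name}")
    return STRING_TOOLS[name](**arguments)


if __name__ == "__main__":
    print(run_string_tool("count_char", text="strawberry", char="r"))  # 3
    print(run_string_tool("reverse_text", text="hello"))               # olleh
```

The model's job is then reduced to choosing the operation and passing the raw string through unchanged; the answer itself comes from the interpreter.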
Journey Context:
Developers escalate prompt complexity trying to get reliable character counts ('Count carefully, one by one...') and conclude that the model is bad at following instructions. The real issue is architectural: BPE tokenization converts text into subword tokens before the model ever sees it. The string 'strawberry' becomes tokens like ['str', 'aw', 'berry'], so the model never directly perceives the individual 'r' characters, and no prompt can make it count them reliably. Similarly, reversing 'hello' requires decomposing it into ['h','e','l','l','o'], but the model may see it as a single token. This is an input representation failure, not a reasoning failure. Larger models and better prompts reduce the error rate but do not eliminate it, because the model receives token IDs rather than characters and can only draw on whatever spelling associations it learned for each token during training.
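The representation gap described above can be illustrated with a toy example. This is a sketch only: it uses a greedy longest-match splitter over a made-up vocabulary, not real BPE merges, and the names `TOY_VOCAB`, `toy_encode`, and `toy_decode` as well as the ID values are hypothetical. Real tokenizers produce different pieces and IDs.

```python
# Hypothetical vocabulary: subword piece -> integer ID (values are arbitrary).
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103, "hello": 104}


def toy_encode(text: str, vocab: dict = TOY_VOCAB) -> list:
    """Split `text` into the longest matching vocabulary pieces and return their IDs."""
    ids = []
    pos = 0
    while pos < len(text):
        # Pick the longest vocabulary piece that matches at the current position.
        match = max(
            (piece for piece in vocab if text.startswith(piece, pos)),
            key=len,
            default=None,
        )
        if match is None:
            raise ValueError(f"no vocabulary piece matches at position {pos}")
        ids.append(vocab[match])
        pos += len(match)
    return ids


def toy_decode(ids: list, vocab: dict = TOY_VOCAB) -> str:
    """Map IDs back to their pieces and join them into the original text."""
    inverse = {token_id: piece for piece, token_id in vocab.items()}
    return "".join(inverse[token_id] for token_id in ids)


if __name__ == "__main__":
    ids = toy_encode("strawberry")
    print(ids)                          # [101, 102, 103]
    print(toy_encode("hello"))          # [104]
    # The letter count is only available after decoding back to characters,
    # which is what a code execution tool operates on.
    print(toy_decode(ids).count("r"))   # 3
```

Nothing in the sequence `[101, 102, 103]` says how many times 'r' occurs; that information only reappears once the IDs are mapped back to text, which is the step a code tool performs and the model's forward pass does not.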
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:03:29.618129+00:00— report_created — created