Report #83695
[counterintuitive] Model cannot reliably count characters in a string despite explicit step-by-step instructions
Never rely on the LLM for character-level string operations. Delegate all character counting, substring indexing, and character-level manipulation to a code execution tool or external function.
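A minimal sketch of what delegating to code could look like, assuming the helpers below are exposed to the model as tool functions. The names `count_char`, `char_positions`, and `char_at` are hypothetical and not part of any specific tool-calling API.

```python
def count_char(text: str, char: str) -> int:
    """Return how many times a single character occurs in text (case-sensitive)."""
    if len(char) != 1:
        raise ValueError("char must be exactly one character")
    return text.count(char)


def char_positions(text: str, char: str) -> list[int]:
    """Return the zero-based indices at which char occurs in text."""
    if len(char) != 1:
        raise ValueError("char must be exactly one character")
    return [i for i, c in enumerate(text) if c == char]


def char_at(text: str, index: int) -> str:
    """Return the character at a zero-based index; raises IndexError if out of range."""
    return text[index]


if __name__ == "__main__":
    word = "strawberry"
    print(count_char(word, "r"))      # number of 'r' characters
    print(char_positions(word, "r"))  # where they occur
    print(char_at(word, 5))           # first character of the second half
```

The model's job then reduces to choosing the function and its arguments; the counting itself is done on the raw string, where character boundaries still exist.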
Journey Context:
Developers assume the model sees text the way they do: character by character. In reality, BPE tokenization removes character boundaries before the model processes the input. 'strawberry' may be split into tokens such as ['straw', 'berry'] (the exact split depends on the tokenizer), each of which reaches the model as an opaque integer ID, so the model has no direct access to the fact that 'berry' contains two 'r' characters. This is not a reasoning deficit that more tokens or better prompts can overcome; it is an input representation failure. The character-level information is not in the input. No amount of chain-of-thought, few-shot examples, or instruction refinement can recover information that was discarded before the model saw it. The only solutions are architectural (character-level tokenization, which has its own severe tradeoffs) or external (code execution). This is why the 'how many r's in strawberry' failure persists across model generations and scales.
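To make the representation gap concrete, here is a toy illustration. It is not real BPE: it uses a greedy longest-match lookup as a stand-in, and the two-entry vocabulary and its IDs are invented for this example. Real tokenizers have far larger vocabularies and may split the word differently.

```python
# Hypothetical vocabulary: token string -> integer ID (values are made up).
TOY_VOCAB = {"straw": 1001, "berry": 1002}


def toy_encode(text: str, vocab: dict[str, int] = TOY_VOCAB) -> list[int]:
    """Greedy longest-match encoder, used here only as a simplified stand-in for BPE."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token in toy vocabulary for {text[i:]!r}")
    return ids


if __name__ == "__main__":
    word = "strawberry"
    print(toy_encode(word))  # what the model receives: a list of integer IDs
    print(word.count("r"))   # what code sees: the raw characters
```

The encoded form carries no letters at all, only IDs, whereas `str.count` operates on the original characters. That difference is the reason the count belongs in code.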
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:03:52.450466+00:00 - report_created - created