Report #26339
[counterintuitive] Model fails to count characters in a string or perform character-level operations (reverse, substring by index)
Never ask the model to count, reverse, or index characters in a string. Always delegate to code execution: use len(), string[::-1], string[i:j], or equivalent. If the agent must answer a character-counting question, write and run a Python one-liner to get the answer.
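A minimal sketch of that delegation, assuming the agent has a Python execution tool available. The helper name char_ops and its parameters are hypothetical and chosen only for illustration; the built-ins it calls (len, str.count, slicing) are standard Python.

```python
def char_ops(text: str, target: str, start: int, stop: int) -> dict:
    """Compute character-level facts in code instead of asking the model."""
    return {
        "length": len(text),             # number of code points in the string
        "count": text.count(target),     # non-overlapping occurrences of target
        "reversed": text[::-1],          # character-by-character reversal
        "substring": text[start:stop],   # half-open slice [start, stop)
    }


result = char_ops("strawberry", "r", 0, 5)
print(result)
```

Note that len() and slicing operate on Unicode code points, so text containing combining marks or emoji sequences may need grapheme-aware handling, which this sketch does not attempt.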
Journey Context:
LLMs typically use subword tokenization such as BPE (Byte Pair Encoding), so the model's input representation is tokens, not characters. The string 'strawberry' may be tokenized as ['str', 'aw', 'berry']: three tokens, not ten characters. The model cannot directly count the 'r's because it never sees them as individual units. This is not a reasoning deficit that goes away with scale or better prompts; it is an input representation problem in which character-level information is not explicitly present in what the model computes over. Even when a model appears to count correctly, it is pattern-matching from training data, not computing. Reversal fails because reversing token order does not reverse the character order within each token. Substring indexing fails because character offsets do not align with token boundaries. Chain-of-thought, few-shot examples, and system prompting do not reliably fix this, because the model's 'eyes' lack character-level resolution. The only reliable fix is to externalize the computation to a deterministic execution environment.
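The reversal and indexing mismatch can be illustrated with a short sketch. The token split below is the hypothetical one used in the text, not the output of any real tokenizer, which may split the word differently.

```python
# Hypothetical split from the text; a real tokenizer may produce different pieces.
tokens = ["str", "aw", "berry"]
word = "".join(tokens)

# Reversing token order leaves the characters inside each token unreversed.
token_reversed = "".join(reversed(tokens))
# Reversing at the character level is what the question actually asks for.
char_reversed = word[::-1]

# Character offsets (start, stop) covered by each token.
offsets = []
pos = 0
for tok in tokens:
    offsets.append((pos, pos + len(tok)))
    pos += len(tok)

# A character slice that starts mid-token and ends mid-token.
middle = word[2:7]

print(token_reversed, char_reversed, offsets, middle)
```

Under this assumed split, the slice word[2:7] begins inside the first token and ends inside the third, so no sequence of whole tokens corresponds to it.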
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:36:54.674568+00:00— report_created — created