Report #55273
[counterintuitive] Why can't the model count characters in a string or find substring positions reliably?
Offload all character-level operations to code execution. Never ask an LLM to count characters, find string indices, or perform substring operations directly; use a code interpreter or tool call instead.
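A minimal sketch of what "offload to code" can look like, assuming the model has access to a Python code interpreter or a tool that runs Python. The helper names `count_char` and `find_all` are illustrative, not part of any particular tool API; they only wrap standard `str` methods.

```python
def count_char(text: str, ch: str) -> int:
    """Count non-overlapping occurrences of ch in text."""
    return text.count(ch)


def find_all(text: str, sub: str) -> list[int]:
    """Return every 0-based start index of sub in text, including overlaps."""
    if not sub:
        raise ValueError("sub must be non-empty")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping matches are also found.
        start = text.find(sub, start + 1)
    return positions


print(len("strawberry"))              # 10
print(count_char("strawberry", "r"))  # 3
print(find_all("strawberry", "r"))    # [2, 7, 8]
```

The model's job is then reduced to writing or calling code like this and reporting the result, rather than producing the number itself.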
Journey Context:
Developers assume character counting is trivially easy and try increasingly elaborate prompts or chain-of-thought to make it work. The fundamental issue is BPE tokenization: the model doesn't see individual characters, it sees tokens. 'strawberry' might be tokenized as ['str', 'aw', 'berry'], so the model receives three token IDs rather than ten letters and cannot count the 'r's by inspecting its input. Prompting does not fix this, because the input representation does not explicitly contain character-level information. To answer correctly, the model would need to have memorized the character composition of every token in its vocabulary, which is fragile and doesn't generalize to novel strings. This is a limitation of the architecture's input representation, not of the prompt: the character-level detail is gone before the model ever sees the text.
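The point about token IDs can be illustrated with a toy example. This is a sketch under stated assumptions: the vocabulary and its integer IDs below are hypothetical, and the split of 'strawberry' is the illustrative one from this report, not the output of any real tokenizer.

```python
# Hypothetical toy vocabulary; the IDs are arbitrary and made up.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}
ID_TO_PIECE = {token_id: piece for piece, token_id in TOY_VOCAB.items()}


def toy_encode(pieces: list[str]) -> list[int]:
    """Map already-split pieces to their toy token IDs."""
    return [TOY_VOCAB[piece] for piece in pieces]


def count_char_via_vocab(token_ids: list[int], ch: str) -> int:
    """Count ch by looking up each token's spelling in the vocabulary."""
    return sum(ID_TO_PIECE[token_id].count(ch) for token_id in token_ids)


token_ids = toy_encode(["str", "aw", "berry"])
print(token_ids)                             # [101, 102, 103]
print(count_char_via_vocab(token_ids, "r"))  # 3
```

Nothing in `[101, 102, 103]` spells out any letters. Counting the 'r's requires the ID-to-spelling lookup, which ordinary code has as a table but which a model would have to have memorized for every token.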
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:16:06.280380+00:00— report_created — created