Report #52182
[counterintuitive] Model fails to count characters or find substrings despite explicit step-by-step instructions
Never rely on the model for character-level string operations. Use a code execution tool, Python interpreter, or post-processing for any task involving character counts, substring positions, or string length. Prompting the model to 'spell it out first' or 'count carefully' does not fix this.
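As a minimal sketch of the post-processing route, the helpers below do the three operations named above (character counts, substring positions, string length) in plain Python. The function names `count_char` and `find_all` are hypothetical and chosen for illustration; they are not part of any model SDK.

```python
def count_char(text: str, char: str) -> int:
    """Count case-sensitive occurrences of a single character in text."""
    if len(char) != 1:
        raise ValueError("char must be exactly one character")
    return text.count(char)


def find_all(text: str, sub: str) -> list[int]:
    """Return the 0-based start index of every occurrence of sub,
    including overlapping ones."""
    if not sub:
        raise ValueError("sub must be non-empty")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Step forward by one so overlapping matches are also found.
        start = text.find(sub, start + 1)
    return positions


if __name__ == "__main__":
    word = "Strawberry"
    print(count_char(word, "r"))      # 3
    print(find_all("banana", "ana"))  # [1, 3]
    print(len(word))                  # 10
```

Note that `len` and `str.count` operate on Unicode code points, so text with combining marks or emoji sequences may need extra handling depending on what "character" means for the task.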
Journey Context:
Developers assume character counting is trivial and that better prompting will fix errors. The root cause is tokenization: LLMs process text as subword tokens (BPE), not characters. 'Strawberry' becomes tokens like ['str', 'aw', 'berry'], so the model never sees individual 'r' characters. No prompt can make the model perceive characters it doesn't receive as input. Even chain-of-thought approaches ('list each letter') fail because the model generates token-by-token, reconstructing characters from token boundaries it cannot inspect. This persists across all model sizes because it's a tokenizer architecture issue, not a reasoning deficit. The only fix is to route string operations through actual code execution.
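One way to route string operations through code is a small tool handler that the model calls instead of answering directly. The sketch below is an assumption-laden illustration: the request shape (`op`, `text`, `arg`) and the name `run_string_tool` are hypothetical and do not correspond to any particular vendor's tool-calling API.

```python
import json


def run_string_tool(request_json: str) -> str:
    """Execute one string operation described by a JSON request.

    Hypothetical request shape: {"op": "length" | "count" | "find",
    "text": "...", "arg": "..."}; "arg" is unused for "length".
    """
    request = json.loads(request_json)
    op = request["op"]
    text = request["text"]

    if op == "length":
        result = len(text)
    elif op in ("count", "find"):
        arg = request["arg"]
        if not arg:
            raise ValueError("arg must be non-empty")
        # count: non-overlapping occurrences; find: first 0-based index or -1.
        result = text.count(arg) if op == "count" else text.find(arg)
    else:
        raise ValueError(f"unsupported op: {op!r}")

    return json.dumps({"op": op, "result": result})


if __name__ == "__main__":
    call = json.dumps({"op": "count", "text": "Strawberry", "arg": "r"})
    print(run_string_tool(call))
```

The model's job is then reduced to choosing the operation and passing the text through unchanged; the character-level work happens in the interpreter, where the string is visible character by character.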
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:05:01.854222+00:00 - report_created - created