Report #69794
[counterintuitive] Why no amount of prompting gets the model to reliably count characters, reverse strings, or find substring positions
Route all character-level string operations to code execution. Never rely on the model's direct text output for character counts, string reversal, or character position indices — use a tool call that runs Python or equivalent.
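A minimal sketch of what the tool side of that routing might look like, assuming the model emits a JSON tool call and a small Python handler computes the answer. The function name `run_string_tool`, the operation names (`length`, `count`, `reverse`, `positions`), and the call schema are hypothetical illustrations, not any particular framework's API.

```python
# Hypothetical tool handler: the model emits a tool call such as
# {"op": "count", "text": "strawberry", "char": "r"} and this code,
# not the model, produces the answer.
import json


def run_string_tool(call: dict) -> dict:
    """Execute a character-level string operation deterministically.

    Operation names are illustrative, not a standard API:
      - "length":    number of characters (Unicode code points) in text
      - "count":     occurrences of a single character in text
      - "reverse":   text reversed code point by code point
      - "positions": 0-based indices where a substring starts
    """
    op = call["op"]
    text = call["text"]
    if op == "length":
        result = len(text)
    elif op == "count":
        char = call["char"]
        if len(char) != 1:
            raise ValueError("count expects exactly one character")
        result = text.count(char)
    elif op == "reverse":
        result = text[::-1]
    elif op == "positions":
        sub = call["sub"]
        if not sub:
            raise ValueError("substring must be non-empty")
        # Advance one index past each hit so overlapping matches are reported.
        result, start = [], text.find(sub)
        while start != -1:
            result.append(start)
            start = text.find(sub, start + 1)
    else:
        raise ValueError(f"unknown op: {op!r}")
    return {"op": op, "result": result}


if __name__ == "__main__":
    call = json.loads('{"op": "count", "text": "strawberry", "char": "r"}')
    print(run_string_tool(call))
```

Note that Python string operations work on code points, so reversing or counting text with combining marks or multi-code-point emoji may not match user-perceived characters; a grapheme-aware library would be needed for that case.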
Journey Context:
The widespread belief is that character-counting failures are a reasoning deficit that better prompts or chain-of-thought can fix. In reality, LLMs using BPE or similar subword tokenization never 'see' individual characters; they receive a sequence of token IDs. The word 'hamburger' might be tokenized as ['ham', 'burger'], and 'apple' might be a single token. The model has no more access to character-level structure than a human has to individual phonemes while reading silently. This is a perceptual limitation, not a cognitive one. No prompt engineering can create character-level perception in a model that processes tokens. Short common words may work due to memorized heuristics, but these break unpredictably on novel inputs. The only fixes are architectural (character-level tokenization) or tool-based (code execution).
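To make the "token IDs, not characters" point concrete, here is a toy sketch: a greedy longest-match subword tokenizer over a hypothetical vocabulary. The vocabulary, the IDs, and the resulting splits are invented for illustration; real BPE tokenizers learn their merges from data and will segment these words differently. What carries over is the shape of the input: 'strawberry' arrives as two opaque integers, and nothing in those integers states that the word has ten letters or three r's.

```python
# Toy illustration only: greedy longest-match subword tokenization over a
# hypothetical vocabulary. Real BPE tokenizers differ; the point is what
# the model actually receives.

# Hypothetical vocabulary: multi-character pieces plus single-letter fallbacks.
VOCAB_PIECES = ["ham", "burger", "apple", "straw", "berry"]
VOCAB = {piece: i for i, piece in enumerate(VOCAB_PIECES)}
for ch in "abcdefghijklmnopqrstuvwxyz":
    VOCAB.setdefault(ch, len(VOCAB))


def tokenize(word: str) -> list[str]:
    """Split word into the longest vocabulary pieces, scanning left to right."""
    pieces, i = [], 0
    while i < len(word):
        # Try the longest remaining span first, shrinking until a piece matches.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece covers {word[i]!r}")
    return pieces


def encode(word: str) -> list[int]:
    """The model's view of the word: integer IDs with no letters inside."""
    return [VOCAB[p] for p in tokenize(word)]


if __name__ == "__main__":
    for w in ["hamburger", "apple", "strawberry"]:
        print(w, tokenize(w), encode(w), "chars:", len(w))
```

Any character-level fact about a multi-character token (its length, which letters it contains, their order) is not present in the ID itself; the model can only have memorized it, which is why results degrade on rare or novel words.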
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:38:04.294861+00:00— report_created — created