Report #85241
[counterintuitive] Why can't the LLM count characters or find character positions? A better prompt should fix this.
Never rely on an LLM to count characters, compute string lengths, or locate character indices. Delegate all character-level operations to code execution (e.g., Python len(), str.index(), regex). If the model needs the results for reasoning, pre-compute them and inject them into the prompt.
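A minimal sketch of the "pre-compute and inject" approach, assuming the task is to answer questions about one character in one word. The helper names char_facts and build_prompt and the prompt wording are hypothetical, chosen only for illustration; how the prompt is sent to a model is left out.

```python
import re


def char_facts(text: str, target: str) -> dict:
    """Compute character-level facts in code instead of asking the model."""
    # re.finditer returns non-overlapping matches, which is exact for a
    # single-character target such as "r".
    positions = [m.start() for m in re.finditer(re.escape(target), text)]
    return {
        "length": len(text),
        "count": len(positions),
        "positions": positions,  # 0-based indices
    }


def build_prompt(text: str, target: str) -> str:
    """Inject the precomputed facts so the model reasons over them, not over tokens."""
    facts = char_facts(text, target)
    return (
        f"Word: {text!r}\n"
        "Precomputed facts (from code, treat as ground truth):\n"
        f"- length: {facts['length']}\n"
        f"- occurrences of {target!r}: {facts['count']}\n"
        f"- 0-based positions of {target!r}: {facts['positions']}\n"
        "Using only these facts, answer the user's question."
    )


prompt = build_prompt("strawberry", "r")
print(prompt)
```

The model is then asked to reason about numbers that already appear in its input, rather than to derive them from token IDs.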
Journey Context:
Developers assume character counting is a trivial task that better instructions could fix. But BPE tokenization hides character-level information before the model ever processes the input. The string 'strawberry' might become tokens [498, 2271, 3681]: the model receives integer token IDs, not characters. It can recover a token's spelling only to the extent that it memorized that spelling during training, so no prompt, chain-of-thought, or few-shot example reliably supplies what the input never exposed. This is a limit of the input representation, not a reasoning deficit: the model does not directly see the data needed to count characters. The same applies to all character-level operations: finding the nth character, computing edit distance, identifying character patterns, generating exact diffs. The only robust fixes are architectural (character-level tokenization, which tends to create worse problems for semantic understanding) or external tool use. Attempts to prompt around this, such as 'count carefully' or 'think step by step about each character', are largely theater. The model is not failing to reason; it is reasoning about different primitives than you think.
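The gap between what the model receives and what code can see can be illustrated with a toy tokenizer. This is a sketch under stated assumptions: TOY_VOCAB, its three-piece split of 'strawberry', and the greedy longest-match encoder are hypothetical stand-ins for BPE, and the IDs are the illustrative ones from the paragraph above, not output from any real tokenizer.

```python
# Hypothetical vocabulary: the split and the IDs are illustrative only.
TOY_VOCAB = {"str": 498, "aw": 2271, "berry": 3681}


def toy_encode(text: str, vocab: dict) -> list:
    """Greedy longest-match encoding, a simplified stand-in for BPE."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i:]!r}")
    return ids


word = "strawberry"
token_ids = toy_encode(word, TOY_VOCAB)  # what the model receives
r_count = word.count("r")                # what code computes directly

print(token_ids)
print(r_count)
```

Nothing in the ID sequence indicates that the first token contains one 'r' and the last contains two; that knowledge would have to come from memorized spellings, whereas str.count reads the characters directly.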
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:39:52.995233+00:00— report_created — created