Report #43939
[counterintuitive] Why can't the LLM count characters, find string length, or do substring operations correctly no matter how I prompt it?
Never ask an LLM to perform character-level operations directly. Always delegate to code execution: use a Python interpreter or tool call for any operation that depends on exact character counts, indices, or string manipulation.
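Here is a minimal sketch of the delegation pattern. It assumes a setup where the model can emit a tool call that your own code executes. The function name `string_tool`, its operation names, and its arguments are hypothetical and are not tied to any particular provider's tool-calling API. The point is that the answer comes from Python's string operations, not from the model's guess.

```python
from typing import Optional, Union


def string_tool(op: str, text: str, arg: Optional[str] = None,
                start: Optional[int] = None, end: Optional[int] = None) -> Union[int, str]:
    """Hypothetical tool handler: performs exact string operations on behalf of an LLM."""
    if op == "length":
        # Number of Unicode code points in text.
        return len(text)
    if op == "count":
        # Non-overlapping occurrences of arg in text.
        if arg is None:
            raise ValueError("count requires arg")
        return text.count(arg)
    if op == "find":
        # Index of the first occurrence of arg, or -1 if absent.
        if arg is None:
            raise ValueError("find requires arg")
        return text.find(arg)
    if op == "substring":
        # Python slice semantics: start inclusive, end exclusive.
        return text[start:end]
    raise ValueError(f"unknown op: {op!r}")


if __name__ == "__main__":
    print(string_tool("count", "strawberry", arg="r"))              # 3
    print(string_tool("length", "strawberry"))                      # 10
    print(string_tool("find", "strawberry", arg="berry"))           # 5
    print(string_tool("substring", "strawberry", start=0, end=3))   # str
```

`len()` and indexing count Unicode code points, not user-perceived characters. Strings with combining marks or multi-code-point emoji may therefore need grapheme-aware handling. Also note that `str.count` counts only non-overlapping occurrences.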
Journey Context:
LLMs process text as tokens (typically BPE subword units), not characters. Tokenization happens before the model sees any input. Runs of characters are collapsed into single token IDs (e.g., 'strawberry' might tokenize as ['str', 'aw', 'berry']), and the model receives only those IDs. The tokenizer itself can decode the IDs back into the original text, but the model is never shown the letters inside each token. Whatever it knows about a token's spelling was picked up indirectly during training, and that knowledge is incomplete and error-prone. This is not a reasoning deficit that chain-of-thought or better prompting can reliably overcome: the character-level structure is simply absent from the input the transformer operates on. This is why models famously fail at 'how many r's in strawberry'. They don't see individual letters. Developers waste hours crafting prompts, few-shot examples, and CoT chains for tasks that the input representation makes unreliable at best. Any task requiring precise character counting, character indexing, or character-level comparison should be treated as outside what an LLM can do dependably without external tooling.
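The following toy illustration makes the input-representation point concrete. It is not a real tokenizer. The three-entry vocabulary, the token IDs, and the greedy longest-match splitting are all assumptions, chosen to reproduce the hypothetical ['str', 'aw', 'berry'] split above; greedy longest-match is a simplification of how BPE merges are actually applied. The model would receive only the ID list. Counting letters from that list is possible only with a table of each token's exact spelling, which the model has to approximate from training rather than read off its input.

```python
from typing import List

# Toy vocabulary (hypothetical): real vocabularies have tens of thousands of
# entries and model-specific IDs.
TOY_VOCAB = {"str": 0, "aw": 1, "berry": 2}
ID_TO_TOKEN = {token_id: piece for piece, token_id in TOY_VOCAB.items()}


def toy_encode(text: str) -> List[int]:
    """Greedy longest-match split over TOY_VOCAB (a simplification, not real BPE)."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, shrinking until a vocab entry matches.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no toy token covers {text[i:]!r}")
    return ids


def toy_decode(ids: List[int]) -> str:
    """The tokenizer can always map IDs back to text..."""
    return "".join(ID_TO_TOKEN[t] for t in ids)


def count_char_via_spellings(ids: List[int], ch: str) -> int:
    """...but counting letters from IDs requires knowing each token's spelling,
    which is exactly the information the model never sees directly."""
    return sum(ID_TO_TOKEN[t].count(ch) for t in ids)


if __name__ == "__main__":
    ids = toy_encode("strawberry")
    print(ids)                                 # [0, 1, 2]
    print(toy_decode(ids))                     # strawberry
    print(count_char_via_spellings(ids, "r"))  # 3
```

Here the spelling table `ID_TO_TOKEN` sits in plain code, so the count is exact. A model has no such explicit table. It must recall spellings implicitly, which is why its character-level answers vary.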
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:13:22.237462+00:00— report_created — created