Report #82826
[counterintuitive] Model fails to count characters, reverse strings, or perform character-level operations despite clear instructions
Delegate all character-level operations to code execution (Python len(), string slicing, reversal); never rely on the LLM itself for counting, reversing, or manipulating individual characters, regardless of prompting strategy.
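A minimal sketch of what delegating to code can look like, assuming the model is able to call small Python helpers through a code-execution tool. The helper names (count_char, reverse_text, char_positions) are hypothetical and chosen only for illustration; only built-in string operations are used.

```python
def count_char(text: str, char: str) -> int:
    """Count occurrences of a single character in text."""
    if len(char) != 1:
        raise ValueError("char must be exactly one character")
    return text.count(char)


def reverse_text(text: str) -> str:
    """Reverse a string with slice notation."""
    return text[::-1]


def char_positions(text: str, char: str) -> list[int]:
    """Return the 0-based indices at which char occurs in text."""
    return [i for i, c in enumerate(text) if c == char]


word = "strawberry"
print(len(word))                  # 10
print(count_char(word, "r"))      # 3
print(reverse_text(word))         # yrrebwarts
print(char_positions(word, "r"))  # [2, 7, 8]
```

Note that len() and slicing operate on Unicode code points, so text with combining marks or multi-code-point emoji may need extra handling; the sketch does not cover that case.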
Journey Context:
Developers see a model fail to count the letters in 'strawberry' and assume it needs better prompting or more examples. But BPE tokenization means the model's input representation does not contain individual characters: 'strawberry' might be tokenized as ['straw', 'berry'], and the model cannot derive from the token embedding alone that 'straw' has 5 characters. This is not a reasoning failure; it is an information-theoretic one. The necessary data (character boundaries) is destroyed by tokenization before the model ever sees it. No amount of chain-of-thought, few-shot examples, or instruction refinement can reliably recover information lost at the input layer. The only genuine fixes are architectural: use a character-level tokenizer (rare, and impractical for performance reasons) or, in practice, delegate to code. This is why a model can explain quantum field theory but cannot reliably tell you that 'strawberry' has 3 r's.
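The point about tokenization can be illustrated with a toy example. This is a sketch only: the vocabulary, the token IDs, and the toy_encode function are all made up for illustration, and greedy longest-match is a simplification standing in for a real BPE merge procedure.

```python
# Hypothetical vocabulary: tokens and IDs are invented for this example.
TOY_VOCAB = {"straw": 1001, "berry": 1002}


def toy_encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match encoding against a toy vocabulary.

    Not real BPE; it only shows that the output is a list of opaque IDs.
    """
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i:]!r}")
    return ids


token_ids = toy_encode("strawberry", TOY_VOCAB)
print(token_ids)                # [1001, 1002]
print("strawberry".count("r"))  # 3, computed from the original string
```

In this sketch the sequence [1001, 1002] is all that reaches the model, while the count of 3 is computed from the original string, which the model never receives in character form.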
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:36:38.935253+00:00— report_created — created