Report #68498
[counterintuitive] Model keeps miscounting characters in a string — need a better prompt
Use code execution or an external tool for any character-level operation (counting, indexing, reversing). Never attempt character-level tasks via prompting alone, no matter how detailed the instructions.
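A minimal sketch of what "use code execution" can look like for the three operations named above. The helper names (`count_char`, `char_at`, `reverse_text`) are hypothetical and chosen only for illustration; any sandbox or tool-calling setup that runs plain Python would do the same job.

```python
def count_char(text: str, ch: str) -> int:
    """Count occurrences of a single character, ignoring case."""
    return text.lower().count(ch.lower())


def char_at(text: str, index: int) -> str:
    """Return the character at a 0-based index."""
    return text[index]


def reverse_text(text: str) -> str:
    """Reverse the string by code point (not by grapheme cluster)."""
    return text[::-1]


word = "strawberry"
print(count_char(word, "r"))  # 3
print(char_at(word, 2))       # r
print(reverse_text(word))     # yrrebwarts
```

The point of the design is that the model only decides which operation to call; the counting itself happens on the actual string. Note that slicing reverses code points, so text with combining marks or emoji sequences may need a grapheme-aware approach.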
Journey Context:
Most LLMs use subword tokenization such as BPE, which hides character-level information before the model ever sees the input. The word 'strawberry' may tokenize as ['str', 'aw', 'berry']: the model receives subword token IDs and has no native representation of individual characters. No prompt can change what the tokenizer hands to the model. This is why even frontier models can fail at 'how many r's in strawberry': it is a representation gap, not a reasoning gap. The model cannot directly count what it cannot see. Workarounds like 'spell it out letter by letter first' sometimes help by forcing character-by-character generation, but they remain unreliable because the model is still recalling spellings from token representations, not reading the characters. The only reliable fix is architectural (byte-level or character-level models) or practical (tool use / code execution). Scaling model size alone does not close the gap: a 10x larger model still sees BPE tokens.
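The gap can be illustrated with a toy sketch. The vocabulary and token IDs below are invented for illustration and only mirror the ['str', 'aw', 'berry'] split mentioned above; real tokenizers use different vocabularies and may split the word differently.

```python
# Hypothetical three-entry vocabulary; not taken from any real tokenizer.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}


def toy_encode(pieces):
    """Map subword pieces to the integer IDs a model would receive."""
    return [TOY_VOCAB[p] for p in pieces]


def toy_decode(ids):
    """Map IDs back to text, as a tool running outside the model can."""
    inverse = {token_id: piece for piece, token_id in TOY_VOCAB.items()}
    return "".join(inverse[i] for i in ids)


ids = toy_encode(["str", "aw", "berry"])
print(ids)              # [101, 102, 103]  (no letter 'r' appears here)

text = toy_decode(ids)
print(text)             # strawberry
print(text.count("r"))  # 3
```

The model works with something like `[101, 102, 103]`, where the three occurrences of 'r' are not visible as separate items. Code that operates on the decoded string sees every character, which is why delegating the count to a tool is the dependable route.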
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:27:36.675066+00:00— report_created — created