Report #81906
[counterintuitive] Why can't the model count characters in a string, no matter how I prompt it?
Delegate all character-level operations (counting, indexing, substring) to code execution or external tooling; never rely on the model's direct text output for character-level tasks.
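A minimal sketch of what that delegation could look like, assuming the model is allowed to call small deterministic helpers instead of answering from its own text output. The function names (`count_char`, `char_at`, `char_positions`, `substring`) are hypothetical and are not part of any particular tool-calling API. Python strings are indexed by Unicode code point, so these helpers count code points rather than user-perceived grapheme clusters.

```python
def count_char(text: str, target: str) -> int:
    """Return how many times a single character occurs in text."""
    if len(target) != 1:
        raise ValueError("target must be exactly one character")
    return text.count(target)


def char_at(text: str, index: int) -> str:
    """Return the character at a zero-based index."""
    return text[index]


def char_positions(text: str, target: str) -> list[int]:
    """Return every zero-based index at which target occurs."""
    return [i for i, ch in enumerate(text) if ch == target]


def substring(text: str, start: int, end: int) -> str:
    """Return text[start:end] (end is exclusive)."""
    return text[start:end]


if __name__ == "__main__":
    word = "strawberry"
    print(count_char(word, "r"))
    print(char_positions(word, "r"))
    print(substring(word, 0, 5))
```

The point of the design is that the answer comes from executing code on the raw string, so it does not depend on how the string was tokenized.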
Journey Context:
Developers try 'count carefully', 'think step by step', and 'enumerate each character'; none of these work reliably. The reason is architectural: LLMs operate on subword tokens (BPE), not characters. The word 'strawberry' might tokenize as ['str', 'aw', 'berry'], so the model receives three token IDs and never sees three separate 'r' characters. No prompt can recover information destroyed by tokenization. This is not a reasoning deficit; it is a representational one. Larger models, more examples, and longer chains of thought do not fix it, because the input itself lacks the required granularity.
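To make the representational point concrete, here is a toy illustration. It assumes a made-up three-entry vocabulary and a greedy longest-match splitter; this is not real BPE and the IDs are invented, but it shows the shape of what reaches the model: a short list of integers with no per-character structure.

```python
# Hypothetical vocabulary for illustration only; real tokenizers have
# tens of thousands of entries and different splits and IDs.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}


def toy_tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match split of text into vocabulary IDs (not real BPE)."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest remaining piece first, then shrink.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[pos:]!r}")
    return ids


if __name__ == "__main__":
    word = "strawberry"
    token_ids = toy_tokenize(word, TOY_VOCAB)
    print(token_ids)        # what the model is given: three opaque IDs
    print(word.count("r"))  # what code can compute from the raw string
```

In this toy setup the ten-character word becomes three IDs, and nothing in those IDs says how many 'r' characters each one contains.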
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:04:19.405984+00:00— report_created — created