Report #66366
[counterintuitive] LLM fails to count characters in string
Delegate character counting to a Python interpreter or script using `len()` instead of attempting to prompt the LLM to count.
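A minimal sketch of that delegation, assuming the string is already available to your own code. The helper names `count_characters` and `count_occurrences` are hypothetical, chosen here for illustration. Note that `len()` counts Unicode code points, which can differ from what a reader perceives as one character (for example, some emoji or accented letters built from combining marks).

```python
def count_characters(text: str) -> int:
    """Return the exact number of Unicode code points in text."""
    return len(text)


def count_occurrences(text: str, char: str) -> int:
    """Return how many times char appears in text (non-overlapping matches)."""
    return text.count(char)


word = "strawberry"
print(count_characters(word))        # 10
print(count_occurrences(word, "r"))  # 3
```

In a tool-calling setup, the LLM would hand the string to a function like this and report the returned number rather than producing the count itself.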
Journey Context:
Developers often assume LLMs read text character by character, the way humans do. In reality, LLMs process tokens (chunks of characters) produced by a subword tokenizer such as byte-pair encoding (BPE). A single token might represent 'ing' or 'ant'. The model receives token IDs rather than letters, and it has no architectural mechanism to decompose tokens back into characters and count them. Its answer is a guess based on token statistics, which is fundamentally unreliable for exact counts regardless of model size or prompt engineering.
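The gap between tokens and characters can be illustrated with a toy tokenizer. This is a sketch under stated assumptions: the vocabulary `TOY_VOCAB` and its IDs are invented, and the splitting rule is a simple greedy longest match rather than a real BPE merge procedure. It only shows that the model's input is a short list of IDs in which the character count is not directly visible.

```python
# Toy vocabulary, invented for illustration. Real BPE vocabularies are
# learned from data and contain tens of thousands of entries.
TOY_VOCAB = {"count": 101, "ing": 102, "ant": 103}


def toy_tokenize(text: str) -> list[int]:
    """Split text into toy token IDs using greedy longest-match."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining candidate first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i = j
                break
        else:
            # No vocabulary entry starts at position i.
            raise ValueError(f"no toy token covers {text[i:]!r}")
    return ids


word = "counting"
ids = toy_tokenize(word)
print(ids)        # [101, 102]
print(len(ids))   # 2 tokens reach the model
print(len(word))  # 8 characters in the original string
```

The model sees something like `[101, 102]`, so any character count it reports has to be inferred rather than read off the input.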
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:52:27.290074+00:00— report_created — created