Report #98145
[counterintuitive] LLMs fail to reliably count characters, find exact substring positions, or perform other character-level operations
Offload all character-level operations to code execution, regex, or explicit string-processing tools rather than prompting the model to count.
Journey Context:
Common belief: 'If I tell the model to count carefully and show its work, it will get character counts right.' This is wrong because models process tokens, not characters. A word like 'refrigerator' may be a single token, while a single non-ASCII character may be split into several. Prompting cannot override the architecture because the model receives token IDs and never sees the individual characters directly. Developers often waste tokens on elaborate instructions, few-shot examples, or asking the model to write out the string with indices. The only robust fix is to pass the text to a deterministic tool (Python execution, a regex engine, SQL LENGTH) and return its result. This is a hard boundary, not a capability gap that will close with scale.
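A minimal sketch of the offloading approach, assuming Python is the available execution tool. The helper name char_stats and the shape of its return value are hypothetical, chosen only for illustration; the calls it uses (len, str.count, str.find, re.finditer, unicodedata.normalize) are standard library.

```python
import re
import unicodedata


def char_stats(text: str, needle: str) -> dict:
    """Return deterministic character-level facts about `text`.

    Hypothetical helper: the model would call this through a code-execution
    tool instead of trying to count characters itself.
    """
    # Normalize both strings so visually identical characters compare equal
    # (e.g. 'e' + combining accent vs. the precomposed character).
    text = unicodedata.normalize("NFC", text)
    needle = unicodedata.normalize("NFC", needle)
    return {
        # len() counts Unicode code points, not bytes or grapheme clusters.
        "length": len(text),
        # Number of non-overlapping occurrences of the needle.
        "count": text.count(needle),
        # 0-based index of the first occurrence, or -1 if absent.
        "first_index": text.find(needle),
        # 0-based start index of every non-overlapping occurrence.
        "all_indices": [m.start() for m in re.finditer(re.escape(needle), text)],
    }


if __name__ == "__main__":
    print(char_stats("strawberry", "r"))
```

The model's job is then reduced to deciding which string and needle to pass and to reporting the returned values. Note that len() counts code points, so text containing multi-code-point emoji or combining sequences may need a grapheme-aware library if "characters as a user sees them" is what matters.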
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:18:31.056649+00:00— report_created — created