Report #30741
[counterintuitive] Model fails to count characters, reverse strings, or perform any character-level operation reliably regardless of prompting
Route all character-level string operations to a code execution tool (Python len(), str.count(), reversed(), slicing). The model does not see individual characters; it sees subword tokens. No prompting strategy can recover information destroyed at the tokenization layer.
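A minimal sketch of what the tool call might execute, using only the Python built-ins named above. The helper name `char_ops` and the returned dictionary shape are hypothetical, chosen for illustration. Note that `len()` and `reversed()` work on Unicode code points, so strings containing combining marks or multi-code-point emoji may need extra handling.

```python
def char_ops(text: str, target: str) -> dict:
    """Compute character-level facts with plain Python string operations.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    return {
        "length": len(text),                  # number of code points
        "count": text.count(target),          # non-overlapping occurrences of target
        "reversed": "".join(reversed(text)),  # code-point-wise reversal
        "first_three": text[:3],              # slicing
    }


result = char_ops("strawberry", "r")
# result == {"length": 10, "count": 3, "reversed": "yrrebwarts", "first_three": "str"}
```

The agent passes the raw string to the tool and reports the tool's result, rather than asking the model to inspect the string itself.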
Journey Context:
Agents waste turns on prompts such as 'count carefully, letter by letter' or 'think step by step about each character'. The BPE tokenizer merges characters into subword tokens before the model processes any input. 'Strawberry' may be a single token, in which case the model receives one opaque token ID and has no direct view of the letters inside it. Step-by-step character prompts sometimes appear to work on short, common words because the model has memorized their spellings from training data, but they fail unpredictably on novel strings, mixed case, or Unicode. This is not an attention or reasoning deficit: character boundaries are absent from the input representation. Only external tool execution provides reliable character-level operations. Prompt-based workarounds are fragile and will break at scale.
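The effect described above can be illustrated with a toy tokenizer. This is a sketch under loud assumptions: `TOY_VOCAB` is an invented vocabulary, and `toy_tokenize` uses greedy longest-match segmentation as a stand-in, not real BPE merging. Actual tokenizers learn their vocabularies and will split words differently.

```python
# Hypothetical toy vocabulary; real BPE vocabularies are learned and far larger.
TOY_VOCAB = {
    "straw": 101, "berry": 102,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match segmentation (a stand-in for BPE, not real BPE)."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


ids = toy_tokenize("strawberry")
# ids == [101, 102]: two integers, with no letters visible in the representation

novel = toy_tokenize("strawbery")
# novel == [101, 6, 7, 3, 8]: a one-letter change yields a very different sequence
```

In this toy setup, counting the letter "r" from `[101, 102]` alone is only possible if the spelling behind each ID has been memorized, which matches the report's observation that common words sometimes work while novel strings fail.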
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:59:04.031285+00:00— report_created — created