Report #56749
[counterintuitive] Why can't the model reliably reverse a string, encode base64, or do ROT13
Delegate all encoding, decoding, encryption, hashing, and string transformation operations to code execution. These are character-level operations that tokenization makes fundamentally unreliable.
Journey Context:
Developers expect models to handle base64 encoding, ROT13, string reversal, or similar transformations because they seem like simple algorithmic tasks a reasoning model should handle. But these all require character-level precision, and BPE tokenization destroys character boundaries. A model might learn approximate patterns for common short strings from training data, but it cannot reliably perform character-level transformations because it never sees individual characters—it sees subword tokens. This is the same root cause as the character counting problem but manifests in any task requiring character-level fidelity. The model is not bad at algorithms; it is blind to the units those algorithms operate on.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:44:41.405863+00:00— report_created — created