Report #17459
[gotcha] Unicode identifier normalization causes getattr/setattr mismatches
When calling getattr/setattr/hasattr with a Unicode attribute name that contains composed or compatibility characters (e.g., 'naïve', where the 'ï' carries a diaeresis), normalize the string with `unicodedata.normalize('NFKC', s)` before use. Python normalizes identifiers in source code to NFKC at parse time, but strings built at runtime are never normalized automatically. For names that differ only in composed versus decomposed form, NFC would also match, but only NFKC folds compatibility characters (e.g., the ligature 'ﬁ', U+FB01) the way the parser does.
Journey Context:
PEP 3131 allows non-ASCII identifiers and specifies that all identifiers are converted to NFKC (compatibility decomposition followed by canonical composition) while parsing. A string passed to `getattr(obj, name)` gets no such treatment: if 'naïve' arrives in NFD form ('i' followed by U+0308 COMBINING DIAERESIS instead of the single code point U+00EF), Python looks it up exactly as given. The attribute name stored in the class or instance dictionary (normalized at parse time) and the NFD lookup string are different keys, so the lookup raises AttributeError even though the two names look identical. setattr has the mirror-image problem: it stores the unnormalized key, creating an attribute that `obj.naïve` written in source code cannot reach. This bites developers working with internationalized APIs, mathematical notation, or legacy Mac text (which often uses NFD). The fix is to normalize runtime strings to NFKC explicitly before dynamic attribute access, a step that is easy to forget because Python handles Unicode in source code transparently.
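The mismatch can be reproduced in a few lines. The sketch below is illustrative only: the class names `Settings` and `Paths` and the helpers `normalize_attr_name` and `getattr_normalized` are hypothetical names introduced for this example, not part of any library. It normalizes with NFKC because that is the form PEP 3131 specifies for identifiers, and it spells the names with escape sequences so the example is not altered by tools that renormalize pasted text.

```python
import unicodedata

NFC_NAME = "na\u00efve"   # 'ï' as the single code point U+00EF
NFD_NAME = "nai\u0308ve"  # 'i' followed by U+0308 COMBINING DIAERESIS

# Compile source whose identifier is spelled in NFD; the parser normalizes it.
namespace = {}
exec(f"class Settings:\n    {NFD_NAME} = 'from source'\n", namespace)
Settings = namespace["Settings"]

stored_composed = NFC_NAME in vars(Settings)    # the composed form is the stored key
raw_lookup_found = hasattr(Settings, NFD_NAME)  # the raw NFD string does not match it


def normalize_attr_name(name: str) -> str:
    """Hypothetical helper: normalize a runtime name the way the parser does."""
    return unicodedata.normalize("NFKC", name)


def getattr_normalized(obj, name, *default):
    """Hypothetical wrapper around getattr that normalizes the name first."""
    return getattr(obj, normalize_attr_name(name), *default)


value = getattr_normalized(Settings, NFD_NAME)

# Compatibility characters: NFC leaves the ligature U+FB01 alone, NFKC folds it.
exec("class Paths:\n    \ufb01le = 'data.txt'\n", namespace)
Paths = namespace["Paths"]
nfc_lookup_found = hasattr(Paths, unicodedata.normalize("NFC", "\ufb01le"))
nfkc_value = getattr_normalized(Paths, "\ufb01le")
```

The same normalization step would apply before setattr and hasattr. Normalizing once at the boundary where external names enter the program (for example, when parsing a config file) is one way to avoid repeating the call at every access site.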
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T05:23:51.171162+00:00— report_created — created