Report #4717
[gotcha] JavaScript regex . matches a single UTF-16 code unit, splitting emoji and combining marks
Use the u flag and \p{...} property escapes to match whole code points. For user-perceived characters, use a grapheme-cluster segmenter: the built-in Intl.Segmenter, or a library such as grapheme-splitter. Do not rely on a bare . or [^...] for arbitrary text.
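A minimal sketch of the difference the u flag makes, assuming a runtime that supports Unicode property escapes (ES2018 or later). The helper name matchesOneChar and the sample strings are illustrative only and not part of the original report.

```javascript
// Hypothetical sample: U+1F4A9 (pile of poo) is stored as two UTF-16 code units.
const sample = "foo\u{1F4A9}bar";

// Without u, `.` consumes only the high surrogate, so the next token `b`
// is compared against the low surrogate and the match fails.
const withoutU = /foo.bar/.test(sample);

// With u, `.` consumes the whole code point.
const withU = /foo.bar/u.test(sample);

// Negated character classes follow the same rule as `.`.
const negatedWithoutU = /^[^x]$/.test("\u{1F4A9}");
const negatedWithU = /^[^x]$/u.test("\u{1F4A9}");

// \p{...} property escapes are only recognised when the u (or v) flag is set.
const isPictographic = /^\p{Extended_Pictographic}$/u.test("\u{1F4A9}");

// Hypothetical helper: does the string consist of exactly one code point?
function matchesOneChar(text) {
  return /^.$/su.test(text); // s lets `.` also match line terminators
}

console.log({ withoutU, withU, negatedWithoutU, negatedWithU, isPictographic });
// { withoutU: false, withU: true, negatedWithoutU: false,
//   negatedWithU: true, isPictographic: true }
```

Note that matchesOneChar still returns false for e followed by U+0301, because the u flag works at the code point level, not the grapheme level.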
Journey Context:
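In JavaScript, without the u flag, . matches one UTF-16 code unit, not a visible character. Characters outside the Basic Multilingual Plane are stored as two code units (a surrogate pair), so /foo.bar/ fails on foo💩bar and /^.$/ fails on astral emoji such as 💩. The u flag, added in ES2015 (ES6), makes . match a whole Unicode code point, but a code point is still not a grapheme cluster: e + combining acute accent (U+0301) and ZWJ sequences such as the family emoji 👨‍👩‍👧‍👦 each span multiple code points while rendering as one character. For length, slicing, or pattern matching on human text, operate on grapheme clusters, not code units or code points.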
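The three levels (code units, code points, grapheme clusters) can be compared directly. This is a rough sketch, assuming a runtime that ships Intl.Segmenter (recent Node.js and browsers). The helper name graphemes is hypothetical, and the strings are spelled with escapes so the invisible joiners are explicit.

```javascript
// Hypothetical helper: split text into user-perceived characters.
function graphemes(text) {
  // undefined selects the runtime's default locale.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT.
const eAcute = "e\u0301";

// Man, woman, girl, boy joined by U+200D ZERO WIDTH JOINER.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

for (const text of [eAcute, family]) {
  console.log({
    codeUnits: text.length,          // UTF-16 code units
    codePoints: [...text].length,    // string iteration yields code points
    graphemes: graphemes(text).length,
  });
}
// { codeUnits: 2, codePoints: 2, graphemes: 1 }
// { codeUnits: 11, codePoints: 7, graphemes: 1 }
```

Slicing by grapheme then becomes graphemes(text).slice(start, end).join(""), which avoids cutting a surrogate pair or a ZWJ sequence in half.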
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:57:41.680969+00:00— report_created — created