Gotchas when converting strings to arrays in JS

This is a response to @antoomartini's awesome article here, where she describes 4 ways to turn a string into an array:

4 ways to convert a string to an array in Javascript
Maria Antonella ・ Aug 18


This content originally appeared on DEV Community and was authored by lionel-rowe

However, the 4 ways don't all behave the same way. We can see this when we use a string such as '💩', rather than a Latin-alphabet string:

const str = '💩'

str.split('') // ["\ud83d", "\udca9"]

;[...str] // ["💩"]

Array.from(str) // ["💩"]

Object.assign([], str) // ["\ud83d", "\udca9"]

Why the difference?

To understand the difference, let's take a look at how each way works in turn.

String#split

String#split matches and splits on 16-bit units, as encoded in UTF-16, the internal string representation that JavaScript uses.

You can find what these units are by using string index notation, and you can count them using String#length:

'ab'[0] // "a"
'ab'[1] // "b"
'ab'.length // 2

'💩'[0] // "\ud83d"
'💩'[1] // "\udca9"
'💩'.length // 2

As you can see, something weird is going on here. That's because emojis, and various other characters, take up two 16-bit units (for a total of 32 bits) instead of just one.
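As a rough sketch of where those two units come from: a codepoint above U+FFFF is stored as a "surrogate pair", and the standard UTF-16 arithmetic recovers the codepoint from the pair. The variable names here are purely illustrative, and the emoji is written as an escape sequence so the snippet survives copy-pasting.

```javascript
// '\u{1F4A9}' is the poop emoji, written as an escape sequence.
const poo = '\u{1F4A9}'

// The two 16-bit units (the surrogate pair) that make up the string.
const high = poo.charCodeAt(0) // 0xd83d
const low = poo.charCodeAt(1) // 0xdca9

// Standard UTF-16 decoding of a surrogate pair into a single codepoint.
const decoded = (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000

decoded.toString(16) // "1f4a9"
poo.codePointAt(0) === decoded // true
```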

Object.assign

How does Object.assign work?

The Object.assign() method copies all enumerable own properties from one or more source objects to a target object. It returns the modified target object. (Source: MDN)

In this case, the source is '💩' and the target is []. Object.assign therefore copies the string's property 0 to the array's property 0, and the string's property 1 to the array's property 1. The outcome is the same as with String#split.
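To make the property copying more concrete, here is a small sketch. It also shows a side effect worth knowing about: Object.assign writes onto the target it is given, so a non-empty target keeps any elements the string doesn't overwrite. The variable names are illustrative only.

```javascript
const str = '\u{1F4A9}' // the poop emoji: two 16-bit units

// The enumerable own properties of a string are its unit indices.
// `length` exists too, but it is not enumerable, so it is never copied.
const keys = Object.keys(str) // ["0", "1"]

// Object.assign mutates and returns the target instead of building a
// fresh array, so elements past the end of the string survive.
const target = ['x', 'y', 'z']
const result = Object.assign(target, 'ab')

result // ["a", "b", "z"]
result === target // true
```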

[...spread]

The spread operator (...) was introduced in ES6. With the introduction of ES6 features, JavaScript started getting smarter with its Unicode handling.

Rather than assigning properties, the spread operator iterates over its operand (in this case, our string). String iteration is based on Unicode codepoints rather than individual 16-bit units. Our friendly poop emoji is a single Unicode codepoint, so we get the result we want.
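A minimal sketch of what that iteration looks like when you drive it by hand, using the string's built-in iterator (the same one that spread uses). The variable names are illustrative.

```javascript
const str = 'a\u{1F4A9}' // "a" followed by the poop emoji

// Strings expose their iterator under Symbol.iterator.
const iterator = str[Symbol.iterator]()

const first = iterator.next() // { value: "a", done: false }
const second = iterator.next() // { value: "💩", done: false }
const third = iterator.next() // { value: undefined, done: true }

const unitCount = str.length // 3 (16-bit units)
const codepointCount = [...str].length // 2 (codepoints)
```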

Array.from

As with spread notation, Array.from was introduced in ES6. It iterates over the argument passed to it, so again, we get the expected result.
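One thing Array.from offers beyond spread is an optional mapping function as its second argument. The sketch below uses it to list each codepoint in hexadecimal, and then shows that a plain array-like object (which has no iterator) is read index by index instead. The variable names are illustrative.

```javascript
const str = 'a\u{1F4A9}' // "a" followed by the poop emoji

// The second argument maps each iterated value as the array is built.
const hex = Array.from(str, (char) => char.codePointAt(0).toString(16))
// ["61", "1f4a9"]

// An array-like object with no iterator is read via `length` and indices,
// so the two halves of the surrogate pair stay separate.
const arrayLike = { length: 2, 0: '\ud83d', 1: '\udca9' }
const units = Array.from(arrayLike)
// ["\ud83d", "\udca9"]
```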

Caveats

Array.from and spread notation work great for Unicode codepoints, but they still won't cover every situation. Sometimes, what looks like a single glyph is actually multiple Unicode codepoints. For example:

const str1 = 'lǜ'
const str2 = str1.normalize('NFD')
// "lǜ", looks exactly the same, but composed with combining diacritics

;[...str1] // ["l", "ǜ"]
;[...str2] // ["l", "u", "̈", "̀"]

Or, for another emoji-based example:

const emoji = '👩🏿‍💻'

;[...emoji] // ["👩", "🏿", "‍", "💻"]

Here, it's because the emoji is actually composed of 4 Unicode codepoints, representing woman, skin tone 6, zero-width joiner, and computer respectively.
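If you need to split on what a reader would perceive as a single character, one option is Intl.Segmenter with grapheme granularity. This is only a sketch: it assumes a runtime that ships Intl.Segmenter, toGraphemes is a hypothetical helper name, and the exact boundaries depend on the Unicode data bundled with your engine.

```javascript
// Assumes the runtime provides Intl.Segmenter.
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' })

// Hypothetical helper, for illustration only.
const toGraphemes = (str) =>
  Array.from(segmenter.segment(str), (part) => part.segment)

// woman + skin tone 6 + zero-width joiner + computer
const emoji = '\u{1F469}\u{1F3FF}\u200D\u{1F4BB}'
// "lǜ" composed with combining diacritics
const decomposed = 'l\u01DC'.normalize('NFD')

const emojiParts = toGraphemes(emoji) // a single element: the whole emoji
const letterParts = toGraphemes(decomposed) // two elements: "l" and "ǜ"
```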

Further reading

For a much deeper dive, I highly recommend Mathias Bynens's excellent article "JavaScript has a Unicode problem".

Thanks for reading! What are your favorite Unicode tips and tricks or JavaScript Unicode gotchas?

lionel-rowe | Sciencx (2021-08-18T19:44:47+00:00) Gotchas when converting strings to arrays in JS. Retrieved from https://www.scien.cx/2021/08/18/gotchas-when-converting-strings-to-arrays-in-js/