This content originally appeared on Go Make Things and was authored by Go Make Things
The other day, I wrote about how emoji don’t count as a single character, and some of the challenges with counting them in a string.
I also shared a simple trick for getting the number of emoji. Unfortunately, that trick falls apart pretty quickly with more complex emoji.
Today, I wanted to dig deeper, and share a modern browser API that fixes this problem. Let’s dig in!
The problem
The trick I shared uses a for...of loop and count variable to loop through a string and count the emoji in it.
let str = '🍦🎉';
let count = 0;
for (let char of str) {
	count++;
}
In this example, count has a value of 2.
A few readers shared alternate versions using the Array.from() method or the spread syntax to convert the string to an array and check its length.
let count1 = Array.from(str).length;
let count2 = [...str].length;
With the example str used, these all work. But some emoji are built from several code points joined together, and each one gets counted separately. The man facepalming emoji, for example, has a length of 5.
let str = '🍦🎉🤦♂️';
let count = [...str].length;
Here, count has a value of 6, rather than the expected 3.
The Intl.Segmenter() object
The Intl API exposes a variety of methods that can be used to internationalize strings and numbers in a browser-native way.
One of the newer objects in the API is the Intl.Segmenter() object, which can be used to segment strings into characters, words, or sentences in a way that considers international conventions.
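As a quick sketch of the word and sentence modes, here's how the granularity option might be used with a plain English string. The sample text and variable names are just for illustration.

```js
let text = 'Hello world. How are you?';

// Segment by sentence, and keep just the text of each segment
let sentences = Array.from(
	new Intl.Segmenter('en', {granularity: 'sentence'}).segment(text),
	function (item) {
		return item.segment;
	}
);

// Segment by word, skipping the spaces and punctuation
let words = Array.from(new Intl.Segmenter('en', {granularity: 'word'}).segment(text))
	.filter(function (item) {
		return item.isWordLike;
	})
	.map(function (item) {
		return item.segment;
	});

console.log(sentences.length, words);
```

With word granularity, spaces and punctuation come back as segments too. The isWordLike property on each item lets you filter those out. If you leave the granularity option off, the segmenter splits by character (grapheme), which is what we want for counting emoji.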
Its primary use is for languages with non-Roman characters (Mandarin, for example). The MDN documentation for the API provides this explanation for how it differs from String.split().
If we were to use String.prototype.split(" ") to segment a text in words, we would not get the correct result if the locale of the text does not use whitespaces between words (which is the case for Japanese, Chinese, Thai, Lao, Khmer, Myanmar, etc.).
const str = "吾輩は猫である。名前はたぬき。";
console.table(str.split(" "));
// ['吾輩は猫である。名前はたぬき。']
// The two sentences are not correctly segmented.
We can use this object to accurately count emoji in a string as well.
Counting emoji in a string with the Intl.Segmenter() object
First, we’ll create a new Intl.Segmenter() object. While the constructor for this can accept a locale and object of options as arguments, they’re both optional and we don’t need either for this particular use.
// Create a Segmenter object
let splitEmoji = new Intl.Segmenter();
Next, we’ll run the Intl.Segmenter.prototype.segment() method on the splitEmoji object we created, and pass our string of emoji into it. This creates a Segments object.
// Create a Segmenter object
let splitEmoji = new Intl.Segmenter();
// Segment the string
let str = '🍦🎉🤦♂️';
let segment = splitEmoji.segment(str);
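If you're curious what's inside the Segments object, you can loop through it. Here's a small sketch that collects each item's text and starting position. I've written the facepalm emoji as escape sequences so the invisible joiner doesn't get lost when copying and pasting, and the variable names are my own.

```js
// Create a Segmenter object
let segmenter = new Intl.Segmenter();

// Ice cream, party popper, and man facepalming (with the zero-width joiner spelled out)
let emoji = '🍦🎉\u{1F926}\u200D\u2642\uFE0F';

// Collect the text and starting index of each segment
let parts = [];
for (let item of segmenter.segment(emoji)) {
	parts.push({segment: item.segment, index: item.index});
}

console.log(parts);
```

Each item in the loop is an object with a segment property (the text itself) and an index property (where it starts in the original string, counted the same way the length property counts). The whole facepalm sequence comes back as a single item.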
Finally, we can convert the Segments object into an array, and get its length property.
// Create a Segmenter object
let splitEmoji = new Intl.Segmenter();
// Segment the string
let str = '🍦🎉🤦♂️';
let segment = splitEmoji.segment(str);
// Get the number of characters
let count = Array.from(segment).length;
Here, count has a value of 3.
You can also shorten this into a one-liner if you’d like.
// Get the number of characters
let count = Array.from(new Intl.Segmenter().segment(str)).length;
Browser support
Now for the bad news: at the time of writing, the Intl.Segmenter() object works in all modern browsers… except Firefox.
There are various polyfills out there, but I can’t speak to how good one is over another. Because this API is intended for internationalization, they require you to import specific language dictionaries or configuration files to work.
In the interim, there’s a useful NPM package you might want to check out: graphemer.
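If you'd rather not pull in a dependency, one option is to check for the API first and fall back to the spread approach when it's missing. This is a minimal sketch, and countCharacters() is a hypothetical helper name I made up for it.

```js
/**
 * Count the characters in a string (hypothetical helper)
 * @param  {String}  str The string to count
 * @return {Integer}     The number of characters
 */
function countCharacters (str) {

	// Use the Intl.Segmenter() object when the browser supports it
	if (typeof Intl !== 'undefined' && typeof Intl.Segmenter === 'function') {
		return Array.from(new Intl.Segmenter().segment(str)).length;
	}

	// Otherwise, fall back to counting code points
	// This overcounts emoji made from several joined code points
	return [...str].length;

}

let simpleCount = countCharacters('🍦🎉');
let joinedCount = countCharacters('🍦🎉\u{1F926}\u200D\u2642\uFE0F');
```

The fallback still gets the facepalm emoji wrong, so this only makes sense if a rough count is acceptable in browsers without support.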
Go Make Things | Sciencx (2022-04-14T14:30:00+00:00) Emoji are still weird (but modern browser methods help). Retrieved from https://www.scien.cx/2022/04/14/emoji-are-still-weird-but-modern-browser-methods-help/