This content originally appeared on DEV Community and was authored by Jeff Jakinovich
I love emojis. Who doesn’t?
I was polishing off a highly intellectual X post a few days ago when I realized something.
Emojis aren’t counted the same as regular characters
When typing out emojis in the new post section of X, you can see how regular characters count less than emojis.
After a quick search, I found out it has something to do with how they are encoded in the Unicode system.
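Essentially, JavaScript strings are stored as UTF-16 code units, and length counts those code units, not the characters a reader sees. Most emojis take two code units, and many are built from several code points joined together.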
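To make the difference concrete, here is a minimal sketch that measures the same strings two ways. The helper names codeUnits and codePoints are made up for illustration, and the emojis are written as Unicode escapes so the snippet survives copy and paste.

```javascript
// Hypothetical helpers, named here only for illustration.
function codeUnits(s) {
  return s.length; // UTF-16 code units, which is what length reports
}

function codePoints(s) {
  return Array.from(s).length; // the string iterator walks code points
}

const thumbsUp = "\u{1F44D}"; // 👍, a single code point outside the basic plane
// 👨‍👩‍👧, three emojis glued together with zero-width joiners (U+200D)
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

console.log(codeUnits("a"), codePoints("a")); // 1 1
console.log(codeUnits(thumbsUp), codePoints(thumbsUp)); // 2 1
console.log(codeUnits(family), codePoints(family)); // 8 5
```

Neither number is 1 for the family emoji, even though a reader sees a single character, so counting code points alone is not enough either.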
Regardless of why it happens, I thought about all the text counters I’ve created and how many exist in SaaS land.
Emojis are not getting their fair shake 😢.
Simply taking the length of the string isn’t an accurate count. Take, for example, something like this:
import { useState } from "react";

export default function App() {
  const [text, setText] = useState("");

  // length counts UTF-16 code units, so most emojis add 2 or more
  function countString() {
    return text.length;
  }

  function handleChange(e) {
    setText(e.target.value);
  }

  return (
    <div className="App">
      <h1>Make the emojis count 👍</h1>
      <textarea value={text} onChange={handleChange} />
      <small>Characters: {countString()}</small>
    </div>
  );
}
This is a simple React component that counts the characters typed into a text field. It is likely the most common way to implement this feature.
But the output gives us the same problem as my X post: each emoji adds 2 or more to the count instead of 1.
Modern web development makes it easy to count characters accurately
You can use a built-in object called Intl.Segmenter.
There is a much broader use case for the object, but it essentially breaks strings down into more meaningful units (graphemes, words, or sentences) based on a locale you provide. Its default granularity, grapheme, matches what a reader perceives as a single character, which is more useful here than counting code units or code points.
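Here is a rough sketch of those granularities side by side. The helpers segments and words are hypothetical names I picked for this example, and the "en" locale is an assumption, so swap in whatever locale fits your users.

```javascript
// Hypothetical helper: list the segments of a string at a given granularity.
function segments(text, granularity, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// Hypothetical helper: keep only the word-like segments (drops spaces and punctuation).
function words(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return Array.from(segmenter.segment(text))
    .filter((s) => s.isWordLike)
    .map((s) => s.segment);
}

console.log(segments("Hi \u{1F44D}", "grapheme")); // ["H", "i", " ", "👍"]
console.log(words("I love emojis.")); // ["I", "love", "emojis"]
console.log(segments("Hi there. Bye now.", "sentence").length); // 2
```

For a character counter, only the grapheme granularity matters, but the word granularity is handy if you ever need a word count that works across languages.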
To fix our example above, all we have to do is update our countString function like this:
import { useState } from "react";

export default function App() {
  const [text, setText] = useState("");

  // Segment the text into graphemes (the default granularity) and count them
  function countString() {
    return Array.from(new Intl.Segmenter().segment(text)).length;
  }

  function handleChange(e) {
    setText(e.target.value);
  }

  return (
    <div className="App">
      <h1>Make the emojis count 👍</h1>
      <textarea value={text} onChange={handleChange} />
      <small>Characters: {countString()}</small>
    </div>
  );
}
We create a new instance of the Intl.Segmenter object and pass our text to its segment method. We put that output into an array and then take the array's length, which will be far more accurate than simply taking the length of the original string.
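If you count in more than one place, the same idea can be pulled out of the component. This is only a sketch: countGraphemes and graphemeSegmenter are names I made up, and reusing one segmenter while looping instead of building an array is an optional tweak, not something the fix above requires.

```javascript
// One shared segmenter. "grapheme" is already the default granularity;
// it is spelled out here for clarity. undefined means "use the default locale".
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: count user-perceived characters without building an array.
function countGraphemes(text) {
  let count = 0;
  for (const _segment of graphemeSegmenter.segment(text)) {
    count++;
  }
  return count;
}

console.log(countGraphemes("hello")); // 5
console.log(countGraphemes("\u{1F44D}")); // 1 (👍)
console.log(countGraphemes("\u{1F468}\u200D\u{1F469}\u200D\u{1F467}")); // 1 (👨‍👩‍👧)
console.log(countGraphemes("e\u0301")); // 1 (e plus a combining accent)
```

Inside the component, countString would then just return countGraphemes(text).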
Here is the result: each emoji now counts as a single character.
So why doesn’t X count an emoji correctly?
Short answer: I have no idea.
I’ve been programming far too long to delude myself into thinking there is a simple answer.
But Intl.Segmenter has good browser support, and any performance or memory constraints would be negligible.
My best guess is that the codebase is so large and so old that it isn’t worth the side effects of a refactor.
I’d be happy to learn more if anyone has better insight into this.
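On the browser support point, if you still need to cover an environment without Intl.Segmenter, a feature check with a fallback is one option. This is a sketch under that assumption: countCharacters is a hypothetical name, and the fallback counts code points, which is closer than length but still overcounts joined emojis.

```javascript
// Hypothetical helper: prefer graphemes, fall back to code points.
function countCharacters(text) {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    // Preferred path: count graphemes (the default granularity)
    return Array.from(new Intl.Segmenter().segment(text)).length;
  }
  // Fallback: count code points. Better than length, but a joined emoji
  // such as a family still counts as several.
  return Array.from(text).length;
}

console.log(countCharacters("abc \u{1F44D}")); // 5
```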
I hope this helps 😄.
Happy coding 🤙.
Jeff Jakinovich | Sciencx (2024-08-09T16:40:15+00:00) How To Count Strings With Emojis In JavaScript. Retrieved from https://www.scien.cx/2024/08/09/how-to-count-strings-with-emojis-in-javascript/