Swift Tip: Decomposing Emoji
Now that emoji are everywhere, we need to be aware of Unicode, even without an international user base. For example, the emoji 👨‍👩‍👧‍👦 (a family-of-four glyph) has a very different length across string implementations:
"๐จโ๐ฉโ๐งโ๐ฆ".count // 1
("๐จโ๐ฉโ๐งโ๐ฆ" as NSString).length // 11
JavaScript's "👨‍👩‍👧‍👦".length also evaluates to 11. In Ruby, "👨‍👩‍👧‍👦".length
evaluates to 7, and in Python 2, len("👨‍👩‍👧‍👦")
evaluates to 25 (depending on your settings). One string, four different lengths.
Perhaps even more surprising: none of these implementations is wrong. They're all counting different things. In Swift, we get 1
as the answer because Swift counts Characters, i.e. extended grapheme clusters -- 👨‍👩‍👧‍👦 is a single character. The NSString
variant and JavaScript evaluate to 11 because they count the number of UTF-16 code units. We can replicate this in Swift:
"👨‍👩‍👧‍👦".utf16.count // 11
We can also see how Python 2 gets to 25 -- in this case, it counts the UTF-8 code units:
"👨‍👩‍👧‍👦".utf8.count // 25
And finally, Ruby and Python 3 evaluate to 7 because they count the Unicode scalars, and 👨‍👩‍👧‍👦 consists of the following scalars: 👨 + zero width joiner + 👩 + zero width joiner + 👧 + zero width joiner + 👦.
"👨‍👩‍👧‍👦".unicodeScalars.count // 7
When you're dealing with strings whose length is significant, keep this in mind. To learn more, watch last week's Swift Talk episode or read the transcript. If you'd like to learn more about Unicode and how it's implemented in Swift, read our book Advanced Swift.
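One practical consequence: when truncating user-visible text, operate on Characters, not on code units. A small sketch (the message string is just a made-up example):

```swift
let message = "family: 👨‍👩‍👧‍👦!"
// prefix(_:) counts Characters, so the family emoji survives intact:
// "family: " is 8 characters, and the whole family is 1 more.
print(String(message.prefix(9))) // "family: 👨‍👩‍👧‍👦"
```

Truncating the same string after 9 UTF-8 bytes instead would slice through the middle of 👨 and leave invalid UTF-8 behind.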