Dart runes
last modified May 29, 2026
In this article we show how to work with runes in Dart.
In Dart, a string is a sequence of UTF-16 code units that represents some text in a program. A code point is a number whose meaning is given by the Unicode standard. Each code point is encoded as one or two UTF-16 code units, and a character that the user sees may be made up of several code points.
A grapheme is the smallest unit of a writing system of any given language. An individual grapheme may or may not carry meaning by itself, and may or may not correspond to a single phoneme of the spoken language. The term character has been used to represent a single character in the original ASCII table. This table, however, can represent a limited set of characters. Outside of the ASCII table it is better to use the term grapheme instead of the term character.
A rune is an integer representing a Unicode code point. The term was
borrowed from Go. The runes property of a string returns the Unicode
code points of that string. To express a Unicode code point in a string
literal, the \uXXXX syntax is used, where XXXX is a 4-digit hexadecimal
value. If a Unicode code point requires more than 4 digits, we place the
value in curly brackets, as in \u{1F418}.
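The two escape forms can be tried out with a short sketch. The helper name toUPlus is our own invention for this illustration and is not part of the Dart SDK; it only formats a rune in the usual U+XXXX notation.

```dart
/// Formats a rune in U+XXXX notation, padded to at least 4 hex digits.
/// Hypothetical helper, not a Dart SDK function.
String toUPlus(int rune) =>
    'U+${rune.toRadixString(16).toUpperCase().padLeft(4, "0")}';

void main() {
  const snowman = '\u26C4'; // fits in 4 hex digits
  const elephant = '\u{1F418}'; // 5 hex digits, so curly brackets are needed

  print(toUPlus(snowman.runes.first)); // U+26C4
  print(toUPlus(elephant.runes.first)); // U+1F418
}
```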
Bytes are the actual information stored for the string contents. Each code point requires one or more bytes of storage, depending on the encoding being used (UTF-8, UTF-16, etc.).
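For UTF-8, the number of bytes follows from the range a code point falls into. The sketch below assumes a helper of our own, utf8Length, and compares its answer with what utf8.encode from dart:convert produces.

```dart
import 'dart:convert';

/// Number of UTF-8 bytes needed for one code point.
/// Hypothetical helper written for this illustration.
int utf8Length(int codePoint) {
  if (codePoint < 0x80) return 1; // ASCII
  if (codePoint < 0x800) return 2;
  if (codePoint < 0x10000) return 3; // rest of the Basic Multilingual Plane
  return 4; // supplementary planes
}

void main() {
  for (final s in ['A', '\u00E9', '\u26C4', '\u{1F418}']) {
    final rune = s.runes.first;
    // The hand-computed size should match the encoder's output length.
    print('${utf8Length(rune)} ${utf8.encode(s).length}');
  }
}
```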
The characters package contains functions for more advanced
manipulation of Unicode graphemes.
The three concepts relate to the same string but count different things:
| Concept | Dart API | Value for 🐘 (U+1F418) | Description |
|---|---|---|---|
| Code unit | s.codeUnits | [55357, 56344] (2 units) | Raw UTF-16 storage unit (16 bits) |
| Code point / rune | s.runes | [128024] (1 point) | Unicode scalar value (U+1F418) |
| Grapheme cluster | s.characters | ['🐘'] (1 cluster) | What a human perceives as one character |
Most Unicode bugs arise from working at the wrong level: using
codeUnits to count characters, or runes to count
emoji with skin tone modifiers. The sections below show each level in
practice.
UTF-8 and UTF-16
Dart strings are sequences of UTF-16 code units. Code points in the Basic Multilingual Plane (U+0000 to U+FFFF) occupy one code unit; supplementary code points (U+10000 to U+10FFFF) require two code units called a surrogate pair. For I/O such as files and network traffic, strings are typically encoded as UTF-8 (for example with utf8.encode from dart:convert), which uses 1 to 4 bytes per code point.
import 'dart:convert';

void main() {
  const elephant = '\u{1F418}'; // U+1F418 🐘, a supplementary code point

  // UTF-16 storage: supplementary code points use a surrogate pair
  print(elephant.codeUnits); // [55357, 56344]
  print(elephant.codeUnits.length); // 2 code units
  print(elephant.runes.length); // 1 code point

  // UTF-8 I/O: 4 bytes for code points in U+10000 to U+10FFFF
  print(utf8.encode(elephant)); // [240, 159, 144, 152]
  print(utf8.encode(elephant).length); // 4 bytes

  // ASCII: 1 code unit = 1 code point = 1 byte in both encodings
  print('A'.codeUnits); // [65]
  print(utf8.encode('A')); // [65]

  // Latin letter é (U+00E9): 1 code unit, but 2 UTF-8 bytes
  print('\u00E9'.codeUnits); // [233]
  print(utf8.encode('\u00E9')); // [195, 169]
}
The surrogate pair values (55357, 56344) are internal UTF-16 bookkeeping and
do not appear in UTF-8 output. A string of five supplementary-plane emoji has
codeUnits.length of 10, runes.length of 5, and a
UTF-8 byte count of 20: three different numbers for the same
string.
$ dart main.dart
[55357, 56344]
2
1
[240, 159, 144, 152]
4
[65]
[65]
[233]
[195, 169]
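The surrogate pair for a supplementary code point can also be derived by hand: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. The following is a minimal sketch of that arithmetic; toSurrogatePair is a name we made up, and the function assumes its argument lies in U+10000 to U+10FFFF.

```dart
/// Splits a supplementary code point into its UTF-16 high and low
/// surrogates. Hypothetical helper; assumes 0x10000 <= codePoint <= 0x10FFFF.
List<int> toSurrogatePair(int codePoint) {
  final offset = codePoint - 0x10000; // 20-bit value
  final high = 0xD800 + (offset >> 10); // top 10 bits
  final low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [high, low];
}

void main() {
  print(toSurrogatePair(0x1F418)); // [55357, 56344]
  print('\u{1F418}'.codeUnits); // [55357, 56344]
}
```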
Dart runes simple example
The following simple example works with runes.
void main() {
  final msg = 'an old falcon';
  print(msg.codeUnits);

  for (final rune in msg.runes) {
    print(rune);
  }
}
The codeUnits property returns a list of UTF-16 code units.
The runes property provides an iterable over the Unicode code
points (runes) of the string.
$ dart main.dart
[97, 110, 32, 111, 108, 100, 32, 102, 97, 108, 99, 111, 110]
97
110
32
111
108
100
32
102
97
108
99
111
110
Dart String.fromCharCode
The String.fromCharCode constructor creates a string that contains a single code point.
import 'dart:io';

void main() {
  final msg = 'one \u{1F418} and three \u{1F40B}';

  for (final rune in msg.runes) {
    stdout.write('${String.fromCharCode(rune)} ');
  }
  stdout.writeln();
}
In the example, we go through the string runes. We use the
String.fromCharCode constructor to transform each rune into a
one-code-point string.
$ dart main.dart
o n e   🐘   a n d   t h r e e   🐋 
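The plural constructor String.fromCharCodes goes the other way and builds a string from a whole sequence of runes. As a small sketch, the helper below (asciiOnly is our own name, not an SDK function) filters the runes and then reassembles the string.

```dart
/// Keeps only the runes below U+0080. Hypothetical helper for illustration.
String asciiOnly(String s) =>
    String.fromCharCodes(s.runes.where((r) => r < 0x80));

void main() {
  final msg = 'one \u{1F418} and three \u{1F40B}';

  // Both emoji are dropped; the surrounding spaces are kept.
  print(asciiOnly(msg));
  print(asciiOnly(msg).length); // 15
}
```

Because the filter works on runes rather than code units, a surrogate pair is always removed as a whole.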
Dart emojis
The following example displays four emojis.
void main() {
  final c1 = '\u{1F9F6}';
  final c2 = '\u{1FA86}';
  final c3 = '\u26C4';
  final c4 = '\u{1F37A}';

  print(c1);
  print(c2);
  print(c3);
  print(c4);

  print(c3.codeUnits);
  print(c4.codeUnits);
}
The emojis are expressed using the special syntax with hexadecimal values.
$ dart emojis.dart
🧶
🪆
⛄
🍺
[9924]
[55356, 57210]
Note that the fourth emoji uses two code units (a surrogate pair), although it is a single code point.
Dart runes length
The length property of a string returns the number of its UTF-16 code
units; the length of its runes property returns the number of code points.
void main() {
  final msg = 'one 🐘 and three 🐋';

  print(msg.length);
  print(msg.runes.length);
}
We have a string consisting of ASCII characters and two emojis.
print(msg.length);
print(msg.runes.length);
We access the length property of the string object and its
runes attribute.
$ dart main.dart
19
17
The runes.length gives the correct answer; there are 17 graphemes
in the string.
In the next example, we count the graphemes of different writing systems.
void main() {
  final msg1 = "falcon";
  final msg2 = "вишня";
  final msg3 = "🐺🦊🦝";
  final msg4 = "नमस्ते";

  print(msg1.length);
  print(msg2.runes.length);
  print(msg3.runes.length);
  print(msg4.runes.length);
}
In the program, we count the number of graphemes in ASCII, Cyrillic, emoji, and Sanskrit text.
$ dart main.dart
6
5
3
6
The example gives correct results for all except the Sanskrit text, which has fewer graphemes than code points. For more complex examples, we need to use the characters package.
Combining characters
Unicode allows the same visible character to be stored in multiple ways. The letter é can be encoded as a single precomposed code point (NFC, U+00E9) or as the base letter e followed by a combining acute accent (NFD, U+0065 + U+0301). Both render identically but are distinct Dart strings. Dart does not normalize strings automatically.
import 'package:characters/characters.dart';

void main() {
  final precomposed = '\u00E9'; // NFC: é as a single code point U+00E9
  final decomposed = 'e\u0301'; // NFD: e + combining acute accent U+0301

  print('precomposed runes: ${precomposed.runes.length}'); // 1
  print('decomposed runes: ${decomposed.runes.length}'); // 2

  // They look identical but are NOT equal as Dart strings
  print(precomposed == decomposed); // false

  // The characters package treats both as exactly 1 grapheme cluster
  print(precomposed.characters.length); // 1
  print(decomposed.characters.length); // 1
}
If you receive text from different sources (user input, an API, a file), the same visible character may arrive in NFC or NFD form and will silently fail equality checks. Normalize all strings to the same canonical form before comparing them in production code.
$ dart main.dart
precomposed runes: 1
decomposed runes: 2
false
1
1
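To our knowledge, the Dart core libraries do not include a Unicode normalizer, so real applications usually rely on a dedicated normalization package. The sketch below is only a toy: it composes a handful of sequences from a hand-written table to show why bringing both sides to one form makes the comparison succeed. The names compositions and composeKnown are ours, and the table is nowhere near a full NFC implementation.

```dart
/// A deliberately tiny table of decomposed -> precomposed pairs.
/// This is NOT a complete NFC implementation, only an illustration.
const compositions = {
  'e\u0301': '\u00E9', // e + combining acute -> é
  'a\u0301': '\u00E1', // a + combining acute -> á
  'o\u0308': '\u00F6', // o + combining diaeresis -> ö
};

/// Replaces the sequences listed in [compositions] with their
/// precomposed forms. Hypothetical helper.
String composeKnown(String s) {
  var result = s;
  compositions.forEach((decomposed, precomposed) {
    result = result.replaceAll(decomposed, precomposed);
  });
  return result;
}

void main() {
  final a = 'caf\u00E9'; // precomposed
  final b = 'cafe\u0301'; // decomposed

  print(a == b); // false
  print(composeKnown(a) == composeKnown(b)); // true
}
```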
Flag emojis
Country flag emojis are composed of two Regional Indicator Symbol Letters (RISL, U+1F1E6 to U+1F1FF) corresponding to the ISO 3166-1 two-letter country code. Because each RISL is a supplementary code point, a flag emoji always consists of 2 code points and 4 UTF-16 code units, yet forms a single grapheme cluster.
import 'package:characters/characters.dart';

void main() {
  final sk = '\u{1F1F8}\u{1F1F0}'; // RISL S + RISL K = 🇸🇰
  final us = '\u{1F1FA}\u{1F1F8}'; // RISL U + RISL S = 🇺🇸

  print(sk); // 🇸🇰
  print(sk.runes.toList()); // [127480, 127472]
  print(sk.runes.length); // 2 code points
  print(sk.codeUnits.length); // 4 code units
  print(sk.characters.length); // 1 grapheme cluster

  // Build a flag from an ISO 3166-1 country code
  int risl(String ch) => 0x1F1E6 + ch.codeUnitAt(0) - 0x41; // 0x41 = 'A'
  String makeFlag(String cc) =>
      String.fromCharCode(risl(cc[0])) + String.fromCharCode(risl(cc[1]));

  print(makeFlag('DE')); // 🇩🇪
  print(makeFlag('JP')); // 🇯🇵
  print(makeFlag('BR')); // 🇧🇷
  print(us); // 🇺🇸
}
Not all two-letter combinations correspond to a recognized flag; unsupported
codes display as two RISL letters rather than a flag image. The
characters package correctly groups both RISL letters into one
grapheme cluster, so sk.characters.length is 1 even though
sk.runes.length is 2.
$ dart main.dart
🇸🇰
[127480, 127472]
2
4
1
🇩🇪
🇯🇵
🇧🇷
🇺🇸
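The mapping also works in the opposite direction. As a sketch, the helper below recovers the two-letter code from a flag; countryCode, risA and risZ are names we chose for this illustration, and the function does not check whether the code belongs to a real country.

```dart
const risA = 0x1F1E6; // Regional Indicator Symbol Letter A
const risZ = 0x1F1FF; // Regional Indicator Symbol Letter Z

/// Returns the two-letter code behind a flag emoji, or null if [flag]
/// is not exactly two Regional Indicator code points. Hypothetical helper.
String? countryCode(String flag) {
  final runes = flag.runes.toList();
  if (runes.length != 2) return null;
  if (runes.any((r) => r < risA || r > risZ)) return null;
  // Shift each indicator back into the ASCII range 'A'..'Z'.
  return String.fromCharCodes(runes.map((r) => r - risA + 0x41));
}

void main() {
  print(countryCode('\u{1F1F8}\u{1F1F0}')); // SK
  print(countryCode('\u{1F1FA}\u{1F1F8}')); // US
  print(countryCode('SK')); // null
}
```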
Skin tone modifiers
Many human emoji support five skin tones via Fitzpatrick modifier code points
(U+1F3FB to U+1F3FF). The modifier immediately follows the base emoji and forms
a single grapheme cluster, but adds one extra code point and two extra UTF-16
code units. Using runes.length to count such emoji gives the
wrong answer; characters.length is always correct.
import 'package:characters/characters.dart';

void main() {
  final base = '\u{1F44B}'; // 👋 waving hand, no modifier
  final light = '\u{1F44B}\u{1F3FB}'; // 👋🏻 base + light skin tone U+1F3FB
  final medium = '\u{1F44B}\u{1F3FD}'; // 👋🏽 base + medium skin tone U+1F3FD
  final dark = '\u{1F44B}\u{1F3FF}'; // 👋🏿 base + dark skin tone U+1F3FF

  for (final e in [base, light, medium, dark]) {
    print('$e runes: ${e.runes.length} chars: ${e.characters.length}');
  }
}
The base emoji has 1 code point; each modified variant has 2 code points
(base + Fitzpatrick modifier). The characters package always
reports 1 grapheme cluster, correctly reflecting what the user sees.
$ dart main.dart
👋 runes: 1 chars: 1
👋🏻 runes: 2 chars: 1
👋🏽 runes: 2 chars: 1
👋🏿 runes: 2 chars: 1
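Since the modifiers occupy a fixed code point range, they can be filtered out at the rune level. The following sketch assumes a helper of our own, stripSkinTone, which removes every code point in U+1F3FB to U+1F3FF and leaves the base emoji.

```dart
/// Removes Fitzpatrick modifiers (U+1F3FB to U+1F3FF) from [s].
/// Hypothetical helper for illustration.
String stripSkinTone(String s) =>
    String.fromCharCodes(s.runes.where((r) => r < 0x1F3FB || r > 0x1F3FF));

void main() {
  final medium = '\u{1F44B}\u{1F3FD}'; // waving hand + medium skin tone
  final base = stripSkinTone(medium);

  print(base == '\u{1F44B}'); // true
  print(base.runes.length); // 1
}
```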
ZWJ sequences
A Zero Width Joiner (ZWJ, U+200D) fuses adjacent emoji into a new visual unit. Professional emoji and family emoji are the most common examples. A ZWJ sequence chains multiple code points into a single grapheme cluster.
import 'package:characters/characters.dart';

void main() {
  // woman (U+1F469) + ZWJ (U+200D) + laptop (U+1F4BB) = 👩‍💻
  const techie = '\u{1F469}\u200D\u{1F4BB}';

  // man + ZWJ + woman + ZWJ + girl + ZWJ + boy = 👨‍👩‍👧‍👦
  const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';

  print('techie runes: ${techie.runes.length}'); // 3
  print('techie characters: ${techie.characters.length}'); // 1
  print('family runes: ${family.runes.length}'); // 7
  print('family characters: ${family.characters.length}'); // 1

  // ZWJ (decimal 8205) is present in the sequence
  print('ZWJ present: ${family.runes.contains(0x200D)}'); // true

  print(techie); // 👩‍💻
  print(family); // 👨‍👩‍👧‍👦
}
techie contains 3 code points (woman, ZWJ, laptop) but renders
as one emoji. family contains 7 code points (4 people + 3 ZWJs).
Only characters.length returns 1 in both cases, correctly
reflecting what the user sees as a single character.
$ dart main.dart
techie runes: 3
techie characters: 1
family runes: 7
family characters: 1
ZWJ present: true
👩‍💻
👨‍👩‍👧‍👦
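A ZWJ sequence can be taken apart again by splitting on the joiner. This is a minimal sketch; zwjParts is our own helper name, and it assumes the components themselves contain no further joiners.

```dart
/// Splits a ZWJ sequence into its component emoji.
/// Hypothetical helper for illustration.
List<String> zwjParts(String sequence) => sequence.split('\u200D');

void main() {
  const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';
  final parts = zwjParts(family);

  print(parts.length); // 4

  // Show the code point of each member in hexadecimal.
  print(parts.map((p) => p.runes.first.toRadixString(16)).toList());
  // [1f468, 1f469, 1f467, 1f466]
}
```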
Dart characters
The characters package supports Unicode (extended) grapheme
clusters.
$ dart pub add characters
We add the package to the project.
import 'package:characters/characters.dart';

void main() {
  final m1 = "🐺🦊🦝";
  final m2 = "вишня";
  final m3 = "नमस्ते";

  print(m1.characters.length);
  print(m2.characters.length);
  print(m3.characters.length);

  print(m3.characters.first);
  print(m3.characters.last);

  for (final e in m3.characters) {
    print(e);
  }
}
In the program, we count the graphemes of emojis and Cyrillic and Sanskrit text.
import 'package:characters/characters.dart';
The package is imported.
print(m1.characters.length);
print(m2.characters.length);
print(m3.characters.length);
The package gives us the characters attribute.
print(m3.characters.first);
print(m3.characters.last);
We get the first and last grapheme of the Sanskrit text.
for (final e in m3.characters) {
  print(e);
}
We print all its graphemes.
$ dart main.dart
3
5
4
न
ते
न
म
स्
ते
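Now we get correct results. Note that the exact segmentation of Devanagari conjuncts may differ between versions of the characters package, because the grapheme cluster rules have been revised across Unicode releases.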
Grapheme slicing pitfalls
String.length, index access with [], and substring all
operate on UTF-16 code units. For strings containing supplementary code points
or multi-code-point graphemes, slicing at the wrong offset silently corrupts
the data rather than throwing an error.
import 'package:characters/characters.dart';

void main() {
  // ZWJ sequence: 5 code units, 3 code points, 1 grapheme
  final emoji = '\u{1F469}\u200D\u{1F4BB} is coding';

  print(emoji.length); // 15 code units
  print(emoji.runes.length); // 13 code points
  print(emoji.characters.length); // 11 graphemes

  // substring() operates on code units: cuts the surrogate pair of 👩 in half
  final broken = emoji.substring(0, 1); // lone high surrogate
  print(broken.codeUnits); // [55357], a meaningless fragment
  print(broken == '\uD83D'); // true, not a valid Unicode scalar

  // Safe: grapheme-aware slicing
  final safe = emoji.characters.take(1).toString();
  print(safe); // 👩‍💻, full grapheme preserved
  print(safe.length); // 5 code units, but exactly 1 user-visible character
}
emoji.substring(0, 1) cuts at code unit 1, which falls inside
the surrogate pair for 👩, returning a lone high surrogate. This is a valid
Dart string object but not a valid Unicode scalar value. In contrast,
emoji.characters.take(1) advances to the end of the first
grapheme cluster regardless of how many code units it spans.
$ dart main.dart
15
13
11
[55357]
true
👩‍💻
5
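When only code-unit offsets are available, a cut position can at least be checked before slicing. The sketch below uses helpers of our own (isHigh, isLow, splitsSurrogatePair); it detects cuts inside a surrogate pair only and does not protect multi-code-point graphemes.

```dart
bool isHigh(int unit) => unit >= 0xD800 && unit <= 0xDBFF;
bool isLow(int unit) => unit >= 0xDC00 && unit <= 0xDFFF;

/// Returns true if cutting [s] at code unit [index] would separate a
/// high surrogate from its low surrogate. Hypothetical helper.
bool splitsSurrogatePair(String s, int index) {
  if (index <= 0 || index >= s.length) return false;
  return isHigh(s.codeUnitAt(index - 1)) && isLow(s.codeUnitAt(index));
}

void main() {
  final emoji = '\u{1F469}\u200D\u{1F4BB} is coding';

  print(splitsSurrogatePair(emoji, 1)); // true
  print(splitsSurrogatePair(emoji, 2)); // false
}
```

Index 2 passes this check even though it still falls inside the ZWJ sequence, which is why grapheme-aware slicing remains the safer option for user-visible text.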
Safe grapheme iteration
Iterating runes is correct for text that contains no combining
marks or multi-code-point clusters, but will split complex graphemes such as
skin-tone emoji and Devanagari syllables into their constituent code points.
Iterating characters is always correct.
import 'package:characters/characters.dart';

void main() {
  // Mix of ASCII, emoji with skin-tone modifier, and Devanagari
  final text = 'Hi \u{1F44B}\u{1F3FD} \u0928\u092E\u0938\u094D\u0924\u0947';

  // runes: splits 👋🏽 into [👋, 🏽] and Devanagari into 6 code points
  final byRune = text.runes.map(String.fromCharCode).toList();
  print('By rune (${byRune.length}): $byRune');

  // characters: correct grapheme clusters every time
  final byChar = text.characters.toList();
  print('By char (${byChar.length}): $byChar');
}
The runes loop emits 12 items: it separates the skin-tone
modifier from its base emoji and decomposes each Devanagari conjunct into
its individual code points. The characters loop emits 9 items,
correctly grouping each user-visible unit into one entry.
$ dart main.dart
By rune (12): [H, i,  , 👋, 🏽,  , न, म, स, ्, त, े]
By char (9): [H, i,  , 👋🏽,  , न, म, स्, ते]
Unicode-aware string reversal
Reversing a Unicode string is a classic trap. Naive approaches corrupt surrogate pairs or ZWJ sequences. Correct reversal must operate on grapheme clusters.
import 'package:characters/characters.dart';

void main() {
  // "café 🇸🇰"
  final s = 'caf\u00E9 \u{1F1F8}\u{1F1F0}';

  // Naive: reverses UTF-16 code units, which inverts the surrogate pairs for
  // 🇸 and 🇰, producing a string with invalid lone surrogates.
  final naiveRev = s.split('').reversed.join();
  print('Naive: $naiveRev'); // garbled: lone surrogates in wrong positions

  // Rune reversal: avoids broken surrogates, but reverses the two RISL
  // code points individually, so 🇸🇰 becomes 🇰🇸.
  final runeRev = String.fromCharCodes(s.runes.toList().reversed);
  print('Rune: $runeRev'); // 🇰🇸 éfac, flag letters reversed

  // Correct: reverse grapheme clusters, so the flag is preserved as one unit.
  final safeRev = s.characters.toList().reversed.join();
  print('Safe: $safeRev'); // 🇸🇰 éfac
}
The naive split('').reversed operates on code units, so the
surrogate pairs for each Regional Indicator letter are split across the
boundary. The rune-based reversal avoids broken surrogates but still reverses
the two individual RISL code points, turning 🇸🇰 into 🇰🇸. Only
grapheme-cluster reversal keeps the flag intact.
$ dart main.dart
Naive: (garbled, lone surrogates)
Rune: 🇰🇸 éfac
Safe: 🇸🇰 éfac
Performance considerations
The three APIs have different cost profiles:
- codeUnits: near-zero overhead; directly exposes the internal UTF-16 storage with O(1) indexed access. Use it for encoding, hashing, and low-level manipulation where you fully control the character set.
- runes: lightweight lazy iteration that decodes surrogate pairs on the fly. Correct for any text that does not contain combining marks, skin-tone modifiers, or ZWJ sequences, which covers most Latin and Cyrillic text.
- characters: performs full Unicode extended grapheme cluster segmentation on every call. More CPU-intensive than runes, but the only safe choice for user-visible text operations (counting, slicing, iterating, reversing) that may involve complex scripts, emoji with modifiers, or ZWJ sequences.
As a rule of thumb: use codeUnits for I/O and encoding, use
runes for simple Unicode text, and use characters
whenever correctness for all human languages matters.
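The relative cost can be measured rather than assumed. The following is a rough timing sketch using Stopwatch from dart:core; timeMicros is our own helper, the iteration count is arbitrary, and the numbers will vary by machine and runtime. With the characters package added to the project, characters.length could be timed in the same way.

```dart
/// Runs [body] [iterations] times and returns the elapsed microseconds.
/// Hypothetical helper; a crude measurement, not a proper benchmark.
int timeMicros(void Function() body, {int iterations = 1000}) {
  final sw = Stopwatch()..start();
  for (var i = 0; i < iterations; i++) {
    body();
  }
  sw.stop();
  return sw.elapsedMicroseconds;
}

void main() {
  final text = 'one \u{1F418} and three \u{1F40B} ' * 100;
  var sink = 0; // accumulate results so the measured work is actually used

  final unitsTime = timeMicros(() => sink += text.length);
  final runesTime = timeMicros(() => sink += text.runes.length);

  print('length:       $unitsTime us');
  print('runes.length: $runesTime us');
  print(sink > 0); // true
}
```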
Source
Dart Runes - language reference
In this article we covered Dart runes. We explored code units, code points,
and grapheme clusters, demonstrated UTF-8 vs UTF-16 encoding, and showed
how to handle combining characters, flag emoji, skin tone modifiers, and ZWJ
sequences correctly using the characters package.