Dart runes

last modified May 29, 2026

In this article we show how to work with runes in Dart.

A string is a sequence of UTF-16 code units. Strings represent some text in a program. A character may be represented by multiple code points. Each code point is encoded as one or two code units. A code point is a number assigned to a character by the Unicode standard.

A grapheme is the smallest unit of a writing system of any given language. An individual grapheme may or may not carry meaning by itself, and may or may not correspond to a single phoneme of the spoken language. The term character has been used to represent a single character in the original ASCII table. This table, however, can represent a limited set of characters. Outside of the ASCII table it is better to use the term grapheme instead of the term character.

A rune is an integer representing a Unicode code point. The term was borrowed from Go. The runes property of a string returns the Unicode code points of the string. To express a Unicode code point in a string literal, the \uXXXX syntax is used, where XXXX is a 4-digit hexadecimal value. If a Unicode code point requires more than 4 digits, we place the value in curly brackets, as in \u{1F418}.
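As a small illustration of the escape syntax described above, the following sketch writes one code point with four hexadecimal digits and one with five, and then reads them back through the runes property. The helper name codePointsHex and the chosen symbols are our own and serve only as an example.

```dart
// Hypothetical helper: formats every rune of a string as U+XXXX.
List<String> codePointsHex(String s) => s.runes
    .map((r) => 'U+${r.toRadixString(16).toUpperCase().padLeft(4, '0')}')
    .toList();

void main() {
  // U+2665 fits in four hex digits, so the short \uXXXX form is enough.
  const heart = '\u2665';

  // U+1F418 needs five hex digits, so the value goes in curly brackets.
  const elephant = '\u{1F418}';

  print(codePointsHex(heart)); // [U+2665]
  print(codePointsHex(elephant)); // [U+1F418]
  print(codePointsHex('$heart$elephant').length); // 2
}
```

Both literals produce exactly one rune each, regardless of which escape form was used to write them.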

The bytes are the actual information stored for the string contents. Each code point can require one or more bytes of storage depending on the encoding being used (UTF-8, UTF-16, etc.).

The characters package contains functions for more advanced manipulation of Unicode graphemes.

The three concepts relate to the same string but count different things:

Concept	Dart API	Value for `🐘` (U+1F418)	Description
Code unit	`s.codeUnits`	`[55357, 56344]` — 2 units	Raw UTF-16 storage unit (16 bits)
Code point / rune	`s.runes`	`[128024]` — 1 point	Unicode scalar value (U+1F418)
Grapheme cluster	`s.characters`	`['🐘']` — 1 cluster	What a human perceives as one character

Most Unicode bugs arise from working at the wrong level — using codeUnits to count characters, or runes to count emoji with skin tone modifiers. The sections below show each level in practice.

UTF-8 and UTF-16

A Dart string is a sequence of UTF-16 code units. Code points in the Basic Multilingual Plane (U+0000–U+FFFF) occupy one code unit; supplementary code points (U+10000–U+10FFFF) require two code units called a surrogate pair. For I/O such as files and network traffic, strings are usually encoded as UTF-8, which uses 1 to 4 bytes per code point. In Dart, the utf8 codec from dart:convert performs this conversion.

main.dart

import 'dart:convert';

void main() {
  const elephant = '\u{1F418}'; // U+1F418 🐘 — supplementary code point

  // UTF-16 storage: supplementary code points use a surrogate pair
  print(elephant.codeUnits);         // [55357, 56344]
  print(elephant.codeUnits.length);  // 2 code units
  print(elephant.runes.length);      // 1 code point

  // UTF-8 I/O: 4 bytes for code points in U+10000–U+10FFFF
  print(utf8.encode(elephant));        // [240, 159, 144, 152]
  print(utf8.encode(elephant).length); // 4 bytes

  // ASCII: 1 code unit = 1 code point = 1 byte in both encodings
  print('A'.codeUnits);        // [65]
  print(utf8.encode('A'));     // [65]

  // Latin extended é (U+00E9): 1 code unit, but 2 UTF-8 bytes
  print('\u00E9'.codeUnits);       // [233]
  print(utf8.encode('\u00E9'));    // [195, 169]
}

The surrogate pair values (55357, 56344) are UTF-16 bookkeeping and do not appear in UTF-8 output. A string of five emoji that are each a single supplementary code point has codeUnits.length of 10, runes.length of 5, and a UTF-8 byte count of 20. These are three different numbers for the same string.

$ dart main.dart
[55357, 56344]
2
1
[240, 159, 144, 152]
4
[65]
[65]
[233]
[195, 169]
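The surrogate pair values printed above are not arbitrary; they follow from the UTF-16 encoding rule. The sketch below reproduces that arithmetic by hand. The function names toSurrogatePair and fromSurrogatePair are illustrative and not part of the Dart SDK.

```dart
// Splits a supplementary code point (U+10000..U+10FFFF) into its
// UTF-16 high and low surrogate code units.
List<int> toSurrogatePair(int codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw ArgumentError.value(codePoint, 'codePoint', 'not supplementary');
  }
  final offset = codePoint - 0x10000; // 20-bit value
  final high = 0xD800 + (offset >> 10); // top 10 bits
  final low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [high, low];
}

// Reverses the mapping: combines a surrogate pair into one code point.
int fromSurrogatePair(int high, int low) =>
    0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);

void main() {
  print(toSurrogatePair(0x1F418)); // [55357, 56344]
  print(fromSurrogatePair(55357, 56344)); // 128024
}
```

The result matches the codeUnits output for the elephant emoji, and the reverse mapping gives back its rune value.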

Dart runes simple example

The following simple example works with runes.

main.dart

void main() {
  final msg = 'an old falcon';
  print(msg.codeUnits);

  for (final rune in msg.runes) {
    print(rune);
  }
}

The codeUnits property returns a list of UTF-16 code units. The runes property provides an iterable over the Unicode code points (runes) of the string.

$ dart main.dart
[97, 110, 32, 111, 108, 100, 32, 102, 97, 108, 99, 111, 110]
97
110
32
111
108
100
32
102
97
108
99
111
110

Dart String.fromCharCode

The String.fromCharCode constructor creates a string that contains a single code point.

main.dart

import 'dart:io';

void main() {
  final msg = 'one \u{1F418} and three \u{1F40B}';

  for (final rune in msg.runes) {
    stdout.write('${String.fromCharCode(rune)} ');
  }
  stdout.writeln();
}

In the example, we go through the string runes. We use the String.fromCharCode factory constructor to transform each rune into a one-code-point string.

$ dart main.dart
o n e   🐘   a n d   t h r e e   🐋

Dart emojis

The following example displays four emojis.

main.dart

void main() {
  final c1 = '\u{1F9F6}';
  final c2 = '\u{1FA86}';
  final c3 = '\u26C4';
  final c4 = '\u{1F37A}';

  print(c1);
  print(c2);
  print(c3);
  print(c4);

  print(c3.codeUnits);
  print(c4.codeUnits);
}

The emojis are expressed using the special syntax with hexadecimal values.

$ dart main.dart
🧶
🪆
⛄
🍺
[9924]
[55356, 57210]

Note that the fourth emoji is a single code point that needs two code units (a surrogate pair), while the third one fits in a single code unit.

Dart runes length

With the length property, we can determine the number of code units in a string or the number of code points in its runes property.

main.dart

void main() {
  final msg = 'one 🐘 and three 🐋';

  print(msg.length);
  print(msg.runes.length);
}

We have a string consisting of ASCII characters and two emojis.

print(msg.length);
print(msg.runes.length);

We access the length property of the string object and of its runes property.

$ dart main.dart
19
17

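Here runes.length gives the correct answer; there are 17 graphemes in the string. This works only because each emoji in the string is a single code point. The string length is 19 because each emoji occupies two code units.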

In the next example, we count the graphemes of different writing systems.

main.dart

void main() {
  final msg1 = "falcon";
  final msg2 = "вишня";
  final msg3 = "🐺🦊🦝";
  final msg4 = "नमस्ते";

  print(msg1.length);
  print(msg2.runes.length);
  print(msg3.runes.length);
  print(msg4.runes.length);
}

In the program, we count the code points of ASCII, Cyrillic, emoji, and Sanskrit (Devanagari) text, and compare them with the number of graphemes we expect.

$ dart main.dart
6
5
3
6

The code point counts match the grapheme counts for all strings except the Sanskrit one, which uses combining signs. For such text, we need to use the characters package.

Combining characters

Unicode allows the same visible character to be stored in multiple ways. The letter é can be encoded as a single precomposed code point (NFC, U+00E9) or as the base letter e followed by a combining acute accent (NFD, U+0065 + U+0301). Both render identically but are distinct Dart strings. Dart does not normalize strings automatically.

main.dart

import 'package:characters/characters.dart';

void main() {
  final precomposed = '\u00E9';   // NFC: é as a single code point U+00E9
  final decomposed  = 'e\u0301'; // NFD: e + combining acute accent U+0301

  print('precomposed runes: ${precomposed.runes.length}'); // 1
  print('decomposed  runes: ${decomposed.runes.length}');  // 2

  // They look identical but are NOT equal as Dart strings
  print(precomposed == decomposed); // false

  // The characters package treats both as exactly 1 grapheme cluster
  print(precomposed.characters.length); // 1
  print(decomposed.characters.length);  // 1
}

If you receive text from different sources, such as user input, an API, or a file, the same visible character may arrive in NFC or NFD form and will silently fail equality checks. Normalize all strings to the same canonical form before comparing them in production code. The Dart core libraries do not provide a normalization function, so this usually requires a third-party package.

$ dart main.dart
precomposed runes: 1
decomposed  runes: 2
false
1
1

Flag emojis

Country flag emojis are composed of two Regional Indicator Symbol Letters (RISL, U+1F1E6–U+1F1FF) corresponding to the ISO 3166-1 two-letter country code. Because each RISL is a supplementary code point, a flag emoji always consists of 2 code points and 4 UTF-16 code units, yet forms a single grapheme cluster.

main.dart

import 'package:characters/characters.dart';

void main() {
  final sk = '\u{1F1F8}\u{1F1F0}'; // RISL S + RISL K = 🇸🇰
  final us = '\u{1F1FA}\u{1F1F8}'; // RISL U + RISL S = 🇺🇸

  print(sk);                       // 🇸🇰
  print(sk.runes.toList());        // [127480, 127472]
  print(sk.runes.length);          // 2 code points
  print(sk.codeUnits.length);      // 4 code units
  print(sk.characters.length);     // 1 grapheme cluster

  // Build a flag from an ISO 3166-1 country code
  int risl(String ch) => 0x1F1E6 + ch.codeUnitAt(0) - 0x41; // 0x41 = 'A'
  String makeFlag(String cc) =>
      String.fromCharCode(risl(cc[0])) + String.fromCharCode(risl(cc[1]));

  print(makeFlag('DE')); // 🇩🇪
  print(makeFlag('JP')); // 🇯🇵
  print(makeFlag('BR')); // 🇧🇷
  print(us);             // 🇺🇸
}

Not all two-letter combinations correspond to a recognized flag; unsupported codes display as two RISL letters rather than a flag image. The characters package correctly groups both RISL letters into one grapheme cluster, so sk.characters.length is 1 even though sk.runes.length is 2.

$ dart main.dart
🇸🇰
[127480, 127472]
2
4
1
🇩🇪
🇯🇵
🇧🇷
🇺🇸
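The mapping used by makeFlag can also be run in reverse. The following sketch, which uses only dart:core, recovers the two-letter code from a flag string. The function name flagToCode and the decision to return null for anything that is not exactly two regional indicators are our assumptions for the example.

```dart
const risA = 0x1F1E6; // REGIONAL INDICATOR SYMBOL LETTER A
const risZ = 0x1F1FF; // REGIONAL INDICATOR SYMBOL LETTER Z

// Returns the two-letter code behind a flag emoji, or null when the
// input is not exactly two regional indicator code points.
String? flagToCode(String flag) {
  final runes = flag.runes.toList();
  if (runes.length != 2) return null;
  if (runes.any((r) => r < risA || r > risZ)) return null;

  // Shift each indicator back into the ASCII range 'A'..'Z'.
  return String.fromCharCodes(runes.map((r) => r - risA + 0x41));
}

void main() {
  print(flagToCode('\u{1F1F8}\u{1F1F0}')); // SK
  print(flagToCode('\u{1F1FA}\u{1F1F8}')); // US
  print(flagToCode('SK')); // null
}
```

The function does not check that the resulting code is an assigned ISO 3166-1 code; it only undoes the code point offset.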

Skin tone modifiers

Many human emoji support five skin tones via Fitzpatrick modifier code points (U+1F3FB–U+1F3FF). The modifier immediately follows the base emoji and forms a single grapheme cluster, but adds one extra code point and two extra UTF-16 code units. Using runes.length to count such emoji gives the wrong answer; characters.length is always correct.

main.dart

import 'package:characters/characters.dart';

void main() {
  final base   = '\u{1F44B}';            // 👋 waving hand, no modifier
  final light  = '\u{1F44B}\u{1F3FB}';  // 👋🏻 + light skin tone U+1F3FB
  final medium = '\u{1F44B}\u{1F3FD}';  // 👋🏽 + medium skin tone U+1F3FD
  final dark   = '\u{1F44B}\u{1F3FF}';  // 👋🏿 + dark skin tone U+1F3FF

  for (final e in [base, light, medium, dark]) {
    print('$e  runes: ${e.runes.length}  chars: ${e.characters.length}');
  }
}

The base emoji has 1 code point; each modified variant has 2 code points (base + Fitzpatrick modifier). The characters package always reports 1 grapheme cluster, correctly reflecting what the user sees.

$ dart main.dart
👋  runes: 1  chars: 1
👋🏻  runes: 2  chars: 1
👋🏽  runes: 2  chars: 1
👋🏿  runes: 2  chars: 1
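Because the modifier is an ordinary code point in a known range, it can be filtered out at the rune level. The sketch below removes skin tone modifiers from a string; the helper names isSkinToneModifier and stripSkinTone are hypothetical and chosen for this example.

```dart
// True for the five Fitzpatrick modifiers U+1F3FB..U+1F3FF.
bool isSkinToneModifier(int rune) => rune >= 0x1F3FB && rune <= 0x1F3FF;

// Rebuilds the string from all runes that are not skin tone modifiers.
String stripSkinTone(String s) =>
    String.fromCharCodes(s.runes.where((r) => !isSkinToneModifier(r)));

void main() {
  const medium = '\u{1F44B}\u{1F3FD}'; // waving hand + medium skin tone
  final plain = stripSkinTone(medium);

  print(plain == '\u{1F44B}'); // true
  print(plain.runes.length); // 1
}
```

This is a case where working with runes is the appropriate level: the task is about individual code points, not about user-visible characters.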

ZWJ sequences

A Zero Width Joiner (ZWJ, U+200D) fuses adjacent emoji into a new visual unit. Professional emoji and family emoji are the most common examples. A ZWJ sequence chains multiple code points into a single grapheme cluster.

main.dart

import 'package:characters/characters.dart';

void main() {
  // woman (U+1F469) + ZWJ (U+200D) + laptop (U+1F4BB) = 👩‍💻
  const techie = '\u{1F469}\u200D\u{1F4BB}';

  // man + ZWJ + woman + ZWJ + girl + ZWJ + boy = 👨‍👩‍👧‍👦
  const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';

  print('techie runes:      ${techie.runes.length}'); // 3
  print('techie characters: ${techie.characters.length}'); // 1

  print('family runes:      ${family.runes.length}'); // 7
  print('family characters: ${family.characters.length}'); // 1

  // ZWJ (decimal 8205) is present in the sequence
  print('ZWJ present: ${family.runes.contains(0x200D)}'); // true

  print(techie); // 👩‍💻
  print(family); // 👨‍👩‍👧‍👦
}

techie contains 3 code points (woman, ZWJ, laptop) but renders as one emoji. family contains 7 code points (4 people + 3 ZWJs). Only characters.length returns 1 in both cases, correctly reflecting what the user sees as a single character.

$ dart main.dart
techie runes:      3
techie characters: 1
family runes:      7
family characters: 1
ZWJ present: true
👩‍💻
👨‍👩‍👧‍👦
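A ZWJ sequence can be taken apart again by splitting on the joiner. The following sketch does this with the standard String.split method; the function name zwjParts is our own.

```dart
// Splits a ZWJ sequence into the emoji that it joins.
List<String> zwjParts(String sequence) => sequence.split('\u200D');

void main() {
  const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';
  const techie = '\u{1F469}\u200D\u{1F4BB}';

  print(zwjParts(family).length); // 4
  print(zwjParts(family)); // [👨, 👩, 👧, 👦]
  print(zwjParts(techie)); // [👩, 💻]
}
```

Splitting on the joiner keeps any other code points, such as skin tone modifiers, attached to the component they follow.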

Dart characters

The characters package supports Unicode (extended) grapheme clusters.

$ dart pub add characters

We add the package to the project.

main.dart

import 'package:characters/characters.dart';

void main() {
  final m1 = "🐺🦊🦝";
  final m2 = "вишня";
  final m3 = "नमस्ते";

  print(m1.characters.length);
  print(m2.characters.length);
  print(m3.characters.length);

  print(m3.characters.first);
  print(m3.characters.last);

  for (final e in m3.characters) {
    print(e);
  }
}

In the program, we count the graphemes of emojis and Cyrillic and Sanskrit text.

import 'package:characters/characters.dart';

The package is imported.

print(m1.characters.length);
print(m2.characters.length);
print(m3.characters.length);

The package adds the characters extension getter to strings.

print(m3.characters.first);
print(m3.characters.last);

We get the first and last grapheme of the Sanskrit text.

for (final e in m3.characters) {
  print(e);
}

We print all its graphemes.

$ dart main.dart
3
5
4
न
ते
न
म
स्
ते

Now we get correct results.

Grapheme slicing pitfalls

String.length, the index operator [], and substring all operate on UTF-16 code units. For strings containing supplementary code points or multi-code-point graphemes, this silently corrupts the data rather than throwing an error.

main.dart

import 'package:characters/characters.dart';

void main() {
  // ZWJ sequence: 5 code units, 3 code points, 1 grapheme
  final emoji = '\u{1F469}\u200D\u{1F4BB} is coding';

  print(emoji.length);             // 15 code units
  print(emoji.runes.length);       // 13 code points
  print(emoji.characters.length);  // 11 graphemes

  // substring() operates on code units — cuts the surrogate pair of 👩 in half
  final broken = emoji.substring(0, 1); // lone high surrogate
  print(broken.codeUnits);        // [55357] — meaningless fragment
  print(broken == '\uD83D');      // true — not a valid Unicode scalar

  // Safe: grapheme-aware slicing
  final safe = emoji.characters.take(1).toString();
  print(safe);         // 👩‍💻 — full grapheme preserved
  print(safe.length);  // 5 code units, but exactly 1 user-visible character
}

emoji.substring(0, 1) cuts at code unit 1, which falls inside the surrogate pair for 👩, returning a lone high surrogate. This is a valid Dart string object, but it is not well-formed UTF-16 and does not represent any Unicode character. In contrast, emoji.characters.take(1) advances to the end of the first grapheme cluster regardless of how many code units it spans.

$ dart main.dart
15
13
11
[55357]
true
👩‍💻
5
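A common place where this matters is shortening text for display. The sketch below truncates a string to a maximum number of grapheme clusters with the characters package. The function name truncate and the default '...' suffix are assumptions made for the example.

```dart
import 'package:characters/characters.dart';

// Shortens text to at most maxGraphemes grapheme clusters and appends
// the ellipsis only when something was actually cut off.
String truncate(String text, int maxGraphemes, {String ellipsis = '...'}) {
  final chars = text.characters;
  if (chars.length <= maxGraphemes) return text;
  return chars.take(maxGraphemes).toString() + ellipsis;
}

void main() {
  final emoji = '\u{1F469}\u200D\u{1F4BB} is coding';

  print(truncate(emoji, 4)); // 👩‍💻 is...
  print(truncate('abc', 5)); // abc
}
```

The cut always lands on a grapheme boundary, so the ZWJ sequence at the start of the string is either kept whole or dropped whole.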

Safe grapheme iteration

Iterating runes is correct for text that contains no combining marks or multi-code-point clusters, but will split complex graphemes such as skin-tone emoji and Devanagari syllables into their constituent code points. Iterating characters is always correct.

main.dart

import 'package:characters/characters.dart';

void main() {
  // Mix of ASCII, emoji with skin-tone modifier, and Devanagari
  final text = 'Hi \u{1F44B}\u{1F3FD} \u0928\u092E\u0938\u094D\u0924\u0947';

  // runes: splits 👋🏽 into [👋, 🏽] and Devanagari into 6 code points
  final byRune = text.runes.map(String.fromCharCode).toList();
  print('By rune (${byRune.length}): $byRune');

  // characters: correct grapheme clusters every time
  final byChar = text.characters.toList();
  print('By char (${byChar.length}): $byChar');
}

The runes loop emits 12 items — it separates the skin-tone modifier from its base emoji and decomposes each Devanagari conjunct into its individual code points. The characters loop emits 9 items, correctly grouping each user-visible unit into one entry.

$ dart main.dart
By rune (12): [H, i,  , 👋, 🏽,  , न, म, स, ्, त, े]
By char (9): [H, i,  , 👋🏽,  , न, म, स्, ते]

Unicode-aware string reversal

Reversing a Unicode string is a classic trap. Naive approaches corrupt surrogate pairs or ZWJ sequences. Correct reversal must operate on grapheme clusters.

main.dart

import 'package:characters/characters.dart';

void main() {
  // "café 🇸🇰"
  final s = 'caf\u00E9 \u{1F1F8}\u{1F1F0}';

  // Naive: reverses UTF-16 code units — inverts the surrogate pairs for
  // 🇸 and 🇰, producing a string with invalid lone surrogates.
  final naiveRev = s.split('').reversed.join();
  print('Naive:  $naiveRev'); // garbled — lone surrogates in wrong positions

  // Rune reversal: avoids broken surrogates, but reverses the two RISL
  // code points individually, so 🇸🇰 becomes 🇰🇸.
  final runeRev = String.fromCharCodes(s.runes.toList().reversed);
  print('Rune:   $runeRev'); // 🇰🇸 éfac — flag letters reversed

  // Correct: reverse grapheme clusters — flag preserved as one unit.
  final safeRev = s.characters.toList().reversed.join();
  print('Safe:   $safeRev'); // 🇸🇰 éfac
}

The naive split('').reversed operates on code units, so each surrogate pair of a Regional Indicator letter comes out in the wrong order (low surrogate first, high surrogate second), which is not well-formed UTF-16. The rune-based reversal avoids broken surrogates but still reverses the two individual RISL code points, turning 🇸🇰 into 🇰🇸. Only grapheme-cluster reversal keeps the flag intact.

$ dart main.dart
Naive:  (garbled — lone surrogates)
Rune:   🇰🇸 éfac
Safe:   🇸🇰 éfac
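The damage done by the naive reversal can be detected programmatically. The following sketch scans the code units and reports whether every surrogate is part of a proper high/low pair. The function name isWellFormedUtf16 is hypothetical; it is written by hand here and uses only dart:core.

```dart
// Returns true when every high surrogate is immediately followed by a
// low surrogate and no low surrogate appears on its own.
bool isWellFormedUtf16(String s) {
  final units = s.codeUnits;
  for (var i = 0; i < units.length; i++) {
    final u = units[i];
    if (u >= 0xD800 && u <= 0xDBFF) {
      // High surrogate: the next unit must be a low surrogate.
      if (i + 1 >= units.length) return false;
      final next = units[i + 1];
      if (next < 0xDC00 || next > 0xDFFF) return false;
      i++; // skip the low surrogate we just checked
    } else if (u >= 0xDC00 && u <= 0xDFFF) {
      return false; // low surrogate without a preceding high surrogate
    }
  }
  return true;
}

void main() {
  const s = 'caf\u00E9 \u{1F1F8}\u{1F1F0}';
  final naiveRev = s.split('').reversed.join();
  final runeRev = String.fromCharCodes(s.runes.toList().reversed);

  print(isWellFormedUtf16(s)); // true
  print(isWellFormedUtf16(naiveRev)); // false
  print(isWellFormedUtf16(runeRev)); // true
}
```

Note that the rune-based reversal passes this check even though it shows the wrong flag; well-formed UTF-16 is necessary but not sufficient for a correct result.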

Performance considerations

The three APIs have different cost profiles:

codeUnits — near-zero overhead; directly exposes the internal UTF-16 storage with O(1) indexed access. Use it for encoding, hashing, and low-level manipulation where you fully control the character set.
runes — lightweight lazy iteration that decodes surrogate pairs on the fly. Correct for any text that does not contain combining marks, skin-tone modifiers, or ZWJ sequences — which covers most Latin and Cyrillic text.
characters — performs full Unicode extended grapheme cluster segmentation on every call. More CPU-intensive than runes, but the only safe choice for user-visible text operations (counting, slicing, iterating, reversing) that may involve complex scripts, emoji with modifiers, or ZWJ sequences.

As a rule of thumb: use codeUnits for I/O and encoding, use runes for simple Unicode text, and use characters whenever correctness for all human languages matters.
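To get a rough feeling for the relative costs on your own machine, the three APIs can be timed with a Stopwatch. The sketch below is only an informal measurement, not a proper benchmark: the helper timeMicros, the test string, and the iteration count are our assumptions, and the numbers vary with the runtime, the compilation mode, and the input.

```dart
import 'package:characters/characters.dart';

// Runs body the given number of times and returns the elapsed
// wall-clock time in microseconds.
int timeMicros(void Function() body, {int iterations = 1000}) {
  final sw = Stopwatch()..start();
  for (var i = 0; i < iterations; i++) {
    body();
  }
  sw.stop();
  return sw.elapsedMicroseconds;
}

void main() {
  // Text mixing ASCII with a skin-tone emoji, repeated to get a longer input.
  final text = 'Hi \u{1F44B}\u{1F3FD} ' * 1000;

  print('codeUnits:  ${timeMicros(() => text.codeUnits.length)} us');
  print('runes:      ${timeMicros(() => text.runes.length)} us');
  print('characters: ${timeMicros(() => text.characters.length)} us');
}
```

No expected output is shown because the timings differ from run to run; only the relative order of the three results is of interest.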

Source

Dart Runes - language reference

In this article we covered Dart runes. We explored code units, code points, and grapheme clusters, demonstrated UTF-8 vs UTF-16 encoding, and showed how to handle combining characters, flag emoji, skin tone modifiers, and ZWJ sequences correctly using the characters package.

Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I possess more than ten years of experience in teaching programming.
