Java Spliterator in Parallel
Last modified: May 8, 2025
The java.util.Spliterator is a powerful iterator-like interface
introduced in Java 8, designed for efficiently traversing and partitioning
elements of a data source. Its most crucial capability is parallel processing,
enabling large datasets to be split into smaller segments that can be processed
concurrently by multiple threads. Unlike a traditional
Iterator, which only supports sequential iteration, a
Spliterator can divide data dynamically, making it a foundational
component of Java's parallel Streams API.
A Spliterator provides metadata about the underlying data source
using characteristics that help optimize parallel execution. These
characteristics influence how tasks are distributed across threads, ensuring
efficient workload division. Common characteristics include:
ORDERED: Elements are accessed in a defined sequence.DISTINCT: Ensures all elements are unique, avoiding redundant computations.SORTED: Guarantees a specific order, enabling optimized parallel sorting.SIZED: Provides an accurate element count, helping balance workload distribution.NONNULL: Prevents null values from disrupting parallel execution.IMMUTABLE: Ensures thread safety by preventing structural modifications.CONCURRENT: Allows safe parallel modification without race conditions.SUBSIZED: Indicates that split portions retain a reliable size estimate.
These characteristics are essential for parallel computation, allowing the Java
runtime to make intelligent decisions about task partitioning and resource
allocation. By leveraging trySplit, a Spliterator
enables workload division across multiple threads, ensuring that computational
efficiency scales with available processing power.
Java's Fork/Join framework relies on Spliterator internally for
managing parallel streams, dynamically distributing tasks to maximize
performance. Developers can either use parallel strea ms for automatic
concurrency or explicit multi-threading for fine-grained control over execution.
Understanding how Spliterator facilitates parallel processing is
crucial for building scalable, high-performance applications.
Basic Spliterator Traversal (Sequential)
This example demonstrates basic sequential traversal using a Spliterator.
We obtain a Spliterator from a List of strings. Then, we
use the tryAdvance method in a loop to process each element.
tryAdvance takes a Consumer that specifies the action
to perform on the element. It returns false when no more elements
are available.
This example showcases the fundamental iteration capability of a
Spliterator. It prints characteristics and estimated size before
iterating. The iteration itself is sequential, processing one element at a time
in the main thread.
package com.zetcode;
import java.util.List;
import java.util.Spliterator;
public class Main {
public static void main(String[] args) {
List<String> names = List.of("John", "Jane", "Mike", "Sarah", "Tom");
Spliterator<String> spliterator = names.spliterator();
System.out.println("Characteristics: " + spliterator.characteristics());
System.out.println("Estimated size: " + spliterator.estimateSize());
System.out.println("Elements (sequentially):");
boolean hasNextElement;
do {
hasNextElement = spliterator.tryAdvance(name -> {
System.out.println("Processing: " + name + " by " + Thread.currentThread().getName());
// Simulate some work
try {
Thread.sleep(100);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
});
} while (hasNextElement); // Continue until no more elements remain
System.out.println("Finished processing.");
}
}
The output shows the characteristics and initial estimated size. Each element is processed by the main thread. This forms the basis for understanding how Spliterators work before diving into parallel processing.
Understanding trySplit
The trySplit method is fundamental to a Spliterator's
role in parallel processing. It attempts to partition the source elements into
two. If successful, it returns a new Spliterator covering a leading
portion of the elements, while the original Spliterator covers the
remainder. This example demonstrates splitting, but still processes
sequentially.
It's crucial to understand that calling trySplit by itself does not
initiate parallel execution. It merely prepares the data by dividing it. The
resulting Spliterators must be processed by separate threads to achieve
parallelism, as shown in later examples.
package com.zetcode;
import java.util.List;
import java.util.Spliterator;
public class Main {
public static void main(String[] args) {
List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
Spliterator<Integer> spliterator1 = numbers.spliterator();
System.out.println("Original spliterator size: " + spliterator1.estimateSize());
// Attempt to split the spliterator
Spliterator<Integer> spliterator2 = spliterator1.trySplit();
if (spliterator2 != null) {
System.out.println("First split size: " + spliterator1.estimateSize());
System.out.println("Second split size: " + spliterator2.estimateSize());
System.out.println("\nProcessing first split (sequentially):");
spliterator1.forEachRemaining(num -> System.out.println(num + " by " + Thread.currentThread().getName()));
System.out.println("\nProcessing second split (sequentially):");
spliterator2.forEachRemaining(num -> System.out.println(num + " by " + Thread.currentThread().getName()));
} else {
System.out.println("Could not split the spliterator. Processing all elements:");
spliterator1.forEachRemaining(System.out::println);
}
}
}
The output demonstrates that the original list is divided. However, both parts
are processed sequentially by the main thread. This example clarifies that
trySplit is a mechanism for partitioning, not direct parallel
execution.
Parallel Execution with ExecutorService and Spliterator
This example shows how to achieve true parallel processing using Spliterator
splits with an ExecutorService. After splitting a Spliterator, we
submit tasks to an ExecutorService. Each task processes one of the
splits. This allows different parts of the data to be handled concurrently by
different threads.
We create a fixed-size thread pool. Each Spliterator (the original and the one
returned by trySplit) is processed by a separate task submitted to
the pool. The Thread.currentThread().getName() call helps identify which thread
processes each element.
package com.zetcode;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class Main {
public static void main(String[] args) throws InterruptedException {
List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
Spliterator<Integer> spliterator1 = numbers.spliterator();
Spliterator<Integer> spliterator2 = spliterator1.trySplit(); // s1 is now roughly the second half
try (ExecutorService executor = Executors.newFixedThreadPool(2)) {
System.out.println("Submitting tasks for parallel processing...");
// Task for the first split (which is now spliterator2)
if (spliterator2 != null) {
executor.submit(() -> {
System.out.println("Processing second half (split part) by " + Thread.currentThread().getName());
spliterator2.forEachRemaining(num -> {
System.out.println("S2: " + num + " by " + Thread.currentThread().getName());
try { Thread.sleep(100); } catch (InterruptedException e) { /*ignore*/ }
});
});
}
// Task for the remaining part of the original spliterator (spliterator1)
executor.submit(() -> {
System.out.println("Processing first half (original part) by " + Thread.currentThread().getName());
spliterator1.forEachRemaining(num -> {
System.out.println("S1: " + num + " by " + Thread.currentThread().getName());
try { Thread.sleep(100); } catch (InterruptedException e) { /*ignore*/ }
});
});
executor.shutdown();
if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
executor.shutdownNow();
}
System.out.println("All tasks completed.");
}
}
}
The output will show elements being processed by different threads from the pool (e.g., "pool-1-thread-1", "pool-1-thread-2"). This confirms that the processing of the two data partitions occurs in parallel. This manual approach gives fine-grained control over parallel execution.
Leveraging Parallel Streams with Spliterator
Java Streams provide a high-level API for processing sequences of elements.
Parallel streams use Spliterators internally to divide work among multiple
threads. This example shows two ways to get parallel streams:
collection.parallelStream and
StreamSupport.stream(spliterator, true).
The parallelStream method on a collection directly returns a
parallel stream. Alternatively, StreamSupport.stream can create a
stream from an existing Spliterator. Setting its second argument to
true makes the resulting stream parallel. The Java Fork/Join
framework manages the parallelism under the hood.
package com.zetcode;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
public class Main {
public static void main(String[] args) {
List<String> words = List.of("apple", "banana", "cherry", "date",
"elderberry", "fig", "grape", "honeydew");
System.out.println("Processing with collection.parallelStream():");
words.parallelStream().forEach(word ->
System.out.println(word + " processed by " + Thread.currentThread().getName())
);
System.out.println("\nProcessing with StreamSupport.stream(spliterator, true):");
Spliterator<String> spliterator = words.spliterator();
Stream<String> parallelStreamFromSpliterator = StreamSupport.stream(spliterator, true);
parallelStreamFromSpliterator.forEach(word ->
System.out.println(word + " processed by " + Thread.currentThread().getName())
);
System.out.println("Finished processing with parallel streams.");
}
}
The output reveals how multiple threads execute tasks concurrently, efficiently
distributing workload across available processor cores. By leveraging Java's
Fork/Join framework, the Streams API seamlessly handles parallel execution,
ensuring optimal resource utilization without requiring manual thread
management. This example illustrates how Spliterator facilitates
data partitioning, enabling parallel streams to process elements independently
while improving performance in large-scale computations.
Parallel Processing of Primitive Data Types
Java provides specialized Spliterators for primitive types like
int, long, and double (e.g.,
Spliterator.OfInt). These avoid the overhead of boxing/unboxing
primitives to their wrapper classes. This example uses
Spliterator.OfInt obtained from an int array.
We first demonstrate creating a parallel IntStream using
StreamSupport.intStream(spliterator, true) to calculate a sum.
Then, we show manual parallel processing by splitting the Spliterator.OfInt
and using an ExecutorService. Each task uses forEachRemaining(IntConsumer)
for efficient primitive processing.
package com.zetcode;
import java.util.Arrays;
import java.util.Spliterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;
public class Main {
public static void main(String[] args) throws InterruptedException {
int[] numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20};
// Method 1: Using parallel IntStream from Spliterator.OfInt
Spliterator.OfInt spliteratorForStream = Arrays.spliterator(numbers);
IntStream parallelIntStream = StreamSupport.intStream(spliteratorForStream, true);
long sum1 = parallelIntStream.sum();
System.out.println("Sum using parallel IntStream: " + sum1);
// Method 2: Manual parallel processing with ExecutorService
System.out.println("\nManual parallel processing of primitive array:");
Spliterator.OfInt s1 = Arrays.spliterator(numbers);
Spliterator.OfInt s2 = s1.trySplit();
LongAdder partialSum1 = new LongAdder();
LongAdder partialSum2 = new LongAdder();
try (ExecutorService executor = Executors.newFixedThreadPool(2)) {
if (s2 != null) {
executor.submit(() -> {
s2.forEachRemaining((int val) -> {
partialSum2.add(val);
});
});
}
executor.submit(() -> {
s1.forEachRemaining((int val) -> {
partialSum1.add(val);
});
});
executor.shutdown();
if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
executor.shutdownNow();
}
}
long totalSum = partialSum1.sum() + partialSum2.sum();
System.out.println("Sum using manual parallel processing: " + totalSum);
}
}
This example showcases two approaches for parallel processing of primitive
arrays. Using parallel streams is often more concise. Manual control with
ExecutorService offers flexibility, especially when integrating
with existing threading models or when fine-grained task management is required.
Both methods leverage the splitting capabilities of
Spliterator.OfInt.
Parallel Summation of Numbers using Spliterator
This example demonstrates summing a list of numbers in parallel. We obtain a
Spliterator from a List<Integer>, split it, and
then use an ExecutorService to calculate partial sums concurrently.
Each task is a Callable that returns its partial sum. These partial
sums are then collected using Future objects and combined to get
the total sum.
This pattern is common for "divide and conquer" parallel algorithms. The work of
summation is divided between threads, and results are aggregated. It highlights
how Spliterator facilitates breaking down a computation for
parallel execution.
package com.zetcode;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.*;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
public class Main {
public static void main(String[] args) throws Exception {
List<Integer> numbers = IntStream.rangeClosed(1, 10000)
.boxed()
.collect(Collectors.toList());
Spliterator<Integer> spliterator1 = numbers.spliterator();
Spliterator<Integer> spliterator2 = spliterator1.trySplit(); // s1 is now the second half
try (ExecutorService executor = Executors.newFixedThreadPool(2)) {
// Task for the first part (which is spliterator2)
Callable<Long> sumTask1 = () -> {
long sum = 0;
if (spliterator2 != null) {
SummingConsumer consumer = new SummingConsumer();
spliterator2.forEachRemaining(consumer);
sum = consumer.getTotal();
}
System.out.println("Sum from task 1 (thread " + Thread.currentThread().getName() + "): " + sum);
return sum;
};
// Task for the second part (remaining of spliterator1)
Callable<Long> sumTask2 = () -> {
SummingConsumer consumer = new SummingConsumer();
spliterator1.forEachRemaining(consumer);
long sum = consumer.getTotal();
System.out.println("Sum from task 2 (thread " + Thread.currentThread().getName() + "): " + sum);
return sum;
};
Future<Long> future1 = executor.submit(sumTask1);
Future<Long> future2 = executor.submit(sumTask2);
long totalSum = future1.get() + future2.get();
System.out.println("Total sum calculated in parallel: " + totalSum);
long expectedSum = numbers.stream().mapToLong(Integer::longValue).sum();
System.out.println("Expected sum (sequential stream): " + expectedSum);
executor.shutdown();
if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
executor.shutdownNow();
}
}
}
// Helper class for summing, as lambda variable for sum must be effectively final
static class SummingConsumer implements Consumer<Integer> {
private long total = 0;
@Override
public void accept(Integer value) {
total += value;
}
public long getTotal() {
return total;
}
}
}
The splitting mechanism is achieved using trySplit, which
divides the original Spliterator into two parts. The first
Spliterator (spliterator1) retains one portion of the data, while
the second Spliterator (spliterator2) handles the other. This
allows parallel processing of each subset independently.
To perform the parallel summation, an ExecutorService is employed
with a fixed thread pool of size 2. Each split is processed by a separate
thread, ensuring optimized resource utilization. The tasks are submitted as
Callable<Long> functions, which return computed sums
asynchronously.
A helper class, SummingConsumer, is used for accumulating sums
while traversing elements with forEachRemaining. This approach is
necessary because variables within lambda expressions must be effectively
final, preventing direct in-place modification.
Once both tasks complete execution, their results are combined to obtain the total sum in parallel. The computed result is then verified against a sequential sum obtained through a standard stream operation. This ensures correctness and provides insight into the performance benefits of parallel execution.
Parallel Data Transformation using Spliterator
This example demonstrates parallel processing using Spliterator and
ExecutorService. The program starts with a predefined list of words
and splits the workload into two separate tasks using trySplit.
Each task transforms its assigned words to uppercase and appends thread
identification to illustrate concurrency in action.
This is useful for CPU-bound transformation tasks on large datasets. Splitting the data allows multiple cores to work on the transformation simultaneously, potentially speeding up the overall process significantly.
package com.zetcode;
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
public class Main {
public static void main(String[] args) throws Exception {
List<String> words = List.of("alpha", "bravo", "charlie", "delta", "echo",
"foxtrot", "golf", "hotel", "india", "juliett");
Spliterator<String> s1 = words.spliterator();
Spliterator<String> s2 = s1.trySplit(); // s1 is now the second half
try (ExecutorService executor = Executors.newFixedThreadPool(2)) {
Callable<List<String>> transformTask1 = () -> {
List<String> result = new ArrayList<>();
if (s2 != null) { // s2 is the first half
s2.forEachRemaining(word -> {
result.add(word.toUpperCase() + " (processed by " + Thread.currentThread().getName() + ")");
});
}
return result;
};
Callable<List<String>> transformTask2 = () -> {
List<String> result = new ArrayList<>();
s1.forEachRemaining(word -> { // s1 is the second half
result.add(word.toUpperCase() + " (processed by " + Thread.currentThread().getName() + ")");
});
return result;
};
Future<List<String>> future1 = executor.submit(transformTask1);
Future<List<String>> future2 = executor.submit(transformTask2);
List<String> combinedResult = new ArrayList<>();
combinedResult.addAll(future1.get());
combinedResult.addAll(future2.get());
System.out.println("Transformed words in parallel:");
combinedResult.forEach(System.out::println);
executor.shutdown();
if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
executor.shutdownNow();
}
}
System.out.println("\nSequentially transformed for order reference:");
words.stream().map(String::toUpperCase).forEach(System.out::println);
}
}
The splitting mechanism divides the original Spliterator into two
segments. The first half (s2) and the remaining portion (s1) are then processed
independently by separate threads. This approach optimizes CPU utilization by
distributing computational tasks across multiple threads.
To facilitate true parallel execution, a fixed thread pool
(ExecutorService) is employed, ensuring that each split is
processed concurrently. The tasks are defined as
Callable<List<String>>, allowing asynchronous execution
while returning transformed results upon completion.
Each task uses forEachRemaining to process elements within its
split, ensuring efficient traversal of words without explicit iteration. Once
both tasks complete execution, their results are merged into a combined list for
final output.
It is important to note that order consistency is not guaranteed unless explicitly managed. Since the processing occurs across multiple threads, the final merged result may differ from the original sequence. For reference, a sequential transformation using a standard stream is displayed alongside the parallel output.
Finally, the ExecutorService is shut down gracefully, ensuring
efficient resource cleanup. If tasks exceed their expected execution time, an
emergency shutdown prevents unnecessary resource consumption.
Parallel Data Aggregation (e.g., Counting) using Spliterator
This example demonstrates parallel data aggregation, specifically counting elements
that match a certain criterion. We use a list of strings (simulating lines from a
file). The goal is to count how many strings contain a specific keyword.
The Spliterator is split, and each part is processed by a task in an
ExecutorService. Each task counts matching strings in its segment, and
the main thread sums these partial counts.
This approach is effective for operations like filtering and counting on large datasets. By dividing the dataset and processing parts concurrently, we can often achieve better performance than a purely sequential approach.
package com.zetcode;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
public class Main {
public static void main(String[] args) throws Exception {
List<String> lines = List.of(
"The quick brown fox",
"jumps over the lazy dog",
"A B C D E F G",
"Another line with fox here",
"Spliterators are useful for parallel fox processing",
"The lazy fox is quick"
);
String keyword = "fox";
Spliterator<String> s1 = lines.spliterator();
Spliterator<String> s2 = s1.trySplit(); // s1 is now the second half
try (ExecutorService executor = Executors.newFixedThreadPool(2)) {
Callable<Integer> countTask1 = () -> {
AtomicInteger count = new AtomicInteger(0);
if (s2 != null) { // s2 is the first half
s2.forEachRemaining(line -> {
if (line.contains(keyword)) {
count.incrementAndGet();
}
});
}
System.out.println("Count from task 1 (thread " + Thread.currentThread().getName() + "): " + count.get());
return count.get();
};
Callable<Integer> countTask2 = () -> {
AtomicInteger count = new AtomicInteger(0);
s1.forEachRemaining(line -> { // s1 is the second half
if (line.contains(keyword)) {
count.incrementAndGet();
}
});
System.out.println("Count from task 2 (thread " + Thread.currentThread().getName() + "): " + count.get());
return count.get();
};
Future<Integer> future1 = executor.submit(countTask1);
Future<Integer> future2 = executor.submit(countTask2);
int totalCount = future1.get() + future2.get();
System.out.println("Total lines containing '" + keyword + "': " + totalCount);
long expectedCount = lines.stream().filter(line -> line.contains(keyword)).count();
System.out.println("Expected count (sequential stream): " + expectedCount);
executor.shutdown();
if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
executor.shutdownNow();
}
}
}
}
In this example, a list of strings is processed in parallel to count occurrences
of the word "fox". Each task uses an AtomicInteger for its local
count to ensure thread safety within the lambda if it were more complex, though
here simple local variable would also work. The partial counts are retrieved via
Future objects and summed.
A key component in this implementation is the use of AtomicInteger,
which guarantees thread-safe increments while counting occurrences of the target
keyword. This approach ensures correctness when modifying shared variables
inside a lambda expression—though for simple cases, a local variable would
suffice.
Once both tasks complete execution, their partial results are retrieved via
Future objects and summed to calculate the total keyword
occurrences in parallel. The output is then validated by comparing it against
a sequential stream-based count, demonstrating the accuracy and efficiency
gains achieved through concurrent processing.
Source
Java Spliterator Documentation
This tutorial covered the Java Spliterator interface with a focus
on its use in parallel processing. We explored basic traversal, splitting,
characteristics, custom Spliterators, and integration with parallel streams and
ExecutorService. Understanding Spliterators is vital for writing
efficient, scalable Java code that can leverage multi-core processors for
data-intensive tasks.
Author
List all Java tutorials.