(Coursenotes for CSC 305 Individual Software Design and Development)

Streams

References:

Streams and lambdas

Creating streams

You can create empty streams, or streams from existing data sources like lists.

Stream<String> s1 = Stream.empty(); // empty stream

// Stream of strings
Stream<String> s2 = Stream.of("these", "are", "stream", "contents");

// Stream of strings
List<String> myList = List.of("This", "is", "a", "list", "of", "Strings");
Stream<String> s3 = myList.stream();

// Using the builder pattern
Stream<String> s4 = Stream.<String>builder()
    .add("builder")
    .add("pattern")
    .add("in action")
    .build();

Note that if the Stream comes from an existing data source, it does NOT modify that data source, no matter what operations are performed in the Stream.
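A quick sketch to confirm this (the list contents here are just for illustration):

```java
import java.util.List;

public class SourceUnchanged {
    public static void main(String[] args) {
        List<String> words = List.of("these", "are", "stream", "contents");

        // Upper-case every element in the stream...
        List<String> shouted = words.stream()
            .map(String::toUpperCase)
            .toList();

        // ...but the original list is untouched.
        System.out.println(words);   // [these, are, stream, contents]
        System.out.println(shouted); // [THESE, ARE, STREAM, CONTENTS]
    }
}
```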

Beyond simply creating a stream from pre-existing data, you can generate streams by doing other transformations on data:

Random random = new Random();
DoubleStream ds = random.doubles(3); // Stream of 3 random doubles
IntStream intStream = IntStream.range(1, 3);          // 1, 2 (upper bound exclusive)
LongStream longStream = LongStream.rangeClosed(1, 3); // 1, 2, 3 (upper bound inclusive)

Finally, you can also create streams out of file contents:

Files.lines(Path.of("file.txt"), Charset.defaultCharset())
    .forEach(System.out::println);

This is in contrast to Files.readAllLines(Path.of("file.txt")), which would read all lines into a List<String>. This can be time and memory intensive. The Stream solution loads lines “lazily” and processes them one-at-a-time.
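The difference can be seen side by side. This sketch writes a throwaway temp file so it is self-contained:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class LinesDemo {
    public static void main(String[] args) throws IOException {
        // A temporary file just for this demo.
        Path file = Files.createTempFile("demo", ".txt");
        Files.writeString(file, "one\ntwo\nthree\n");

        // Eager: every line is read into memory at once.
        List<String> all = Files.readAllLines(file);
        System.out.println(all); // [one, two, three]

        // Lazy: lines are read one at a time as the stream consumes them.
        try (var lines = Files.lines(file, Charset.defaultCharset())) {
            lines.forEach(System.out::println);
        }

        Files.delete(file);
    }
}
```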

Stream pipelines

In general, a stream pipeline consists of:

- a source (e.g., a collection, a generator function, or a file),
- zero or more intermediate operations (e.g., map, filter), each of which produces a new stream, and
- a terminal operation (e.g., forEach, reduce, toList), which produces a result or side effect and ends the pipeline.

Many stream operations take in a behavioural parameter (i.e., a function). This can be written inline as a lambda, referred to using a variable that points to a lambda, or using the method reference syntax (e.g., System.out::println). These behavioural parameters represent functions that must be applied to each item in the stream.

Earlier in the quarter we talked about various “small patterns” we often perform with for loops in imperative languages, like transforming all items in a collection by applying some function to each item (map), or removing certain items in a collection based on some condition (filter), or summing up or aggregating in some way the values in a collection (reduce).

Streams allow us to define “pipelines” of these operations to be performed on collections.

For example, imagine that we have a giant file of strings, and we need to:

- convert each line to upper case, and
- keep only the lines that contain the phrase "SECRET PHRASE".

If our data is in a file called "file.txt", this would look like

Stream<String> result = Files.lines(Path.of("file.txt"), Charset.defaultCharset())
    .map(String::toUpperCase)
    .filter(l -> l.contains("SECRET PHRASE"));

You’ll notice that we still only have a Stream<String> after the above code runs. That’s because all we’ve done is create a pipeline of operations to be run — we haven’t actually executed those operations yet. The map and filter above are intermediate operations. They are not actually kicked off until a terminal operation is added to the stream pipeline.

As I mentioned earlier, a terminal operation produces some result or side effect, thereby exiting the stream pipeline. Some examples of terminal operations are:

- toList, which collects the stream's elements into a List,
- reduce, which combines the elements into a single value,
- count, max, and min, plus sum and average on numeric streams, and
- forEach, which applies a side-effecting function to each element.

Here are some examples:

List<String> result = Files.lines(Path.of("file.txt"), Charset.defaultCharset())
    .map(String::toUpperCase)
    .filter(l -> l.contains("SECRET PHRASE"))
    .toList();

Notice that the type of result is now List<String>. It is no longer a stream to which we can add further computations.

You can specify that a map should mapToInt (i.e., map to an IntStream). That unlocks numeric stream operations like max, min, sum, and average:

OptionalInt result = Files.lines(Path.of("file.txt"), Charset.defaultCharset())
    .map(String::toUpperCase)
    .filter(l -> l.contains("SECRET PHRASE"))
    .mapToInt(String::length)
    .max();

PONDER Why do you think we get an OptionalInt instead of a plain old int in return?
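As a hint, consider what max should return when the stream turns out to be empty. A small sketch:

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

public class OptionalDemo {
    public static void main(String[] args) {
        // An empty stream has no maximum, so max() cannot promise a plain int.
        OptionalInt empty = IntStream.empty().max();
        System.out.println(empty.isPresent()); // false

        // orElse provides a fallback value when the OptionalInt is empty.
        System.out.println(empty.orElse(-1)); // -1

        OptionalInt present = IntStream.of(3, 1, 4).max();
        System.out.println(present.getAsInt()); // 4
    }
}
```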

Finally, you can terminate streams with “side effects”, i.e., functions that don’t return a value, but have some other effect (e.g., they change the value of some other variable, or they write to some output stream).

Files.lines(Path.of("file.txt"), Charset.defaultCharset())
    .map(String::toUpperCase)
    .filter(l -> l.contains("SECRET PHRASE"))
    .forEach(System.out::println);

In the code above, we are applying the forEach terminal operation to the stream. In the terminal operation, we are passing each item to the System.out::println function that you know and love. Recall that the :: is the method reference syntax — we are “pointing to” the println function and saying “call this on each item in the stream”. If lambdas are more your thing, you can write that as l -> System.out.println(l). But in general it’s better to use method references for lambdas that are this simple.

Stream pipelines are evaluated lazily

Streams are lazy; computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed.

This has important implications. For example, consider the following pipeline, where I’ve added print statements in each operation.

OptionalInt result = Files.lines(Path.of("file.txt"), Charset.defaultCharset())
    .map(line -> {
        System.out.println("Upper-casing " + line);
        return line.toUpperCase();
    })
    .filter(line -> {
        System.out.println("\tChecking " + line + " for secret phrase");
        return line.contains("SECRET PHRASE");
    })
    .mapToInt(line -> {
        System.out.println("\t\tMapping " + line + " to charlength");
        return line.length();
    })
    .max();

Can you predict what the printed output would be with the following input?

INPUT

here
are
sOME
LINes SEcret phrASE
in
A
File

OUTPUT

Upper-casing here
        Checking HERE for secret phrase
Upper-casing are
        Checking ARE for secret phrase
Upper-casing sOME
        Checking SOME for secret phrase
Upper-casing LINes SEcret phrASE
        Checking LINES SECRET PHRASE for secret phrase
                Mapping LINES SECRET PHRASE to charlength
Upper-casing in
        Checking IN for secret phrase
Upper-casing A
        Checking A for secret phrase
Upper-casing File
        Checking FILE for secret phrase

The mapToInt step only applied to one item—the one that survived the previous filtering step.
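Laziness also enables short-circuiting: some terminal operations stop consuming the source as soon as they have an answer. A sketch with made-up data (the println is there only to show which elements actually get processed):

```java
import java.util.List;
import java.util.Optional;

public class ShortCircuit {
    public static void main(String[] args) {
        List<String> lines = List.of("here", "are", "SOME SECRET PHRASE", "in", "a", "file");

        Optional<String> first = lines.stream()
            .map(l -> {
                System.out.println("Upper-casing " + l);
                return l.toUpperCase();
            })
            .filter(l -> l.contains("SECRET PHRASE"))
            .findFirst(); // stops as soon as one match is found

        // "in", "a", and "file" are never upper-cased.
        System.out.println(first.orElse("not found")); // SOME SECRET PHRASE
    }
}
```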

Rules for behavioural parameters

All behavioural parameters to streams must:

- be non-interfering: they must not modify the stream's data source, and
- in most cases, be stateless: their result should not depend on any state that might change during execution of the pipeline.

From the Stream documentation

A stream implementation is permitted significant latitude in optimizing the computation of the result. For example, a stream implementation is free to elide operations (or entire stages) from a stream pipeline – and therefore elide invocation of behavioral parameters – if it can prove that it would not affect the result of the computation. This means that side-effects of behavioral parameters may not always be executed and should not be relied upon, unless otherwise specified (such as by the terminal operations forEach and forEachOrdered).

In short, you cannot rely on all stream operations always being executed.
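For example, since Java 9 count() can often compute its answer from the source's known size, so an implementation is allowed to skip earlier stages entirely. In this sketch the side effect may or may not run, depending on the JDK:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ElidedSideEffects {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();

        // count() may be computed from the list's size without
        // traversing elements, eliding the peek step entirely.
        long n = List.of("a", "b", "c").stream()
            .peek(s -> calls.incrementAndGet())
            .count();

        System.out.println(n);           // 3
        System.out.println(calls.get()); // may be 0 or 3, depending on the JDK
    }
}
```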

Parallel streams

As mentioned above, a Stream doesn’t kick off until a terminal operation is called. Until that happens, the pipeline is still being assembled (much like the Builder pattern), with intermediate operations like map and filter being chained onto it, but nothing yet computed.

At this point the “Stream” is usually a “sequential stream”. That is, it processes the data in one thread. However, modern computers usually have multiple cores available, meaning they can perform several actions at once. This means that some computations can be sped up if we can split up the problem (or the input data) into subsets, process those subsets, and combine the results.

You can do this by turning the Stream into a Parallel Stream.

Any stream can be told to operate in parallel by calling parallel() on it. parallel() is an intermediate operation. (Or, if your stream’s data source is a data structure like a list, you can call parallelStream() on it instead of stream() to begin streaming).
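Both spellings can be sketched like this (the numbers are made up for illustration):

```java
import java.util.List;

public class ParallelDemo {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8);

        // Sequential pipeline...
        int sequential = numbers.stream().mapToInt(Integer::intValue).sum();

        // ...and the same pipeline spread across multiple threads.
        int parallel = numbers.parallelStream().mapToInt(Integer::intValue).sum();

        // sum is associative, so both approaches agree: 36.
        System.out.println(sequential + " " + parallel); // 36 36
    }
}
```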

While this can result in a significant speedup, there are some important things to be aware of:

- Splitting the work and combining the results has overhead, so parallelism only pays off for large data sets or expensive per-element operations.
- Operations like reduce must be given an associative accumulator and a true identity value, because each thread combines its own subset independently.

Example from Baeldung

List.of(1, 2, 3, 4)
  .parallelStream()
  .reduce(5, Integer::sum);

In a sequential stream, we would get the result 5 + 1 + 2 + 3 + 4 = 15. In a parallel stream, however, the input is split across threads, the reduce runs in each thread, and 5 is used as the starting value in each one. Depending on how many threads are dedicated to the task, we will get different results.
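The fix is to use a true identity value (0 for addition) and add the 5 afterwards. A sketch of that approach:

```java
import java.util.List;

public class ParallelReduce {
    public static void main(String[] args) {
        // 0 is the identity for addition, so the result is the same
        // no matter how the work is split across threads.
        int sum = List.of(1, 2, 3, 4).parallelStream()
            .reduce(0, Integer::sum);

        System.out.println(sum + 5); // 15
    }
}
```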