(Coursenotes for CSC 203 Project-based Object-oriented Programming and Design)

Streams

Overview

Having learned about lambdas in the previous lesson, we will learn about a related construct in Java called streams. Lambdas and streams are often used together.

Streams allow us to take a series of computations we mean to perform on a collection of data, and compose them into a “pipeline”.

In this lesson, we’ll start with concrete examples of using the Streams API, since on the surface there is very little new or unfamiliar happening here. Following this, there is a brief discussion about what exactly is meant by “streaming”, and some of the underlying properties of streams in Java that are important to know about.

An example problem

Before we start, let’s recall the map and filter patterns that we talked about in the previous lesson.

Continuing with our examples of Album objects, let’s suppose we have a list of Albums that we are working with. For the purposes of this example, let’s assume Album objects have the following fields:

We are given the following problem prompt:

Write a program that consumes a list of Album objects, and, for the Albums released after the year 2000, computes the average number of units they have sold.

Let’s consider two solutions to this problem: one using regular for-each loops, like we are used to, and one using streams and lambdas.

A for-each loop solution

As you read the code below, try to identify usage of the map or filter patterns.

public static double averageSalesAfter2000(List<Album> albums) {
  long sum = 0;
  int albumsAfter2000 = 0;

  for (Album current : albums) {
    if (album.getYear() > 2000) {
      long sales = album.getSales();
      sum = sum + sales;
      albumsAfter2000 = albumsAfter2000 + 1;
    }
  }

  if (albumsAfter2000 > 0) {
    return sum / albumsAfter2000;
  } else {
    return 0;
  }
}

A streams solution

The same problem can be solved using streams. The code is below, and an explanation of each line follows.

public static double averageSalesAfter2000(List<Album> albums) {
  OptionalDouble result =  albums.stream() // Stream<Album>
    .filter(a -> a.getYear() > 2000) // Stream<Album>
    .mapToLong(a -> a.getUnitsSold()) // LongStream 
    .average(); // OptionalDouble

  // After filtering, there may not be any albums left.
  // In that case, we just return 0.
  return result.orElse(0);
}

In the code above, we have organised a series of computations into a stream pipeline.

The Streams API provides a whole host of operations that can performed on streams of data. filter, map, or specialised maps like mapToLong are just the tip of the iceberg. You can explore the API at the Stream JavaDoc page.

Streams are not data structures

A stream, by itself, does not store data, and is technically not a data structure. Streams are wrappers around a data source. They allow us to define a series of operations that should be performed on that data source, and they make bulk processing of data convenient and fast.

The “data source” for a stream can be anything—an array or list, a file stored on disk, a stream of data coming from some external service, etc. In this class, we will only deal with streams based on lists or arrays, but this section describes how Streams might be used to work with other types of data sources.

A stream never modifies its underlying data source. For example, you cannot use stream operations on a list to remove items from or add items to the list. Just like you can’t add or remove items from a list while looping over it using a for-each loop.

A stream pipeline usually consists of 3 pieces:

In our example above,

Stream pipelines are lazy. A stream pipeline will not begin executing until it has to. Specifically, the stream processing won’t be “kicked off” until a terminal operation is called.

For example, if we had only called filter and mapToLong above, we would still be left with a LongStream, i.e., a stream of longs. No processing would take place unless some terminal operation was added to the pipeline.

Some examples of terminal operations are:

List<Double> albumCosts = albums.stream()
  .filter(a -> a.getYear() > 2000)
  .map(a -> a.getPrice())
  .toList();
int albumsBefore2000 = albums.stream()
  .filter(a -> getYear() < 2000)
  .count();
// Reduce cost of pre-2000 albums by 10%
albums.stream()
  .filter(a -> a.Year() < 2000)
  .forEach(a -> a.setPrice(a.getPrice() * 0.9));

What is “streaming”?

You likely already know the meaning of the word “streaming”. For example, you’ve heard of “streaming music” or “streaming a video” over the internet. To simplify it greatly, it means to process data while it loads, rather than to load all the data before beginning to process it.

For example, when you’re streaming a movie on Netflix, you’re not actually downloading the whole movie to your machine and then watching it. Rather, chunks of the movie are being sent to your computer and played in your browser as they arrive.

Streams in Java are a similar idea.

This can be a useful mode of operation when you are working with huge amounts of data that cannot all be loaded into memory at once, or if you are working with “never-ending data”, for example, minute-by-minute readings from weather sensors. In these situations, you cannot wait to load all the data into, say, an ArrayList before you begin processing the data.

Consider the following scenario. Let’s imagine you need to read and process data from a HUGE file on your hard disk: MyGiantFile.txt The file is too large for you read the entire thing into a list of strings.

One way you could do this is to use a Scanner to read the file and process it line by line, like we have done in a project and a couple of labs this term.

Scanner scanner = new Scanner(new File("MyGiantFile.txt"));
while (scanner.hasNext()) {
  String line = scanner.nextLine();
  
  // Assume we do some work with the line here
}

With the streams API, we can now concisely define operations like the above using lambdas and all the benefits they bring.

The Files.lines static method creates a stream of strings, allowing us to define a pipeline of operations that will apply to each line in MyGiantFile.txt.

Files.lines(Path.of("MyGiantFile.txt"))
  .map(line -> .....)
  .filter(line -> ........)
  .forEach(line -> .......);

Because Files.lines returns a Stream<String>, the lines in the file and streamed through our pipeline, but this detail is abstracted away from you, the developer.

If you use a simple collection in memory (like an array or list) as the source of a stream, you’re not gaining much in the way of “streaming” — in that situation, the Streams API mostly provides a convenient library and syntax for performing operations on a collection data. Still pretty good!