(Coursenotes for CSC 305 Individual Software Design and Development)

Open-closed principle, Liskov Substitution Principle, and the Composite Design Pattern

Reminders

First quiz on Thursday taken in lab.

What we’ve talked about so far

So far, we’ve talked about the following principles for good program design:

  • Minimise the scope of local variables
  • Use StringBuilder
  • Use the appropriate looping construct
  • Know and use standard libraries
  • Design and build loosely coupled modules
  • Use strategies like encapsulation, interfaces, and dependency injection
  • Design immutable classes wherever possible
  • Increase cohesion of individual classes

We’ve also started talking about the so-called “SOLID” principles of object-oriented design. We started by talking about the Single Responsibility Principle (the S in SOLID), which is the principle that guides our desire for LOOSE COUPLING between classes and TIGHT COHESION within classes.

Open-closed Principle

The Open-closed Principle states that software entities should be open to extension, but closed to modification.

Unfortunately, there are many interpretations of what exactly this means. The common interpretation says that in an ideally-designed software system, you should be able to add functionality to that system without modifying the existing code (or minimising changes to existing code), and only by adding new code. In an object-oriented language that supports inheritance (e.g., Java), this might be accomplished by creating extensions (subclasses) of existing classes, instead of modifying existing classes.

The idea itself is compelling—new features fitting seamlessly into an existing design is certainly an attractive proposition! However, I don’t think it’s terribly realistic a lot of the time. I shudder at the thought that we would create new subclasses or ever-deeper type hierarchies each time we wanted to add new functionality to a large software system.

Creating abstraction each time we want to modify our code bring other problems: it can create indirections and, as a result, increase the complexity of the system. If we try to create each time a new “entity” (using Martin’s words) when we want to change one, we’ll see the cheer numbers of “entities” going through the roof.

– blog post on “The Valuable Dev”

Happily, many of the “SOLID principles of object-oriented design” can also be thought about in a non-Object-oriented sense.

A good example is the regular expression example we saw last week. See lib/core.ts and the usePattern function in lib/vocabularies/code.ts in commit a0720f881b8331db1c8c38a805e24a71f5daacbb.

The usePattern function is closed to modification (i.e., we won’t make further changes to the function to support additional regular expression engines), but is open to extension (i.e., it can be extended to work with any regular expression parser, as long as it adheres to the RegExp interface).
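The same idea can be sketched in Java. The names below (PatternEngine, JdkRegexEngine, LiteralEngine, PatternCheck) are hypothetical and are not taken from the commit referenced above; they only illustrate a function that stays closed to modification while remaining open to new engines.

```java
import java.util.regex.Pattern;

// Hypothetical interface: anything that can test an input against a pattern.
interface PatternEngine {
    boolean test(String pattern, String input);
}

// One engine, backed by java.util.regex.
class JdkRegexEngine implements PatternEngine {
    @Override
    public boolean test(String pattern, String input) {
        return Pattern.compile(pattern).matcher(input).find();
    }
}

// A second engine added later: treats the pattern as a literal substring.
class LiteralEngine implements PatternEngine {
    @Override
    public boolean test(String pattern, String input) {
        return input.contains(pattern);
    }
}

class PatternCheck {
    // Closed to modification: this method is not edited when a new engine appears.
    // Open to extension: any PatternEngine implementation can be passed in.
    static boolean matches(PatternEngine engine, String pattern, String input) {
        return engine.test(pattern, input);
    }
}
```

Adding LiteralEngine required no change to PatternCheck.matches, which is the property the open-closed principle is after.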

When is the open-closed principle useful?

In many domains, plugin architectures are becoming extremely common. A plugin architecture is made up of two main components:

  • The core system is the main application, containing the fundamental logic needed for the system.
  • Connection points that various plugins can hook into to extend the core system’s functionality.

Just like we talked about classes exposing a fixed interface to other classes, you can imagine that the core system (which may, itself, contain many software components) exposes an interface that plugins can use to add new features, or even extend existing features, depending on what the core system exposes.
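A minimal sketch of such a core system, assuming a single hypothetical connection point (EditorPlugin.onSave). None of these names come from VSCode, Eclipse, or any real plugin API; the point is only that the core depends on the interface and never on a concrete plugin.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical connection point exposed by the core system.
interface EditorPlugin {
    // Receives the text about to be saved and returns the (possibly changed) text.
    String onSave(String text);
}

// The core system: it knows the EditorPlugin interface, not any concrete plugin.
class EditorCore {
    private final List<EditorPlugin> plugins = new ArrayList<>();

    void register(EditorPlugin plugin) {
        plugins.add(plugin);
    }

    String save(String text) {
        String result = text;
        for (EditorPlugin plugin : plugins) {
            result = plugin.onSave(result);
        }
        return result;
    }
}

// Two plugins written without touching EditorCore.
class TrimTrailingWhitespacePlugin implements EditorPlugin {
    @Override
    public String onSave(String text) {
        return text.replaceAll("\\s+$", "");
    }
}

class FinalNewlinePlugin implements EditorPlugin {
    @Override
    public String onSave(String text) {
        return text.endsWith("\n") ? text : text + "\n";
    }
}
```

With no plugins registered, save returns the text unchanged; registering the two plugins in order turns "hello   " into "hello" followed by a newline.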

A great example is software development environments. Most major IDEs are built using a plugin architecture.

For example, VSCode provides facilities for external developers to create extensions that vastly extend the functionality of the core system.

And it’s not just for external developers! The VSCode team themselves have isolated the fundamental editor features in a core system (the microsoft/vscode repository), and they ship a large number of features as “extensions” to that core system. For example, basic Python support comes with the vscode-python extension.

This plugin architecture allows VSCode to be extended in myriad ways, without ever modifying the core VSCode functionality. That is, the core functionality has no idea about the plugins that might be operating on it, and the plugins can manipulate parts of the VSCode application that the core system exposes through its extension mechanism.

This is a popular approach in IDEs. For example, Kim Moir describes the plugin architecture used in the Eclipse platform. That architecture has allowed the same base platform to be used to create a number of other tools by composing together various plugins. For example, Eclipse itself is a popular open-source Java IDE, but the same base platform is used to build DBeaver, an open-source database tool.

Virtually all the JetBrains developer tools (including IntelliJ IDEA) are built as combinations of various plugins on top of the core architecture.

So, plugin architectures are a good example of the Open-closed principle in action. But I am not super convinced that going for a plugin architecture is the best move in all cases — it can be too much overhead for little benefit.

Polymorphism

Do you remember what Polymorphism is? Why is it useful?

Polymorphism is an important pillar of object-oriented programming. The word “polymorph” means “many forms”. Polymorphism allows us to treat objects as having one of multiple “forms”, and we don’t necessarily know until runtime what that form might be. (This should remind you of interfaces!)

What different kinds of polymorphism are available to us in Java?

  • Interfaces
    • default methods allow us to add shared method implementations to interfaces. These methods are inherited by any implementing subclasses that don’t implement their own versions; if default methods are added to interfaces that have been in use by many clients, this can lead to subtle issues at runtime, even if the client’s code compiles successfully.
  • Extending a concrete class: a class extends another concrete class. Both the superclass and the subclass are concrete, instantiable classes. The subclass may inherit the superclass’s behaviour, modify it, or add to it.
  • Abstract classes: An abstract class can define concrete methods as well as abstract methods that must be implemented by a subclass.
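The three options above can be sketched side by side. All class and method names here (Speaker, Dog, Puppy, Instrument, Piano) are made up for illustration.

```java
// 1. Interface with a default method.
interface Speaker {
    String sound();              // abstract: every implementor supplies it

    default String speak() {     // default: inherited unless overridden
        return sound() + "!";
    }
}

class Dog implements Speaker {
    @Override
    public String sound() {
        return "Woof";
    }
}

// 2. Extending a concrete class: Puppy inherits speak() and modifies sound().
class Puppy extends Dog {
    @Override
    public String sound() {
        return "Yip";
    }
}

// 3. Abstract class: mixes an implemented method with an abstract one.
abstract class Instrument implements Speaker {
    abstract String note();      // subclasses must implement this

    @Override
    public String sound() {      // already implemented for all subclasses
        return "Note " + note();
    }
}

class Piano extends Instrument {
    @Override
    String note() {
        return "C";
    }
}
```

Code that holds a Speaker reference can call speak() without knowing which of the three mechanisms produced the object: a Dog yields "Woof!", a Puppy "Yip!", and a Piano "Note C!".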

EJ20: Prefer interfaces to abstract classes.

It used to be that interfaces were quite limited in what they could do, compared to abstract classes. Interfaces could only define abstract methods that all implementing subclasses had to implement. We’ve already talked about the benefits of this (see lecture notes on coupling and cohesion).

But this led to difficulties when, for example, an interface that was in use by many classes needed to be extended in some way. Any additions of abstract methods to the interface would require all implementing subclasses to also need implementations of the new abstract methods, even if the implementation was to be identical for all subclasses.

Compare this to abstract classes, which allow a mix of fully implemented methods as well as abstract methods. All subclasses must implement their own versions of abstract methods, but have the option to inherit the methods that are already implemented in the superclass.

Clearly, they seem more useful than interfaces!

Enter default methods

All of this changed with the introduction of default methods for interfaces in Java 8. Default methods allow you to provide implementations for certain behaviours in the interface itself, so that implementing classes can inherit them or override them.

As a result, using interfaces gives you the following benefits:

  • Existing classes can easily be retrofitted to implement a new interface. It’s just a matter of declaring that the class implements the interface, and adding the required methods. Because classes in Java can implement multiple interfaces, this is not a problem. However, a Java class can extend at most one class. So it’s not straightforward at all to retrofit an existing class to extend an abstract class.

  • Interfaces let you create non-hierarchical type frameworks. Not all organisations lend themselves to tree structures. That is, you may want different combinations of types “mixed together” for specific subclasses. To achieve this flexibly with abstract classes, you would end up with a bloated class hierarchy, trying to create a separate type for each combination of functionality you want to support. With interfaces you have infinite flexibility to enhance class behaviours as needed.

  • It’s easy to enhance the behaviour of implementing classes by adding default methods.

For example, consider the Comparator interface. In older versions of Java, the interface simply provided an abstract compare method that compared two objects. Implementing classes had to implement that method. Now, the Comparator interface provides a number of useful default methods, which allow you to chain comparators together (using thenComparing) or to reverse the order of a comparison (using reversed).

No implementing classes needed to be aware of these additions to be able to benefit from them.
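A small sketch of that point: BY_LENGTH below is written the “old” way, implementing only compare, yet it can still use thenComparing and reversed. The class name ComparatorDefaultsDemo and the helper sortedCopy are made up for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ComparatorDefaultsDemo {
    // Implements only the abstract compare method, as pre-Java-8 code would have.
    static final Comparator<String> BY_LENGTH = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            return Integer.compare(a.length(), b.length());
        }
    };

    // Returns a sorted copy so the caller's list is left untouched.
    static List<String> sortedCopy(List<String> words, Comparator<String> order) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(order);
        return copy;
    }
}
```

For the words pear, fig, apple, kiwi, sorting with BY_LENGTH.thenComparing(Comparator.<String>naturalOrder()) gives fig, kiwi, pear, apple, and sorting with BY_LENGTH.reversed() puts apple first and fig last.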

That said, there are risks with default method implementations. Default methods are “injected” into implementing subclasses without the knowledge or consent of the implementors. It is possible that the default method implementation that is being inherited by some implementor actually violates invariants that the implementor depends upon. Good documentation is absolutely essential to communicate this information to implementors.

For example, a library maintainer who updated to Java 8 would suddenly have been saddled with a bunch of inherited behaviour in their classes that implement the Comparator interface.

It is simply not possible to write interfaces that maintain all invariants of every conceivable implementation.

EJ21 Design interfaces for posterity

Since Java 8, the Collection interface contains the removeIf default method. The method removes every element that satisfies some boolean condition (a predicate).

Every class that implements the Collection interface (i.e., a whole ton of classes in the JDK) now inherits this removeIf method.

Unfortunately, this fails for SynchronizedCollection, a collection wrapper from Apache Commons Collections which synchronizes every method call on a locking object. The default implementation of removeIf in the Collection interface doesn’t know about this locking mechanism. At the time Effective Java was written, SynchronizedCollection did not override removeIf, so it inherited the default implementation, which cannot keep the class’s fundamental promise to synchronize on each method call. If a client were to call removeIf while another thread was modifying the collection, it could lead to a ConcurrentModificationException or some other unspecified behaviour.
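To make the mechanism concrete, here is a simplified sketch. LockedBag is a hypothetical class written for these notes, not the Apache class: it guards its own methods with a lock, and it counts every removal that reaches the underlying list while that lock is not held.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical collection whose invariant is "every mutation happens under the lock".
class LockedBag<E> extends AbstractCollection<E> {
    private final List<E> items = new ArrayList<>();
    private final Object lock = new Object();
    int unguardedRemovals = 0; // removals performed without holding the lock

    @Override
    public boolean add(E e) {
        synchronized (lock) {
            return items.add(e);
        }
    }

    @Override
    public int size() {
        synchronized (lock) {
            return items.size();
        }
    }

    @Override
    public Iterator<E> iterator() {
        final Iterator<E> it = items.iterator();
        return new Iterator<E>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public E next() {
                return it.next();
            }

            @Override
            public void remove() {
                // The inherited default removeIf lands here without taking the lock.
                if (!Thread.holdsLock(lock)) {
                    unguardedRemovals++;
                }
                it.remove();
            }
        };
    }
}
```

LockedBag never mentions removeIf, yet it inherits the default from Collection. Calling bag.removeIf(n -> n % 2 == 0) on a bag holding 1 to 4 removes two elements through the iterator, and both removals are counted as unguarded. In a single thread nothing visibly breaks, which is exactly why this kind of problem is easy to miss.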

Liskov substitution principle

Proposed by Barbara Liskov, a pioneer of programming languages and object-oriented programming, and winner of the 2008 Turing Award.

The LSP says:

If S is a subtype of B, then objects of type B can be replaced with objects of type S without breaking the program.

This is a good rule-of-thumb for using polymorphism correctly.

The Liskov Substitution Principle says that in an OO program, if we substitute a superclass object reference with an object of any of its subclasses, the program should not break. This is in much the same way that code that uses a List type can be executed with an ArrayList or a LinkedList and everything works just fine.

You can think of the methods defined in a supertype as defining a contract. Every subtype (e.g., everything that claims to be a List) should stick to the contract.

The LSP helps us to ensure that invariants in the superclass are maintained in subclasses (i.e., preconditions and postconditions are satisfied). This can also help clients rely on extensions to our existing classes without fear of unexpected functional outcomes.

In a language like Java, the existence of the appropriate functions (e.g., methods with the right names, parameter lists, and return types) is more-or-less guaranteed by the language’s type system. For example, if you were to create a new List implementation, your code would not compile until you had implementations for all of the methods that are required by the List interface.

But the LSP goes beyond simply satisfying the type system. It’s a promise of semantically fulfilling the contract of the supertype. That is, the subtype should behave like the supertype (e.g., no matter what kind of list is being used, the effect of adding an item is the same).

For example, subclasses can improve the performance of the superclass:

  • a subclass can use a better search algorithm than the base class
  • a subclass can use a better sort algorithm than the base class
  • in every case, the expected behaviour and outcome should stay the same

Currently, languages do not automatically enforce these properties.
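Since the compiler won’t check the contract for us, one option is to write the check ourselves against the supertype and run it on every subtype. The sketch below covers only a small part of the List.add contract; ListContractCheck and addBehavesLikeAList are names made up for this example.

```java
import java.util.List;

class ListContractCheck {
    // Checks part of the List.add contract for any implementation:
    // add returns true, the size grows by one, and the new item is last.
    static boolean addBehavesLikeAList(List<String> list) {
        int sizeBefore = list.size();
        boolean changed = list.add("new item");
        return changed
                && list.size() == sizeBefore + 1
                && "new item".equals(list.get(list.size() - 1));
    }
}
```

Passing a new ArrayList or a new LinkedList should return true in both cases. A subtype that silently dropped the item, or put it somewhere else, would compile just fine but fail this check.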

Code Critique

public class Bird {
    public void fly() {
        System.out.println("Flying...");
    }

    public void eat() {
        System.out.println("Eating...");
    }
}

public class Crow extends Bird {}

public class Ostrich extends Bird {
    @Override
    public void fly() {
        throw new UnsupportedOperationException();
    }
}

import java.util.ArrayList;
import java.util.List;

public class TestBird {
    public static void main(String[] args) {
        List<Bird> birdList = new ArrayList<>();
        birdList.add(new Crow());
        birdList.add(new Ostrich());
        birdList.add(new Crow());
        letTheBirdsFly(birdList);
    }

    // Calls fly() on every bird in the list, whatever its concrete type.
    public static void letTheBirdsFly(List<Bird> birdList) {
        for (Bird b : birdList) {
            b.fly();
        }
    }
}

What’s the problem with the code above?

Hint

The Ostrich extends the Bird superclass, but does not support all of the required behaviours. This is a clear violation of the LSP: the Ostrich has a more constrained set of functionality than its superclass, Bird. This happens because the Bird abstraction has too many responsibilities. It is responsible for too much functionality, so when the time comes to extend the software with the Ostrich class, we run into trouble.

How would you fix the design?
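One possible answer, sketched under the assumption that flying is split out of Bird into its own interface. The names Flier, Aviary, and letTheFliersFly are hypothetical, and the methods return strings instead of printing so that the result is easy to inspect. Other designs are certainly possible.

```java
import java.util.ArrayList;
import java.util.List;

// Bird keeps only the behaviour that every bird supports.
class Bird {
    public String eat() {
        return "Eating...";
    }
}

// Flying becomes a separate capability that a bird can opt into.
interface Flier {
    String fly();
}

class Crow extends Bird implements Flier {
    @Override
    public String fly() {
        return "Flying...";
    }
}

// Ostrich no longer has to override fly() just to reject it.
class Ostrich extends Bird {}

class Aviary {
    // Accepts only things that can fly, so an Ostrich cannot be passed in by mistake.
    static List<String> letTheFliersFly(List<? extends Flier> fliers) {
        List<String> log = new ArrayList<>();
        for (Flier flier : fliers) {
            log.add(flier.fly());
        }
        return log;
    }
}
```

With this design, putting an Ostrich into a list of Flier objects is a compile-time error rather than a runtime exception, and every Bird subclass can stand in for a Bird.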

The composite design pattern

A design pattern is a general, re-usable solution to a commonly occurring problem within a given context in software design. Design patterns offer templates for solving problems that can be applied in many different situations.

We’ll do a more general introduction to design patterns on Tuesday. For today, I would like to introduce the Composite design pattern.

The composite design pattern makes sense when a portion of your application can be structured as a tree.

For example, suppose you need to “read” all the files in a computer. You have a root folder (the root of your directory tree). That root may have many children (files or folders inside of it). Some of those children may in turn have further children.

The Composite pattern involves you treating the entire structure as a tree (much like your file system does). Then each “node” of the tree might have a read operation. For FileNodes, the read operation simply prints out the contents of the file. For FolderNodes, the read operation involves further traversing its children and reading them. This recursively continues until there are no more files to be read.
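A minimal sketch of that structure. FileNode, FolderNode, and read come from the description above; the Node interface name, the add method, and the choice to return file contents as a list (rather than printing them) are assumptions made to keep the example small and easy to check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// The common supertype: every node in the tree can be read.
interface Node {
    List<String> read();
}

// Leaf: reading a file yields just its own contents.
class FileNode implements Node {
    private final String contents;

    FileNode(String contents) {
        this.contents = contents;
    }

    @Override
    public List<String> read() {
        return Collections.singletonList(contents);
    }
}

// Composite: reading a folder reads each child, whatever kind of node it is.
class FolderNode implements Node {
    private final List<Node> children = new ArrayList<>();

    FolderNode add(Node child) {
        children.add(child);
        return this; // returned to allow chained calls when building a tree
    }

    @Override
    public List<String> read() {
        List<String> result = new ArrayList<>();
        for (Node child : children) {
            result.addAll(child.read()); // recursion happens through polymorphism
        }
        return result;
    }
}
```

FolderNode.read never checks whether a child is a file or a folder. The recursion ends naturally at FileNodes, and an empty folder simply contributes nothing.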

In the diagram below, reading FolderNodes involves reading all the nodes contained within the folder, which may be FileNodes, or they may themselves be FolderNodes that contain further children.

flowchart TD

home["fa:fa-folder-open Home"] --> music[fa:fa-folder-open Music]
home --> movies["fa:fa-folder-open Movies"]
home --> csc305[fa:fa-folder-open CSC 305 Assignments]
csc305 --> lab1[fa:fa-file Lab 1]
csc305 --> lab2[fa:fa-file Lab 2]
movies --> animated[fa:fa-folder-open Animated] 
animated --> httyd[fa:fa-file How To Train Your Dragon]
music --> beatles[fa:fa-folder-open The Beatles]
beatles --> hcs[fa:fa-file Here Comes the Sun]
beatles --> lib[fa:fa-file Let It Be]

Another example that came up in class was reading the Document Object Model (DOM), i.e., the tree structure of an HTML page. An HTML file is made up of an <html> element, which holds child elements like <head> (which in turn holds <title>) and <body>. The <body> element holds further children (<div> elements, <p> elements, etc.) that make up the contents you see in the webpage. Each element gets “rendered” in the browser according to certain rules (e.g., the <div> is a “block” element: it takes up the full width available, so that new <div>s would appear on a new line; whereas the <span> is an “inline” element: it can appear within another block).

Benefits of this pattern

  • Using polymorphism and recursion, you can work with quite complex tree structures. For example, in the file system example each folder doesn’t need to know if its children are files or folders; they can simply be read, because they both belong to some supertype.
  • You can introduce new types of “nodes” in this tree conveniently, and the rest of the structure doesn’t need to change. For example, consider that our filesystem has a new kind of file (say, one that needs to be decrypted before it can be read). You can simply create a new subclass EncryptedFileNode and implement its read method so that the contents get decrypted as part of the read operation.
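Here is a sketch of the second benefit. The Node, FileNode, and FolderNode types are repeated in compact form so the example stands on its own, and they are assumptions about how the tree might be coded. The “decryption” is just reversing the string, a stand-in that is in no way real cryptography.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

interface Node {
    List<String> read();
}

class FileNode implements Node {
    private final String contents;

    FileNode(String contents) {
        this.contents = contents;
    }

    @Override
    public List<String> read() {
        return Collections.singletonList(contents);
    }
}

class FolderNode implements Node {
    private final List<Node> children = new ArrayList<>();

    FolderNode add(Node child) {
        children.add(child);
        return this;
    }

    @Override
    public List<String> read() {
        List<String> result = new ArrayList<>();
        for (Node child : children) {
            result.addAll(child.read());
        }
        return result;
    }
}

// The new node type. Nothing above this line had to change to support it.
class EncryptedFileNode implements Node {
    private final String cipherText;

    EncryptedFileNode(String cipherText) {
        this.cipherText = cipherText;
    }

    @Override
    public List<String> read() {
        // Stand-in for real decryption: reverse the stored text.
        String plainText = new StringBuilder(cipherText).reverse().toString();
        return Collections.singletonList(plainText);
    }
}
```

A folder holding a FileNode("plain") and an EncryptedFileNode("terces") reads as plain followed by secret, and FolderNode is none the wiser.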

Drawbacks

  • A commonly cited drawback of this pattern is:

It might be difficult to provide a common interface for classes whose functionality differs too much. In certain scenarios, you’d need to overgeneralize the component interface, making it harder to comprehend. (source)

Personally, I think the above would be an indication that you shouldn’t be using the Composite design pattern in the first place.