IEEE Software - Architectural Hoisting

This column was published in IEEE Software, The Pragmatic Architect column, July-Aug 2014, Vol 31, number 4.

Abstract: Architectural hoisting is a design technique where the responsibility for an intentional design constraint (that is, a guiderail) is moved away from developer vigilance into code, with the goal of achieving a global property on the system.

The history of engineering shows us that pioneers have built working machines before fully understanding the principles that enable those machines to work. In recent years, research into software architecture has helped us understand how and why some designs work. This affords an opportunity to reflect on existing software systems in a new light, rather like analyzing old race cars in a modern aerodynamic simulator.

In this column, I’ll make a connection between the challenges of programming “in the small” and programming “in the large,” which today we would call software architecture. I’ll describe an architectural design technique called architectural hoisting, a term first coined at NASA’s Jet Propulsion Laboratory. So let’s dust off a couple of established technologies, servlets and Enterprise JavaBeans (EJB), and see how they work.

Servlets versus EJB

Both servlets and EJB are server-side Java technologies that employ a runtime container that uses threads for processing Web requests (usually HTTP). However, the code that handles requests (servlets or beans, respectively) treats threads differently. The servlet container requires developers to write servlets that are reentrant because many threads can be active in the same servlet simultaneously. (Reentrancy is the default behavior, but servlet containers can be configured to use single-threaded servlets.) In contrast, the EJB container creates new instances of single-threaded beans on demand so that developers need not write reentrant code.

Consequently, a servlet container can be simpler and faster because it can avoid the logic and runtime overhead of managing a pool of instances, but it also places a burden on developers to understand reentrancy and be vigilant in avoiding race conditions. In contrast, an EJB container doesn’t require developers to be vigilant, but at the expense of runtime cost and the complexity associated with managing a pool of instances.
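To make the contrast concrete, here is a minimal sketch in plain Java. It deliberately doesn’t use the real servlet or EJB APIs: the names Handler, SharedHandler, ScratchHandler, and HandlerPool are hypothetical, and the pool is a toy stand-in for what an actual container does.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical request handler; not the servlet or EJB API.
interface Handler {
    String handle(String request);
}

// Servlet style: one shared instance serves every thread, so the
// developer must keep per-request data in locals, never in fields.
final class SharedHandler implements Handler {
    @Override
    public String handle(String request) {
        String upper = request.toUpperCase(); // local variable: safe when reentered
        return "shared:" + upper;
    }
}

// Bean style: a field can be used as scratch space, because the pool
// below never lets two threads use the same instance at the same time.
final class ScratchHandler implements Handler {
    private String scratch; // would be a race condition in a shared instance

    @Override
    public String handle(String request) {
        scratch = request.toUpperCase();
        return "pooled:" + scratch;
    }
}

// Toy "container": gives each request exclusive use of an instance,
// creating a new one on demand when none is idle.
final class HandlerPool {
    private final Deque<Handler> idle = new ArrayDeque<>();
    private final Supplier<Handler> factory;
    private int created = 0;

    HandlerPool(Supplier<Handler> factory) {
        this.factory = factory;
    }

    String dispatch(String request) {
        Handler handler = acquire();
        try {
            return handler.handle(request); // runs outside the pool lock
        } finally {
            release(handler);
        }
    }

    private synchronized Handler acquire() {
        Handler handler = idle.poll();
        if (handler == null) {
            handler = factory.get();
            created++;
        }
        return handler;
    }

    private synchronized void release(Handler handler) {
        idle.push(handler);
    }

    synchronized int instancesCreated() {
        return created;
    }
}
```

The pool’s bookkeeping is the runtime cost and complexity the bean approach pays. The shared handler avoids that cost, but nothing stops a developer from adding a field to it later.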

Developers can choose between two paths to the same destination: servlets with vigilance or beans with runtime complexity. Regardless of the path chosen, no developer wants race conditions in the code. By scrutinizing these two paths, we can see a common theme in their intent and the use of constraints to achieve it. The intent of both is to achieve a global property: safe handling of concurrent requests. And both constrain the code to achieve the global property, either via a set of rules to vigilantly follow (servlets) or a runtime system that enforces the global property (beans).

In this example, we can say that EJB hoists the concurrency concern because it helps developers achieve the global property (safe concurrency) without developer vigilance. Before defining hoisting more precisely, let’s look at why developers choose to apply constraints to their code, either in the small or in the large.

The Role of Constraints

Software is complex, often devilishly so. If a system does what it’s intended to do, it’s because developers have reasoned through how the code works and avoided ways it could go wrong. To make reasoning easier, developers impose constraints to shrink the design space and thus reduce what they must reason through. When programming in the small, such as implementing a data structure, developers are taught to impose invariants and ensure that every operation that mutates the data structure preserves the invariants. As we all remember from our first experiences of implementing data structures, ensuring that invariants aren’t violated is difficult and requires vigilance.
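As a small illustration of an invariant in the small, consider this sketch of a list that must always stay in ascending order. The class SortedInts and its methods are hypothetical names chosen for this example, and the runtime check in insert is there only to make the invariant visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class. Invariant: items is always in ascending order.
final class SortedInts {
    private final List<Integer> items = new ArrayList<>();

    // The only mutator, so it is the only place the invariant can break.
    void insert(int value) {
        int i = 0;
        while (i < items.size() && items.get(i) < value) {
            i++;
        }
        items.add(i, value);
        if (!isSorted()) {
            throw new IllegalStateException("invariant violated");
        }
    }

    // Relies on the invariant: the smallest element is always first.
    int min() {
        if (items.isEmpty()) {
            throw new IllegalStateException("empty");
        }
        return items.get(0);
    }

    boolean isSorted() {
        for (int i = 1; i < items.size(); i++) {
            if (items.get(i - 1) > items.get(i)) {
                return false;
            }
        }
        return true;
    }

    // Returns a copy so callers cannot reorder the internal list.
    List<Integer> snapshot() {
        return new ArrayList<>(items);
    }
}
```

Because every reader can assume the order, min needs no search. The price is that each new mutating method has to be written with the same care as insert.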

When programming in the large, developers still need to reason about the code, so they impose constraints, just at an architectural scale. These architectural constraints might be to use a three-tier architecture, to always check the cache before hitting the database, or to use idempotent operations across distributed worker nodes. These architecture-scale constraints serve the same purpose as invariants in data structures: to shrink the design space so that reasoning is easier.
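One of those guiderails, idempotent operations, can be sketched as follows. I’m assuming each request carries a unique ID; the class IdempotentLedger and its method names are hypothetical, and a real distributed system would keep the set of applied IDs in durable shared storage rather than in memory.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical ledger updated by worker nodes. Applying the same request
// twice (for example, after a retry) leaves the balance unchanged.
final class IdempotentLedger {
    private final Map<String, Long> balances = new HashMap<>();
    private final Set<String> appliedRequestIds = new HashSet<>();

    // Returns true if the credit was applied, false if it was a duplicate.
    synchronized boolean credit(String requestId, String account, long amount) {
        if (!appliedRequestIds.add(requestId)) {
            return false; // already applied: a redelivery has no further effect
        }
        balances.merge(account, amount, Long::sum);
        return true;
    }

    synchronized long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }
}
```

With this guiderail in place, a developer reasoning about retries no longer has to ask whether a message might be delivered twice, which is exactly the kind of shrinking of the design space described above.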

I like to call self-imposed architectural constraints guiderails because a constraint is an obstacle to overcome while a guiderail is a design choice deliberately placed to point the system in the right direction. If code must run on some obsolete hardware, that’s a constraint that makes life harder. In contrast, roller coasters have guiderails because they should be fun-scary, not dangerous-scary.

As the size of the code gets larger, it’s increasingly difficult to enforce guiderails. Large codebases provide lots of opportunities to break the guiderails and lots of developers to break them. When you’re programming in the large (that is, developing your software architecture), you have choices about how to enforce your guiderails. One option is to insist that all developers follow rules and remain vigilant. I’ll refer to this strategy simply as vigilance, and it works great in the small.

But vigilance works poorly when programming in the large because painstaking attention doesn’t scale up. You only need to look at the results of the yearly Pwn2Own.com contest or at the [top security vulnerabilities list](http://cwe.mitre.org/top25) to see that professional programmers trying their best still have trouble following all the rules, all the time, across large codebases. We simply hit our cognitive limits and can’t reliably reason through that much complexity.

Architectural Hoisting

An alternative to vigilance is architectural hoisting, a design technique where the responsibility for an intentional design constraint (that is, a guiderail) is moved away from developer vigilance into code, with the goal of achieving a global property on the system.

When hoisting a global property such as security, performance, or scalability, the responsibility of achieving the global property is assigned to the architecture rather than to developers. But what does “assigned to the architecture” mean? Usually, it means building infrastructure code (for example, an application container, an event bus, or a garbage collector) that enforces the guiderail and reduces the need for developer vigilance. Architectural hoisting can be seen as a strategy to separate concerns. However, while most strategies to separate concerns focus on remodularizing the source code, hoisting often involves shifting compile-time concerns to runtime concerns, as we saw in the servlets/beans example.
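As a rough sketch of what such infrastructure code can look like, take the guiderail “always check the cache before hitting the database.” The class ReadThroughStore below is a hypothetical name, and the Function passed to it is only a stand-in for a real database lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical infrastructure class. Callers never see the database
// directly, so they cannot forget to check the cache first.
final class ReadThroughStore<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> database; // stand-in for a real database lookup

    ReadThroughStore(Function<K, V> database) {
        this.database = database;
    }

    // Consults the cache and falls through to the database only on a miss.
    // A null result from the lookup is returned but not cached.
    V get(K key) {
        return cache.computeIfAbsent(key, database);
    }
}
```

The rule now lives in one place instead of at every call site. The sketch also shows the rigidity that comes with hoisting: it offers no way to bypass or invalidate the cache, so a caller who needs fresh data has no recourse unless the infrastructure grows to provide one.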

An essential characteristic of hoisting is that it decreases or eliminates the need for developer vigilance. The most vigilant developers will sometimes make mistakes, but we can trust computers to always follow the rules. Choosing between vigilance and hoisting can be hard, however, because building that infrastructure code to hoist the property can be difficult, expensive, and rigid.

Hoisting Mechanisms

So far, I’ve presented a stark contrast between vigilance and hoisting, with vigilance requiring lots of developer attention and hoisting requiring none. In practice, a variety of mechanisms can hoist a property and reduce the need for developer vigilance by varying amounts.

The interface to a library or service can enforce a guiderail. An example of programming in the small is when the interface to a hash table enforces the invariant that keys always have values by providing just one method: add(key,value).
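Here is a minimal sketch of that idea. The class name PairedTable is hypothetical, and I’ve added read-only get and containsKey methods, plus null checks, so the example is usable; the point is that add is the only method that can change the table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical table whose interface enforces "every key has a value."
final class PairedTable<K, V> {
    private final Map<K, V> entries = new HashMap<>();

    // The only mutator: a key can be introduced only together with a value.
    void add(K key, V value) {
        Objects.requireNonNull(key, "key");
        Objects.requireNonNull(value, "value");
        entries.put(key, value);
    }

    // Returns null only when the key is absent, never for a valueless key.
    V get(K key) {
        return entries.get(key);
    }

    boolean containsKey(K key) {
        return entries.containsKey(key);
    }
}
```

A caller can’t create a key without a value because no method exists to do so. The interface, not the caller’s care, preserves the invariant.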

When programming in the large, simply providing a library might be ineffective. For example, Netflix found it difficult to ensure that developers were vigilantly using the provided libraries that implemented request-rate backoff. That was part of the reason it began using Chaos Monkey, a runtime system that kills off random nodes in its distributed system to reveal components unready to handle such failures (as explained in Sid Anand’s QCon SF 2011 presentation).

Trade-offs

As you’ve seen, hoisting can be a powerful tool for ensuring that a global property holds, but it comes with trade-offs you must evaluate. One frequent trade-off is that the implementation of the hoisted mechanism is difficult to build and debug. For example, building your own EJB container, tuning it for performance, and debugging it is considerable work.

There’s usually a downside to providing a single, universal hoisted mechanism. Yes, hoisting ensures that the guiderails you choose are in place, but you may accidentally over-constrain developers and provide no way to bend the rules.

Conclusion

It’s an intellectual joy to discover that two seemingly different topics actually have connections and similarities between them. Programming in the small feels very different from programming in the large, yet in both we can see developers applying a similar technique to reduce the design space and enable reasoning: invariants in the small and guiderails in the large. Yet the preferred mechanism for enforcing invariants in the small—vigilance—is a poor choice for enforcing guiderails in the large.

Much of how developers successfully design software remains a mystery, with some developers better at it than others, and without any clear path to improve their skills. Once we understand architectural hoisting and guiderails, we can more clearly see the design options open to us, ranging from pure developer vigilance to strong hoisting mechanisms.

As software development increasingly resembles other engineering disciplines, reflection on our designs isn’t just a curiosity but an obligation. Building successful systems isn’t enough. As engineers, we must understand the principles that allow our programs to work and teach them to the next generation of software developers, providing them with our best and most condensed understanding of software design.
