IEEE Software - The Pragmatic Designer: Scale Your Team Horizontally

Jul 2, 2019 | George Fairbanks

This column was published in IEEE Software, The Pragmatic Designer column, July-August 2019, Vol 36, number 4.

ABSTRACT: We’d like to add an engineer to our project and have our team get that much more power. The primary factor that allows developers to contribute is the state of the code. To keep the code clean, we must shoot for two goals to minimize a project’s ur-technical debt. First, enable developers to contribute according to their ability, not according to their tenure. Second, keep the design small enough to fit in everyone’s heads. This doesn’t contradict Brooks’ Law, which applies specifically to late projects.

Pre-publication draft. Please click this official link so your view counts in the IEEE’s records of article views; the IEEE site also has professionally typeset PDFs.

Not so long ago, when your company became successful, you bought a bigger computer to run your software. We called this scaling vertically. Today, that is less common, in part because we have gotten quite good at scaling horizontally, so when your company becomes successful it buys more of the same size computers.

This article discusses a related idea, that we should scale our teams horizontally instead of vertically. When we add another computer to our data center, we get that much more power. We’d like to add an engineer to our project and have our team get that much more power. And when our team inevitably loses an engineer, it should not be a crisis.

Astute readers will already be wondering about Brooks’ Law, which says that adding people to a late project makes it later. How can that be true while we scale horizontally? We’ll return to that after describing how some of the best teams organize themselves. Even if we can’t have true horizontal scaling, most teams can scale significantly better through improvements in their coding and planning practices.

Scaling teams vertically or horizontally

I will never forget a conference call I was on several years ago (when I was not working for my current employer). It was a standard call for me, in a conference room at the office in New York, but a teammate had also joined the call by cell phone while at a wedding in Italy with his family. The fact that he had to be on the call was a vivid sign that in organizing the team and its processes we had done something horribly wrong.

He was on that call because nobody else could be. Perhaps you’ve been that person too. In a way it makes you feel good that you are so critical to the success of the team. But you’d really like to disconnect from your job and fully enjoy the wedding. Why couldn’t someone else have helped out?

The answer is: because there were things in his head that weren’t in teammates’ heads. When the critical information is locked in one head, scaling horizontally by adding another person doesn’t work. In the general case that’s inevitable. However, most of the time the silos of information are self-inflicted by commonplace but harmful software development practices that we could change.

Ask yourself how long it takes for a smart, experienced developer to participate as an equal on your team. If you want your team to scale horizontally, people must be able to ramp up quickly and contribute according to their ability, not according to their tenure with the codebase.

So, how can we do this? The primary factor that allows developers to contribute is the state of the code. If you walk up to a squeaky-clean codebase whose authors have made an effort to get what they know out of their heads and into the code, you’ll be able to understand it and contribute quickly. In contrast, when you walk up to a mess of a codebase, or even clean code where critical information is still in the heads of developers, you will struggle to contribute.

Everyone wants clean code but they throw up any number of excuses as to why they cannot do it: it’s too expensive, it slows feature velocity, or why bother because the code cannot express all of the design. I don’t have all the answers yet, but I can see that we can improve horizontal scaling through improvements in two areas: coding practices and planning practices.

Better coding practices

Twenty years ago, I was lucky to be part of the Catalysis team [1] that showed people how to combine object-oriented programming, software architecture, and precise specifications. We had a hard time convincing developers to change their practices. But fast forward to today and developers are excited about functional programming, which in fact requires more discipline and precision than we dared to advocate.

What has changed is the overall environment in which we code, including the languages, tools, and conventions. It’s become more fertile ground for the same ideas to take hold. Developers today can take the following for granted:

A visible, richly connected graph of source code and contracts to explore just by clicking
Languages and rich type systems that make it easy to write self-documenting code
README files next to the code with text and images that explain what the code cannot (especially the intensional aspects of design)
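
As a small illustration of the second point, the sketch below uses Python type hints so that the signatures carry domain knowledge a newcomer would otherwise have to ask about. It is a minimal sketch, not a recommendation of any particular design; `CustomerId`, `Cents`, `Money`, `Invoice`, and `apply_discount` are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import NewType

# Hypothetical domain types, invented for this illustration.
# Distinct names stop a reader (and a type checker) from mixing up
# plain strings and integers that mean different things.
CustomerId = NewType("CustomerId", str)
Cents = NewType("Cents", int)


class Currency(Enum):
    USD = "USD"
    EUR = "EUR"


@dataclass(frozen=True)
class Money:
    amount: Cents
    currency: Currency


@dataclass(frozen=True)
class Invoice:
    customer: CustomerId
    total: Money


def apply_discount(invoice: Invoice, percent: int) -> Invoice:
    """Return a new invoice with the total reduced by a whole-number percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Integer arithmetic on cents avoids floating-point rounding surprises.
    discounted = Cents(invoice.total.amount * (100 - percent) // 100)
    return Invoice(invoice.customer, Money(discounted, invoice.total.currency))
```

The signature alone says that amounts are whole cents, that an invoice always has a currency, and that discounting returns a new value instead of changing the old one.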

None of these were true twenty years ago, let alone fifty when Fred Brooks was gathering tales for his book. Today, it’s much easier for your team to scale horizontally when your codebase is kept squeaky clean, the design is simple and consistent, and you can infer the design by reading the code. We need to do more to make the code squeaky clean, including using the following:

Clear, consistent concepts that tie the domain, design, and code
A hierarchical story at many levels
An architecturally-evident coding style
Consistently applied architecture and design patterns
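
To make the third item concrete, here is a minimal sketch of what an architecturally-evident style might look like, assuming a team has chosen a ports-and-adapters architecture. That choice, and every name in the code (`OrderStorePort`, `InMemoryOrderStore`, `OrderComponent`), is an assumption made for illustration. The point is only that the architectural roles are named in the code, so a newcomer can see them without asking.

```python
from typing import Dict, Protocol


class OrderStorePort(Protocol):
    """Port: the only way the order component talks to storage."""

    def save(self, order_id: str, total_cents: int) -> None:
        ...


class InMemoryOrderStore:
    """Adapter: one implementation of OrderStorePort, useful in tests."""

    def __init__(self) -> None:
        self.rows: Dict[str, int] = {}

    def save(self, order_id: str, total_cents: int) -> None:
        self.rows[order_id] = total_cents


class OrderComponent:
    """Component: domain logic that depends on the port, never on an adapter."""

    def __init__(self, store: OrderStorePort) -> None:
        self._store = store

    def place_order(self, order_id: str, total_cents: int) -> None:
        if total_cents <= 0:
            raise ValueError("an order must have a positive total")
        self._store.save(order_id, total_cents)
```

Applied consistently, the same three roles appear wherever the system crosses a boundary, which is what lets a new developer predict where a piece of code lives.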

Let’s say you do all of that, which doesn’t seem impossible, and the code is squeaky clean. How do you keep it clean?

We must add team processes with two goals in mind. The first is to allow developers to contribute according to their ability, not according to their tenure. Collective code ownership, collective design ownership, and agreed-upon quality standards all help. The second goal is to keep the design small enough to fit in everyone’s heads. Decide on an architecture and a set of design patterns for finer-scale decisions. Add in relentless refactoring, attention to non-feature needs (via a multi-color backlog), curation of design abstractions, and refactoring to consistency. These processes should keep the code in good shape and avoid a hodgepodge of designs and contradictions that somehow manages to meet all the user stories.

To me, this is all in the service of minimizing technical debt. As Ward Cunningham said, “[W]e accumulated the learnings … about the application over time by modifying the program to look as if we had known what we were doing all along and to look as if it had been easy to do…” [2] Many people use “technical debt” as a synonym for “ugly code” but I like the way he originally described it as what happens when you allow your code to deteriorate over time, no longer expressing what you know to be true about the domain or your best ideas about design.

If you are an advocate of Extreme Programming (XP), Domain Driven Design (DDD), or software architecture, you will recognize many of your favorite ideas plus some others that I’ve seen work well.

Better planning practices

The second way teams can improve horizontal scalability is through different ways of planning. Common pitfalls are a single planner (or architect), fragile plans, and vague plans. In the 1990s, it was common to see an architect create a big design and allocate chunks of the design across a team of developers. Not everyone on the team was ready to revise that design if the architect was away, and the design was often overspecific in some areas while vague or even disastrously optimistic in others. Although you feel like you are at the eye of the hurricane as the architect and have great influence, the whole team suffers when you are away, so you end up on your cell phone from that wedding in Italy.

The good news is that we don’t seem to fail that way anymore. Often the person planning is not technical and the plan consists of externally visible behaviors (e.g., use cases), so there’s no danger of being too technically rigid anymore, and planning happens in responsive weekly or biweekly cycles.

With that freedom comes trouble. When all the developers are empowered to write whatever code necessary to achieve the externally visible behavior and there’s no architect to guide the technical practices, it’s easy for complexity to build up because efforts aren’t coordinated technically. That accidental complexity overwhelms us and prevents horizontal scalability.

A pattern I’ve seen work well is to have an architect (in deeds if not in name) who sets up the initial project structure and patterns, then intervenes as needed to point the project in the right direction. Today’s tools, languages, and frameworks are powerful and often just making a reasonable choice early on is sufficient and avoids arguments later on. Your understanding of the domain is always evolving, though, so I’d have your architect keeping an eye on that too.

Teams can scale horizontally by making everyone a little bit aware of the big picture so that they will individually make pretty good decisions. What exactly is the big picture? Philippe Kruchten suggests that we put four kinds of work into our plans: features, defects, architectural infrastructure, and technical debt [3]. By making these visible to the whole team and publicly wrestling with the inherent tradeoffs between scheduling work on each, the plan is no longer on tablets chiseled in secret. So when there’s an obstacle, every developer has some idea of the overall goals, immediate goals, and tradeoffs.
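
One way to make the four kinds of work visible is to tag every backlog item with its kind and report how the planned effort splits across them. The sketch below is a minimal illustration under that assumption; `WorkKind`, `BacklogItem`, `points_by_kind`, and the idea of measuring effort in points are hypothetical choices made here, not something taken from Kruchten’s work.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class WorkKind(Enum):
    # The four kinds of work named in the text above.
    FEATURE = "feature"
    DEFECT = "defect"
    ARCHITECTURE = "architectural infrastructure"
    TECH_DEBT = "technical debt"


@dataclass(frozen=True)
class BacklogItem:
    title: str
    kind: WorkKind
    points: int  # Hypothetical effort estimate.


def points_by_kind(backlog: List[BacklogItem]) -> Dict[WorkKind, int]:
    """Total the planned effort for each kind, including kinds with no items."""
    totals: Counter = Counter()
    for item in backlog:
        totals[item.kind] += item.points
    # Report every kind so that a neglected category shows up as an explicit 0.
    return {kind: totals[kind] for kind in WorkKind}
```

A plan that reports zero points for architectural infrastructure or technical debt several cycles in a row is a tradeoff the whole team can now see and argue about.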

I’ve had positive experiences with planning this way. Not everyone on the team will be great at this but they can do pretty well and get better over time. The best part was that things proceeded smoothly when I was away. I’d like to think that over the long term I had a positive influence through steering but the team was able to steer itself just fine over the short term.

Conclusion

Looking back at the transition from vertical to horizontal scaling, we initially thought that scaling hardware (or teams) horizontally was prohibitively expensive, but the costs came down as techniques were invented. In the same way, the cost of refactoring has fallen because of tool support, and the cost of strong contracts has fallen because of functional programming improvements to our programming languages.

I’m not ready to argue against Brooks’ Law that adding people to a late project makes it later. But today, when developers are working on a clean codebase, I see lots of work happening in parallel with tool support to facilitate coordination. When things are going smoothly, it’s because the architecture is largely set, the design patterns provide guidance for most issues that arise, and the code itself (with README files alongside) allows developers to answer their own questions.

Brooks was worried about coordination cost: “Since software construction is inherently a systems effort — an exercise in complex interrelationships — communication effort is great, and it quickly dominates the decrease in individual task time brought about by partitioning. Adding more men then lengthens, not shortens, the schedule.” [4]

Notice that he never said to use just one person on a project to eliminate coordination cost. So if we’re able, through tools and practices, to keep coordination costs low, we’ve enabled horizontal scaling. Teams that follow processes like the ones I’ve outlined above can enable developers to contribute according to their ability, not according to their tenure on the project.
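
To put a number on the coordination cost: in the same book, Brooks observes that if every part of a task must be coordinated with every other part, the effort grows as n(n-1)/2. The sketch below simply counts those pairwise paths for a team of a given size. The function name `communication_paths` is made up for this illustration, and treating every pair of people as one path is the worst case, not a measurement of any real team.

```python
def communication_paths(team_size: int) -> int:
    """Count the distinct pairs of people on a team: n(n-1)/2."""
    if team_size < 0:
        raise ValueError("team_size must not be negative")
    # One of team_size and team_size - 1 is always even, so the
    # integer division is exact.
    return team_size * (team_size - 1) // 2
```

A team of 10 has 45 possible pairs and a team of 11 has 55, so in the worst case the eleventh person adds ten new paths. Clean code, READMEs, and shared patterns do not change this arithmetic. They reduce how many of those paths a developer actually needs to use to get an answer.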

All of the technical practices and software processes mentioned in this article are already being used, though not by most teams. Not many teams do a great job of scaling horizontally yet, but some scale much better than others. Though it’s hard to find the signal through the noise, I think we are seeing an important shift in how teams organize themselves and build software.

Many thanks to Timothy James Halloran for his help in developing these ideas.