IEEE Software - The Pragmatic Designer: Testing Numbs Us to Our Loss of Intellectual Control

This column was published in IEEE Software, The Pragmatic Designer column, May-June 2020, Vol. 37, No. 3.

ABSTRACT: Software teams need a healthy balance of both intellectual control, which comes from reasoning, and statistical control, which comes from testing. Complexity is the enemy of reasoning; efforts to maintain intellectual control tend to push complexity down. In my experience, many teams let their intellectual control atrophy, then compensate with more testing. This approach works for a while, but without intellectual control to keep complexity down, progress becomes slower and more difficult. Once lost, intellectual control is expensive to recover, so the teams find themselves in a local maximum they cannot escape.


I’ve been reflecting on how we used to develop software in the 1990s compared to what I see today. One thing that stands out is that everyone is testing. Testing, testing, testing. What a difference! Having those tests gives us confidence that evolution of the code is not breaking anything. I don’t know what I would do if my project’s tests were lost.

So, how did we ever get code out the door before we had automated regression testing? I think the answer has many parts, including some ideas that we should not resurrect: individual ownership of modules, quarterly product releases (or worse), whole-program specifications, and lots of manual testing. But there is one idea that we should return to: We can keep the code under control by reasoning through its design. This is not an idea that we collectively weighed and rejected, but instead one that seems to have been gradually neglected.

It seems that today, with all of our tests, we allow the code to grow woolly and complicated in ways that, in the past, with no automated regression tests, we never could have tolerated. We were forced to keep the code simple because if we didn’t, the complexity of the special cases wouldn’t fit in our heads.

This leads me to the following idea, which I will dig into in the rest of this article: Software teams need a healthy balance of both intellectual control, which comes from reasoning, and statistical control, which comes from testing.

Complexity is the enemy of reasoning; efforts to maintain intellectual control tend to push complexity down. In my experience, many teams let their intellectual control atrophy, then compensate with more testing. This approach works for a while, but without intellectual control to keep complexity down, progress becomes slower and more difficult. Once lost, intellectual control is expensive to recover, so the teams find themselves in a local maximum they cannot escape.

Intellectual and Statistical Control

What do the terms intellectual control and statistical control mean? Edsger Dijkstra wrote repeatedly about the need for us to keep what he called intellectual control over our software, but to my knowledge he never defined the term [1]. It’s clear that he thought a mathematical proof demonstrated intellectual control, but it’s not clear what else would rise to his standard.

We all know that control isn’t binary. I do a better job today of keeping a car under control than when I was a teenager, but I recognize that a pro driver has even more control. Similarly, we can have more or less intellectual control over our software.

Here’s a tiny example. Consider three methods that I claim do the same thing (that is, they meet the same specification). The first has a proof that it meets its specification. The second has a simple implementation with a base case and an inductive case, but no proof. The third has a complicated implementation with a FOR loop and incrementing counters. As I try to reason about what each of these does, I notice a gradient of intellectual control. The second method is structured so that I can easily convince myself of its correctness, while the third requires me to perform a lot of complicated reasoning to convince myself how it behaves in corner cases.
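
To make that gradient concrete, here is a sketch of the second and third methods, using list reversal as a stand-in specification (the first method, with its accompanying proof, doesn’t fit in a quick sketch):

    # The inductive version: a base case and an inductive case make it
    # easy to convince yourself that it meets the specification.
    def reverse_inductive(xs):
        if not xs:                       # base case: the empty list
            return []
        return reverse_inductive(xs[1:]) + [xs[0]]  # inductive case

    # The loop-and-counters version: same specification, but now
    # correctness hinges on index bookkeeping and corner cases.
    def reverse_loop(xs):
        result = [None] * len(xs)
        i = 0                            # read index, counting up
        j = len(xs) - 1                  # write index, counting down
        for _ in range(len(xs)):
            result[j] = xs[i]
            i += 1
            j -= 1
        return result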

The idea that I have more or less intellectual control also applies at larger scales. One of the most important ideas of software architecture is that I can gain intellectual control over large amounts of software by constraining it to follow rules. Those rules let me make sweeping conclusions about the code without reading every line of it. For example, if I design the system such that the decision making is in one module and the action takes place in another module, I can reason about each part independently. I can gain even more confidence, and therefore more intellectual control, if my language or static analysis enforces or checks my intent.
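
Here is a minimal sketch of that rule, with hypothetical names: the decision logic is a pure function that I can reason about (and test) in isolation, while a separate piece of code performs the action:

    # Decision module: pure logic, no I/O, so it can be reasoned about
    # and tested on its own.
    def should_retry(status_code: int, attempts: int) -> bool:
        return status_code >= 500 and attempts < 3

    # Action module: performs the effect but contains no policy of its
    # own, so reasoning about retry behavior never requires reading it.
    def send_with_retries(send, request):
        attempts = 0
        while True:
            status = send(request)
            attempts += 1
            if not should_retry(status, attempts):
                return status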

Intellectual control gives us the ability to reason about how the software works. I can have more or less of it depending on conditions such as the existence of a proof, the way my code is structured, or the tooling that can validate characteristics of the code.

Intellectual control takes many forms. Anyone who has designed software can provide examples of their own threads of reasoning that flow through the code. Perhaps the thread is that things produced here are consumed there, or that only reviewed code is running in production, or that an implementation matches an abstraction. More vividly, we can recognize when those threads are absent. We can recognize Dijkstra’s notion of intellectual control even as it takes many forms.

No engineer should rely on reason alone. As Donald Knuth famously said, “Beware of bugs in the above code; I have only proved it correct, not tried it.” [2] That leads us to the idea of statistical control. Statistical control comes from running the code and seeing it behave as expected. Except for very simple programs, we cannot test every possible case, so we try some fraction of the cases and hope that the cases we didn’t try also work. We end up with statistical confidence in the code’s correctness based on how many cases we’ve tried compared to how many exist.

Reasoning keeps complexity low

In my experience, teams that rely on their ability to reason about their code will also keep complexity low. It seems to play out like this: Teams can choose how much intellectual or statistical control they have over their code. Day by day, a team’s efforts to keep intellectual control cause them to make design decisions that favor low complexity.

That observation rests on three assumptions. First, complexity builds up over time, arising in both the problem and solution. Second, our minds are limited, so it’s easy for complexity to exceed our ability to reason about it. And third, there are many possible designs for the same problem, with varying complexity.

When developers set out to keep software under intellectual control, they must stay within their complexity budget – the amount of complexity that their limited minds can handle. They scrutinize new complexity as it arises, seek out designs that keep it low, and revisit existing code to make it simpler or more consistent. At each step on their journey, they apply software design techniques that help them keep control, such as abstract data types, information hiding, state-based analysis, separation of concerns, and consistency via patterns ranging from programming idioms through architectural styles. Each of these helps them keep complexity down and maintain their ability to reason about the code.
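
As a small illustration of the first two techniques, here is a sketch of an abstract data type that hides its representation (the Stack class and its names are my own, chosen for illustration):

    # An abstract data type with information hiding: callers reason
    # about push/pop behavior through the interface alone, so the
    # representation can change without disturbing that reasoning.
    class Stack:
        def __init__(self):
            self._items = []             # hidden representation

        def push(self, item):
            self._items.append(item)

        def pop(self):
            if not self._items:
                raise IndexError("pop from empty stack")
            return self._items.pop()

        def is_empty(self):
            return len(self._items) == 0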

Complexity also comes from the problem domain. There is no natural simplicity in regulations, business requirements, and backwards compatibility. If we accept the requirements one by one and translate them into code, complexity quickly takes over. If we instead engage in a dialogue with stakeholders to build up an understanding – a theory – of how this problem domain works, then each requirement becomes a data point that either fits our theory or demands that we revise it. A theory is compact, generalizes the requirements seen so far, and keeps complexity down.

This approach was common in the object-oriented community in the 1990s and was called object-oriented analysis. It lives on today in the domain-driven design community, which has generalized the approach beyond objects. Today, teams do less of this kind of analysis. When they do, it’s typically quite limited or commingled with programming itself.

So, the effort to keep intellectual control has the effect of applying constant downward pressure on complexity, simplifying both the code and the problem.

Is refactoring enough?

Refactoring has long been hailed as the antidote to complexity buildup on agile projects. It works wonders on small amounts of code, because a single developer can keep the details in mind and make a refactoring in an hour or a day. Large changes are harder or impossible. While we can conceive of such changes, there are too many details to keep in mind, so we must actually write the code to have confidence that our thoughts are right. (Note that this perfectly echoes the reasoning vs. testing discussion above). Worse, these changes can take weeks or months, so it’s not an effort that can be swept under the rug of normal day-to-day software development.

Empirically, refactoring seems to be too weak a force once the code becomes large. Even teams that have a strong culture of refactoring accumulate large amounts of technical debt. A few decades ago, it was a plausible argument that teams that regularly refactored could keep the code in good shape indefinitely. Today, we have plenty of teams writing software in quick iterations with only lightweight planning, but doing lots of refactoring, yet we don’t have many success stories of keeping complexity and tech debt low.

Refactoring’s biggest limitation is that, by definition, it preserves existing behavior. It is somewhat mechanical and can be done by a skilled developer who knows nothing about the problem the code is addressing. If I made a mistake about the desired behavior when I first designed a module, then refactoring isn’t a suitable technique: I need redesign. Redesign requires the developer to understand the problem being solved, and that understanding is often lost. As the code becomes bigger and older, teams lose track of exactly why it was designed one way rather than another. When teams keep intellectual control, they have a much easier time redesigning because they preserve their design thinking.

Hard to recover

Few teams are thinking about how much intellectual control they want, or how much they are leaning on statistical control from tests. Early in a project, a team always has intellectual control over the code, if only because there’s so little code that they can remember it all. Later in the lifecycle of a project, the team doesn’t have intellectual control. Yet no team can point to the week of the project and say, “that is the week we decided to drop intellectual control.” It slips away without our noticing.

Some teams never had an intention to keep it. They believe that a few core practices will keep their project healthy, such as continuous integration, regression tests, refactoring, and daily standup meetings. There are a lot of voices on the internet saying that such a simple recipe is all a team needs. That simple advice is in contrast to processes like Extreme Programming that promote intellectual control through values, discipline, and practices like pair programming and a system metaphor.

My past efforts to create order from disorder have shown that once a team loses intellectual control, it’s difficult and expensive to recover. Let me focus on two reasons. First, unless the code is trivial, having intellectual control means having an idea in your head that is simpler than the code. Perhaps that idea is a state machine, an abstract data type, or a module decomposition that separates concerns. When intellectual control lapses, the only thing to reason about is the software as written, with all of its quirks, which is staggeringly more complicated than, say, a state machine. Once a team has been evolving code without respecting an original abstraction, there is no easy way to reverse the entropy that has crept in.

Second, when you come up with an idea about how to solve a problem, that idea in your head is a creative act linking problem to solution. Recovering intellectual control means recovering that creative act, the aha moment behind why this-solves-that, but this time in the much messier context of an existing implementation. You must reverse-engineer the existing code, understanding not just what it does, but why it does it.

Every gardener knows that it’s much easier to consistently do a little bit of weeding than it is to recover once weeds have taken over. When intellectual control lapses, it’s like weeds taking over the garden, except the effort to reverse course is far higher. The result is that the loss of intellectual control is usually a one-way street because few teams can afford the time and expense to recover it. Once that happens, developers reason starting with the code, test more thoroughly, and petition management to rewrite the system.

Keeping the balance

A healthy balance of reasoning and testing is far better than testing alone. One of the best ways to keep intellectual control is, ironically, to strengthen our tests. Most teams just write simple tests of inputs and expected outputs. Adding more tests like these does nothing to guard our intellectual control. We need more sophisticated tests, such as property-based tests or model-based tests, that ensure that our abstractions do not erode over time.

In property-based tests, we state a property that we think is valid about our code and use testing infrastructure to look for violations. For example, if we expect that our linked list will never have cycles, then we can state this property and ask the testing infrastructure to look for counterexamples.
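
Here is a minimal sketch of that example using the Hypothesis library for Python; the Node class, build_list, and has_cycle are stand-ins for a real linked-list implementation:

    from hypothesis import given, strategies as st

    class Node:
        def __init__(self, value, next=None):
            self.value, self.next = value, next

    def build_list(values):
        head = None
        for v in reversed(values):       # build a singly linked list
            head = Node(v, head)
        return head

    def has_cycle(head):
        slow = fast = head               # Floyd’s tortoise-and-hare detection
        while fast is not None and fast.next is not None:
            slow, fast = slow.next, fast.next.next
            if slow is fast:
                return True
        return False

    @given(st.lists(st.integers()))
    def test_list_never_has_cycles(values):
        # The stated property: however the list is built, no cycle appears.
        assert not has_cycle(build_list(values))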

In model-based tests, we provide a simple reference model that we think works the same way as our real implementation and again ask the testing infrastructure to find counterexamples. The model implementation might store data in memory, while the real implementation uses a remote datastore. The real one offers better scale and durability when the power goes off, but in other respects should work exactly as the model does.
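
Here is a sketch of the same idea, again with Hypothesis: a plain dictionary serves as the reference model, and RemoteStore is a hypothetical stand-in for the real implementation (stubbed with a local dict so the sketch runs):

    from hypothesis import given, strategies as st

    class RemoteStore:
        # Stand-in for the real implementation; in practice this would
        # wrap the remote datastore rather than a local dict.
        def __init__(self):
            self._data = {}
        def put(self, key, value):
            self._data[key] = value
        def get(self, key):
            return self._data.get(key)

    ops = st.lists(st.tuples(st.sampled_from(["put", "get"]),
                             st.text(min_size=1), st.integers()))

    @given(ops)
    def test_real_store_matches_model(operations):
        model, real = {}, RemoteStore()
        for op, key, value in operations:
            if op == "put":
                model[key] = value
                real.put(key, value)
            else:                        # the two must agree at every step
                assert real.get(key) == model.get(key)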

Code that is complicated usually starts simple. If property or model tests are added from the start, there is strong social pressure to keep those tests passing as the code evolves, which is pressure to maintain the abstractions that are present. Developers keep intellectual control because they can reason about the code by thinking about the properties and simplified models, rather than the full implementation.

I’ve had some success at using team culture to maintain a balance of intellectual and statistical control. Culture is fragile, however, so everyone on the team must be on board, otherwise it’s hard to avoid the slippery slope of substituting testing for intellectual control “just this once”.

Classic, not old fashioned

Imagine you are the CTO of a company and can watch ten teams with a variety of management and software engineering practices. Over time, you’d like to believe that you could tell which practices were good for the company. The problem with simple testing is that it obscures the signal, making it hard to tell that the practices are lousy until the code is an overcomplicated mess. Even the teams themselves might think things are fine until it’s too late, with the first warnings being slow progress and people leaving the team.

In the past few decades, our profession has enthusiastically adopted testing. Quality has improved and no one would suggest going back to the old days. Tests provide us a kind of control over our software (statistical control) based on how much of the state space they cover. This is a good thing and helps ensure that what worked yesterday continues working.

Over-reliance on simple tests is dangerous. The statistical control they offer can lead us to neglect intellectual control, or even fool ourselves into thinking that it’s old fashioned and unnecessary. I’ve seen many teams suffer because they let their abstractions erode. The ones with good tests kept insisting that “everything is fine” long after their train was off the rails. The tests were numbing the senses that would otherwise have alerted them to the trouble.

What teams need is a healthy balance of reasoning and testing. Developers want to reason through a proposed change in their heads and be pretty sure it will work before they write the code, using tests to check that reasoning. Without that balance, developers become hesitant to change code that is working but poorly understood, and hesitant to change a test that they suspect is wrong. When our reasoning is degraded, the only way to have confidence in a proposed change is to implement it and see if the existing tests still pass. That’s not a recipe for business agility or developer joy.

References

  1. George Fairbanks, “Intellectual Control,” IEEE Software, 14 January 2019, Vol. 36, No. 1.
  2. Donald Knuth, FAQ on his home page.