by Matt Heusser

Are you over-testing your software?

Jul 07, 2015
Agile Development

Is it possible to reduce – or even eliminate – the human factor when it comes to testing software releases? In a word, yes. Here’s how.

Release candidate testing takes too long. 

For many agile teams, this is the single biggest challenge. Legacy applications start with a test window longer than the sprint. This happens over and over again with clients and colleagues working on large, integrated websites and applications. 

But what if you didn’t need a human to look at a specific, numbered build to manage risk before deploy? Instead, what if a bot told the team the build was ready, and all someone had to do was click the deploy button? 

Getting there would take some infrastructure and discipline. It might not be possible for you, but there are organizations that do this every day. 

That is exactly what the technical staff does at Etsy, a sort of online mall for crafters that had over $1 billion in sales in 2013. Etsy had about a hundred programmers on staff in 2011. It has grown quite a bit since. New programmers at Etsy go through a process on the first day to learn the pipeline, which includes making changes, running the checks and a production deploy. Etsy isn’t alone; many companies pursue a model of continuous delivery. 

The secrets to eliminating regression testing 

Companies that have eliminated – or at least dramatically reduced – release candidate testing have a few things in common: 

Test tooling. Many tools exist that can exercise the software, running through basic flows and looking for key conditions. This can range from unit tests or testing a Web API all the way to end-to-end exercising of the GUI. One thing to look out for: Over time, end-to-end tests tend to take so long as to become unwieldy. Picking the right automated checks to put in place is critical; the goal is to have just enough to reduce risk, run fast and diminish the maintenance burden that comes with keeping the checks up to date. 

Hook tools into the build system. Waiting for a build and then running checks by hand at a desk adds wait time to the process. Get the checks to run automatically with every build, ideally as part of an automated checkout/build/deploy/test/promote staging pipeline. 
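As a rough illustration of such a pipeline, here is a minimal Python sketch. The stage names mirror the checkout/build/deploy/test/promote sequence above; the `run_pipeline` function and the lambda stand-ins are hypothetical, and a real team would replace each stand-in with a call into its own build and test tooling.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first one that fails."""
    completed: List[str] = []
    for name, step in stages:
        if not step():
            return False, completed
        completed.append(name)
    return True, completed


# Hypothetical stand-ins for real tooling; each one always succeeds here.
stages: List[Stage] = [
    ("checkout", lambda: True),
    ("build", lambda: True),
    ("deploy-to-staging", lambda: True),
    ("test", lambda: True),
    ("promote", lambda: True),
]

ok, done = run_pipeline(stages)
print("build is ready to deploy" if ok else f"stopped after: {done}")
```

The point of the sketch is that no one waits at a desk: a failed stage stops the run, and a fully green run is the signal that the build is ready.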

Radically reduce regression error. Continuous integration that finds continuous problems will lead to continuous fixing and retesting. For these strategies to work, the code that comes out of continuous integration needs to fall back – or regress – much less often than the industry standard. With excellent release candidate testing in place, it doesn’t: too often the safety net enables bad work. Remember, eliminating release candidate testing means improving development practices. 

Develop separately deployable components. It’s true: Programmers at Etsy deploy to production on day one. The code they write, however, is a very simple change to add their name and image to the About Us > Our Team page. That’s it. The change doesn’t touch the database, the web services, the style sheets, any code libraries or production code; it’s limited to a regular HTML file and an image. The programmer gets specific directions, so the worst that can happen is that the page looks wrong for a few minutes. 

With components, each change can be isolated. Instead of one single executable file, the pages at Etsy are in separate files. A change “push” is just a few files at a time and doesn’t require a server restart. This reduces risk while making deploys (and rollbacks) much more manageable. 

Separate test/deploy strategies by risk. Updating a simple web page is one thing – but what about services that require such personal information as credit cards, emails and passwords? Each of these might require a different test strategy or process. As reported three years ago, Zappos (a division of Amazon) separates the code that’s regulated from the rest. Changes that impact the regulated systems go through a more stringent process that is released less often, with, yes, more formal checks. 
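One way to picture this split is to route each change to a process based on the files it touches. The Python sketch below is an assumption for illustration only: the path patterns, the `pipeline_for` function and the "stringent"/"fast" labels are hypothetical and do not describe how Zappos or any other company actually does it.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns marking the regulated parts of a code base.
REGULATED_PATTERNS = ["payments/*", "auth/*"]


def pipeline_for(changed_files):
    """Pick the stricter process if any changed file is in a regulated area."""
    for path in changed_files:
        if any(fnmatchcase(path, pattern) for pattern in REGULATED_PATTERNS):
            return "stringent"
    return "fast"


print(pipeline_for(["pages/about.html"]))
print(pipeline_for(["pages/about.html", "payments/card.py"]))
```

A single regulated file is enough to send the whole change down the slower path, which keeps the fast path safe for everything else.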

Continuously monitor production. The damage buggy code does in production is the product of how bad the bug is and how long the bug stays in production. If the team can find and fix the defect fast, the risk is much lower. One key to finding the problems is seeing errors. For Web applications, this includes 500 errors, 404s (redirects to nowhere), crashes and other defects – and all of that can be graphed and visualized with tools. 
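To make "seeing errors" concrete, here is a minimal Python sketch that summarizes a window of HTTP status codes and decides whether to raise an alert. The function names and the 5 percent threshold are assumptions chosen for illustration; real monitoring tools collect, graph and alert on these numbers for you.

```python
from collections import Counter


def error_summary(status_codes):
    """Count server errors (5xx) and 404s in a window of HTTP status codes."""
    counts = Counter(status_codes)
    total = len(status_codes)
    server_errors = sum(n for code, n in counts.items() if 500 <= code <= 599)
    not_found = counts[404]
    rate = (server_errors + not_found) / total if total else 0.0
    return {"5xx": server_errors, "404": not_found, "error_rate": rate}


def should_alert(summary, threshold=0.05):
    """Alert when the error rate exceeds the (hypothetical) threshold."""
    return summary["error_rate"] > threshold


# Made-up sample window: 90 successes, 6 server errors, 4 not-found responses.
window = [200] * 90 + [500] * 6 + [404] * 4
summary = error_summary(window)
print(summary, should_alert(summary))
```

Run after every deploy, a check like this shortens the time a bug stays in production, which is the half of the damage equation the team controls most directly.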

Web developer/scientist Noah Sussman, who designed Etsy’s continuous integration (CI) system, has explained the monitoring system at Etsy.

Automatic deploy and rollback. Monitoring production to find bugs is great; fixing them with a few keystrokes – or a web-based app – is even better. 
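The idea can be sketched in a few lines of Python. Everything here is hypothetical: `deploy_with_rollback` and the `is_healthy` callback stand in for whatever deploy tooling and production health checks a team actually has.

```python
def deploy_with_rollback(current, candidate, is_healthy):
    """Deploy the candidate; return whichever version is left live.

    `is_healthy` is a hypothetical callback that inspects production
    (for example, error rates) for the version passed to it.
    """
    live = candidate
    if not is_healthy(live):
        live = current  # roll back to the last known-good version
    return live


# Made-up versions: "v2" is unhealthy, so "v1" stays live.
print(deploy_with_rollback("v1", "v2", lambda version: version != "v2"))
```

When rollback is this cheap, a bad deploy costs minutes rather than a release cycle.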

(Bonus) Configuration flags. Instead of a patch or manual rollback, it may be possible to turn the feature off with a web-based app. All the programmer has to do is wrap the feature in an “if ()” statement that ties back to a code library. Change the config flag to “Off” and the new feature disappears. Sussman describes the approach in his article “Config Flags: A Love Story.”
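A minimal sketch of that "if ()" wrapper in Python follows. The `FLAGS` dictionary, the `is_enabled` helper and the `new_checkout` flag name are hypothetical; in practice the flag values would come from a config file or service that a web-based app can edit.

```python
# Hypothetical flag store; a real one would be loaded from shared configuration.
FLAGS = {"new_checkout": True}


def is_enabled(name):
    """Unknown flags default to off, so a missing entry hides the feature."""
    return FLAGS.get(name, False)


def checkout_page():
    if is_enabled("new_checkout"):
        return "new checkout"
    return "old checkout"


print(checkout_page())
FLAGS["new_checkout"] = False  # flip the flag to "Off"
print(checkout_page())
```

Flipping the flag changes behavior without a new build or deploy, which is what makes it faster than a patch or a rollback.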

(Bonus) Incremental rollout. Imagine config flags that are not global, but instead tied to the user. Users who want risk – employees, their friends/family and known early adopters – get to see the feature. Free and trial users see the feature when that config flag is flipped, and so on. The general theme is that more conservative users, the ones who use the software to run their business, see the most stable, well-tested version of the system. In their book How We Test Software at Microsoft, Alan Page and his co-authors refer to this as “testing in production.” Instead of config flags, their team has different versions of the application running on different servers, and migrates processes to the right server based on user type. 
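For the flag-based variant, one possible sketch in Python orders user groups from most risk-tolerant to most conservative and enables a feature up to a chosen group. The group names, the `new_search` feature and the `sees_feature` function are assumptions made for illustration, not a description of any particular company's system.

```python
# Hypothetical user groups, from most risk-tolerant to most conservative.
ROLLOUT_ORDER = ["employee", "early_adopter", "free", "paying"]

# Each feature is enabled for every group up to and including the one named.
ENABLED_THROUGH = {"new_search": "early_adopter"}


def sees_feature(feature, user_group):
    """Return True if this user group is inside the feature's rollout."""
    limit = ENABLED_THROUGH.get(feature)
    if limit is None or user_group not in ROLLOUT_ORDER:
        return False
    return ROLLOUT_ORDER.index(user_group) <= ROLLOUT_ORDER.index(limit)


print(sees_feature("new_search", "employee"))
print(sees_feature("new_search", "paying"))
```

Widening the rollout is then a one-line config change: move the limit for `new_search` from `early_adopter` to `free`, and later to `paying`.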

How long regression cycles are born 

Software developer Abby Bangser described a project she recently worked on that built the capability to deploy continuously. For business reasons, the team wanted to deploy every iteration, which is fine. At the end of the iteration, one of the leaders asked Bangser to do a “five minute check,” just to make sure everything was fine. 

She refused. Why? Because if managers are asking developers to spend five minutes exploring, it’s because they don’t have enough confidence in the system as defined – either the quality of the code, the tooling, the ability to notice problems or, perhaps, the ability to roll back. Bangser wanted that confidence. 

Why? Why is five minutes a big deal?

Because that’s how these legacy applications ended up with release-test windows that are a month long: They started with five minutes, and grew five minutes at a time. 

Another look at cadence 

Companies that have a large release-candidate test process got that way for a reason, and that reason is likely part of why the company still survives. Release candidate testing brought it success. Dropping it seems like foolishness. 

In some cases, it might be. If you switch out the Web server, integrate the entire login system with Google’s user IDs, or make any other sweeping change, you might batch the work up for months and hide it behind config flags, and even do some candidate testing before release. Even if you’re an Etsy, Twitter, IMVU or another media darling. 

The key is to do just the right amount of release candidate testing, to trade risk for time in the smartest way possible. That might mean flexing the cadence up or down based on risk. 

Ask yourself the tough questions: How long does the process take now? What is your cadence? Are you subtracting five minutes from that every two weeks – or adding? And what are you going to do about it?