by Paul Rubens

Why Software Testing Can’t Save You From IT Disasters

Mar 12, 2014

Some software bugs are like the cicada, emerging only under the 'right' conditions and wreaking havoc until they’re stopped. Without the right tools, no amount of software testing can stop these bugs from causing a meltdown. Just ask Nasdaq.

On the day of Facebook’s IPO, a concurrency bug that lay hidden in the code used by Nasdaq suddenly reared its ugly head. A race condition prevented the delivery of order confirmations, so those orders were re-submitted repeatedly.

UBS, which backed the Facebook IPO, reportedly lost $350 million. The bug cost Nasdaq $10 million in SEC fines and more than $40 million in compensation claims, not to mention immeasurable reputational damage.

So why was this bug not discovered during testing? In fact, how did it never manifest itself at all before that fateful day in 2012?

Race Conditions Present Concurrency Time Bomb

The answer is that some bugs that can occur in concurrent software, including race conditions, can’t be reliably detected by testing. Ten tests wouldn’t be enough. Nor would 100, or even 1,000.

A concurrent application with a race condition is like a time bomb in your organization waiting to explode. It may chug along perfectly for years before a particular set of circumstances causes it to fail spectacularly.

Here’s the problem in a nutshell. To get high performance and low latency, application code runs on two or more processor cores, with multiple streams of instructions running at the same time. One stream may be writing data to memory, and another stream may be reading it.

Usually, the write will occur before the read. But just occasionally, the stream that’s responsible for the write won’t reach that point in its execution in time, and the other stream will get its read in first. That’s a race condition: the result depends on the relative speed at which each stream, or thread, executes.
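
To make this concrete, here is a minimal Java sketch of the write-versus-read race. The class and method names (ReadWriteRace, runOnce, joinUninterruptibly) are hypothetical, chosen for illustration. One thread writes 42 to an unsynchronized shared field while another reads it; depending on which gets there first, the reader sees either 42 or the initial value of 0.

```java
// Hypothetical illustration of a write/read race on an unsynchronized field.
final class ReadWriteRace {
    // Shared field with no lock and no 'volatile': writer and reader race on it.
    private static int sharedValue;

    // Starts a writer and a reader at almost the same time and returns what the
    // reader saw: 42 if the write happened first, 0 if the read got in first.
    static int runOnce() {
        sharedValue = 0;
        int[] observed = new int[1];
        Thread writer = new Thread(() -> sharedValue = 42);
        Thread reader = new Thread(() -> observed[0] = sharedValue);
        writer.start();
        reader.start();
        joinUninterruptibly(writer);
        joinUninterruptibly(reader); // join() makes observed[0] safely visible here
        return observed[0];
    }

    // Waits for a thread to finish even if interrupted, then restores the interrupt flag.
    static void joinUninterruptibly(Thread t) {
        boolean interrupted = false;
        while (true) {
            try {
                t.join();
                break;
            } catch (InterruptedException e) {
                interrupted = true;
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int writeFirst = 0;
        int readFirst = 0;
        for (int i = 0; i < 10_000; i++) {
            if (runOnce() == 42) {
                writeFirst++;
            } else {
                readFirst++;
            }
        }
        // The split is not fixed: it can change between runs and between machines.
        System.out.println("write first: " + writeFirst + ", read first: " + readFirst);
    }
}
```

Running main tallies how often each thread won over 10,000 attempts; nothing in the code decides the winner, so the tally is whatever the scheduler and hardware happen to produce.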

Concurrent Apps Don’t Let You Dictate What Runs When or Where

It’s tempting to assume that the outcome will always be the same, and that the same stream will always win the race. Unfortunately, that’s not the case, whether the streams run at the same time on a multicore processor or take turns on a single core. Concurrent applications display non-deterministic behavior: they don’t always yield the same results.

To understand why, bear in mind that a developer doesn’t have control over all parts of the environment in which an application will run. Execution is determined by a low-level scheduler that decides which bit of a program runs when. The coder doesn’t have access to this. Don’t forget, too, that hardware does things such as prefetch data and instructions and move information to and from caches.
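
In Java, the effects of caching and of reordering by the hardware or the compiler show up as a question of when one thread’s write becomes visible to another. The sketch below (StopFlagDemo, runFor and joinUninterruptibly are hypothetical names) shows the standard remedy for a simple stop flag: declaring it volatile, which the Java Memory Model guarantees makes the write visible to the worker thread. Without volatile there is no such guarantee, and the worker’s loop might never end.

```java
// Hypothetical illustration of cross-thread visibility using a volatile flag.
final class StopFlagDemo {
    // 'volatile' guarantees that the main thread's write to this flag becomes
    // visible to the worker. Without it, the Java Memory Model gives no such
    // guarantee, and an optimizing JIT compiler may hoist the read out of the
    // loop so the worker spins forever.
    private static volatile boolean stopRequested;

    // Runs a busy worker for roughly 'millis' ms, asks it to stop, and returns
    // how many loop iterations it completed.
    static long runFor(long millis) {
        stopRequested = false;
        long[] iterations = new long[1];
        Thread worker = new Thread(() -> {
            long n = 0;
            while (!stopRequested) { // volatile read on every pass
                n++;
            }
            iterations[0] = n;
        });
        worker.start();
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag; still stop the worker below
        }
        stopRequested = true; // volatile write: the worker is guaranteed to see it
        joinUninterruptibly(worker);
        return iterations[0];
    }

    // Waits for a thread to finish even if interrupted, then restores the interrupt flag.
    static void joinUninterruptibly(Thread t) {
        boolean interrupted = false;
        while (true) {
            try {
                t.join();
                break;
            } catch (InterruptedException e) {
                interrupted = true;
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker stopped after " + runFor(100) + " iterations");
    }
}
```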

In practice, every time an application executes, the background conditions are different. A problem may manifest itself only once in every 10,000 executions, or even more rarely.

That makes finding these bugs very difficult indeed. Even if a test does trigger the bug and the application fails, non-determinism means the failure is unlikely to be reproducible: rerun the test under the same apparent conditions and the application will almost certainly run flawlessly.
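
The following Java sketch, using hypothetical names (FlakyRaceHarness, runOnce, countFailures), shows why repeating a test is such a weak defense. Two threads each increment an unsynchronized counter; because counter++ is a read-modify-write rather than a single atomic step, increments can be lost. The harness runs the identical test many times and counts how many runs came up short.

```java
// Hypothetical harness that reruns the same racy test and counts failures.
final class FlakyRaceHarness {
    private static int counter; // shared and unsynchronized: this is the bug

    // One "test run": two threads each increment the counter perThread times.
    // counter++ is a read-modify-write, so overlapping increments can be lost.
    static int runOnce(int perThread) {
        counter = 0;
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter++;
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        joinUninterruptibly(a);
        joinUninterruptibly(b);
        return counter;
    }

    // Repeats the identical test and counts the runs that lost at least one update.
    static int countFailures(int runs, int perThread) {
        int failures = 0;
        for (int r = 0; r < runs; r++) {
            if (runOnce(perThread) != 2 * perThread) {
                failures++;
            }
        }
        return failures;
    }

    // Waits for a thread to finish even if interrupted, then restores the interrupt flag.
    static void joinUninterruptibly(Thread t) {
        boolean interrupted = false;
        while (true) {
            try {
                t.join();
                break;
            } catch (InterruptedException e) {
                interrupted = true;
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Same code on every pass; the failure count can still differ each time.
        for (int attempt = 1; attempt <= 3; attempt++) {
            System.out.println("attempt " + attempt + ": "
                    + countFailures(1_000, 1_000) + " failing runs out of 1000");
        }
    }
}
```

Each pass of the loop in main runs exactly the same code, yet the failure count can vary from pass to pass, and on some machines it may be zero. A team that happened to see a zero would have no reason to suspect a bug.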

Static Code Analysis, Done Right, Can Find Concurrency Bugs

How can these race conditions be detected? The answer, according to Don Sannella, a professor of computer science at the University of Edinburgh, is static analysis during development. Static analysis tools examine code without executing it or feeding it data, reasoning about all possible execution paths and data values and looking for inconsistencies.

“A compiler does a simple form of static analysis when it looks for errors [such as] trying to divide letters by a number,” Sannella says. “Static analysis tools … go much deeper.”
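
The difference between observing one execution and reasoning about all of them can be illustrated with a toy Java program. Note that this is exhaustive enumeration of thread schedules, a technique closer to model checking than to the static analysis Sannella describes, and the names (InterleavingExplorer, allOutcomes) are hypothetical. Two threads each read a shared x into a local and then write back the local plus one; the program visits every interleaving of those four steps and records the final value of x.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy: enumerate every schedule of two threads doing x = x + 1.
final class InterleavingExplorer {
    // Each thread performs two steps on a shared variable x:
    // step 0 = READ (tmp = x), step 1 = WRITE (x = tmp + 1).
    private static final int STEPS_PER_THREAD = 2;

    // Returns the final value of x for every interleaving that keeps
    // each thread's own steps in program order, starting from x = 0.
    static List<Integer> allOutcomes() {
        List<Integer> outcomes = new ArrayList<>();
        explore(new int[2], 0, new int[2], outcomes);
        return outcomes;
    }

    // done[id]: steps thread id has completed; x: shared value; tmp[id]: thread-local copy.
    private static void explore(int[] done, int x, int[] tmp, List<Integer> out) {
        if (done[0] == STEPS_PER_THREAD && done[1] == STEPS_PER_THREAD) {
            out.add(x); // every step has run: record this schedule's result
            return;
        }
        for (int id = 0; id < 2; id++) {
            if (done[id] < STEPS_PER_THREAD) {
                int[] nextDone = done.clone();
                int[] nextTmp = tmp.clone();
                int nextX = step(id, done[id], x, nextTmp);
                nextDone[id]++;
                explore(nextDone, nextX, nextTmp, out);
            }
        }
    }

    // Executes one step of thread 'id' and returns the new value of x.
    private static int step(int id, int step, int x, int[] tmp) {
        if (step == 0) {
            tmp[id] = x;    // READ: copy shared x into the thread's local
            return x;
        }
        return tmp[id] + 1; // WRITE: store local + 1 back into x
    }

    public static void main(String[] args) {
        List<Integer> outcomes = allOutcomes();
        long lost = outcomes.stream().filter(v -> v == 1).count();
        // prints [2, 1, 1, 1, 1, 2] -> 4 of 6 lose an update
        System.out.println(outcomes + " -> " + lost + " of " + outcomes.size() + " lose an update");
    }
}
```

Of the six possible interleavings, four end with x equal to 1 instead of 2, meaning an update was lost. A single test run observes just one of these schedules, which is why reasoning about all of them can catch what testing misses.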

Static analysis tools have been around for about a decade, with large vendors such as IBM and HP as well as smaller companies such as WhiteHat Security and Veracode dominating the market. While they offer tools that can analyze a range of languages, most are designed to detect security vulnerabilities such as buffer overflows or SQL injection issues rather than concurrency bugs.

Sannella’s company, Contemplate, is developing something slightly different: A Java static analysis tool called ThreadSafe that’s specifically designed to spot concurrency bugs. “Concurrency bugs are the hardest type to find,” Sannella says. “We will find more of them, but we won’t find other types of bugs.”

Sannella explains that ThreadSafe targets Java for two reasons: the language is so popular that it makes an attractive target for exploits, and it was designed from the outset with built-in support for concurrent programming.

ThreadSafe integrates into Eclipse and is intended to be run from time to time during the development process itself, perhaps before the program is finished. “It is a bit like a spell checker,” Sannella says. “Maybe developers could run it while they go and get a cup of tea.”

A typical ThreadSafe report might warn of a variable that’s inconsistently synchronized and could therefore be subject to a race condition. (Protecting a variable with a lock, for example through synchronization, is one way to prevent a race condition.)
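
Here is a hypothetical example of the pattern, sketched in Java; it is not output from ThreadSafe, and the Account class is invented for illustration. The balance field is written only while holding the object’s lock, but one read skips the lock. That makes its synchronization inconsistent: a caller of the unsynchronized method may see a stale value, and because the Java Language Specification lets a JVM split reads and writes of a non-volatile long into two 32-bit halves, it could even see a torn one.

```java
// Hypothetical class showing inconsistent synchronization and its fix.
final class Account {
    private long balance; // intended to be guarded by the lock on 'this'

    // Writes happen while holding the lock...
    synchronized void deposit(long amount) {
        balance += amount;
    }

    // ...but this read does not take the lock. Another thread may see a stale
    // value, or (since non-volatile long access need not be atomic) a torn one.
    long balanceUnsynchronized() {
        return balance;
    }

    // Fix: read under the same lock that guards the writes.
    synchronized long balance() {
        return balance;
    }

    public static void main(String[] args) throws InterruptedException {
        Account account = new Account();
        Runnable depositor = () -> {
            for (int i = 0; i < 10_000; i++) {
                account.deposit(1);
            }
        };
        Thread t1 = new Thread(depositor);
        Thread t2 = new Thread(depositor);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(account.balance()); // prints 20000: every deposit held the lock
    }
}
```

The fix is the synchronized balance() method, which reads under the same lock that guards the writes; the unsynchronized variant is the one a consistency check would flag.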

“The tool tells the developer that sometimes a variable is being locked, and sometimes not. It flags possible problems,” Sannella says. From there, developers can see whether they forgot to lock the variable in a few places, and therefore need to change the code — or they may decide that it doesn’t matter.
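
The rule Sannella describes, that a variable is sometimes locked and sometimes not, can be expressed as a toy check over a list of recorded field accesses. This is a hypothetical sketch (InconsistentLockCheck, Access and findInconsistentFields are invented names), not how ThreadSafe or any other product works; a real tool has to work out from the code itself which locks are held at each access, which is the hard part. Here each access is simply given as a field name, a code location and whether a lock was held.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical toy check: flag fields that are locked at some sites but not others.
final class InconsistentLockCheck {
    // One field access found in the code: the field, where it occurs,
    // and whether a lock was held at that point.
    record Access(String field, String site, boolean lockHeld) {}

    // Returns, for each field locked at some sites but not at others,
    // the sites where it is accessed without a lock.
    static Map<String, List<String>> findInconsistentFields(List<Access> accesses) {
        Set<String> lockedSomewhere = new HashSet<>();
        Map<String, List<String>> unlockedSites = new TreeMap<>();
        for (Access a : accesses) {
            if (a.lockHeld()) {
                lockedSomewhere.add(a.field());
            } else {
                unlockedSites.computeIfAbsent(a.field(), k -> new ArrayList<>()).add(a.site());
            }
        }
        // Fields never locked anywhere are not reported by this rule
        // (they may be confined to one thread); only "sometimes locked" ones are.
        unlockedSites.keySet().retainAll(lockedSomewhere);
        return unlockedSites;
    }

    public static void main(String[] args) {
        List<Access> accesses = List.of(
                new Access("balance", "Account.deposit", true),
                new Access("balance", "Account.balance", true),
                new Access("balance", "Account.balanceUnsynchronized", false),
                new Access("owner", "Account.owner", false));
        // prints {balance=[Account.balanceUnsynchronized]}
        System.out.println(findInconsistentFields(accesses));
    }
}
```

A rule this simple also shows where false positives come from: a field written without a lock in a constructor, before any other thread can see the object, and under a lock elsewhere would be flagged even though the unlocked access is harmless. That is exactly the case where a developer may decide it doesn’t matter.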

Finding Bugs Doesn’t Matter If They Can’t Be Squashed

Although static analysis tools can draw developers’ attention to concurrency bugs and serious security vulnerabilities during development, they are far less widely used than one might imagine. The reason isn’t cost but rather that they often flag an unmanageable number of possible problems, hundreds or even thousands, each of which must be analyzed, prioritized and, if necessary, fixed. A proportion of these, perhaps 10 percent, are likely to be false positives.

Veracode CTO Chris Wysopal estimates that only about 40 percent of developers use static analysis tools. “Even then, they are not using them on every build, so they are not getting the full benefit of them,” he says. “Only 10 percent are using them in a really effective way.”

That’s because static analysis tools are good at finding issues but not fixing them, which leads to a backlog of problems waiting to be fixed, Wysopal says. “The holy grail for static analysis is the ability for the tools to carry out automated fixing.”

Paul Rubens is a technology journalist based in England.