Why you need to worry about the security of open source software in 2018 and beyond

The speed of open source deployment by enterprises everywhere puts software security into question.

It’s important to say this up front. Mark Curphey loves open source software.

“I’m a big fan, and always have been,” the founder and CEO of the security startup SourceClear says.

After all, what’s not to like about people pulling together in the same direction for the common good of building useful software? And software that you can get for free? If you have a good idea, you can build it, release it to the world, and typically other people will over time help you make it better.

That, in broad brush strokes, summarizes the utopian vision open source proponents have advocated for years. And in a large way they’ve succeeded. Open source software can be found anywhere there are computers in large enterprises: it powers much of the financial industry and much of the life sciences industry; it can be found in Tesla’s electric cars, and it’s training autonomous vehicles; it is even used inside the top secret environs of U.S. intelligence agencies like the CIA.

In fact, open source software is so widely used that you’d be hard pressed to find an IT decision-maker in a large organization anywhere who hasn’t deployed a fair amount of it throughout their operation.

The CIO of a large investment bank once explained it to me like this: when considering software options, his order of preference was “download, build or buy,” with open source taking precedence. Building an in-house solution came second, and buying something off the shelf came last. He’s not alone. The cost advantages of open source alone can be huge, and there are numerous others.

But that doesn’t mean everything about open source is perfect. The problem, as Curphey sees it, is that so many open source software libraries and components get used and reused over and over. The end result: vulnerable code winds up all over the place, exposing to attack applications and devices that may have practically no relationship to each other.

And that’s where the problems begin. Curphey likens it to “eating from a dirty fork.” Reusable software leads to vulnerabilities that are re-created over and over.

One recent and particularly egregious example: Apache Struts, an open source framework for creating Java web applications. Many companies use it to build parts of their online storefronts. Among them was the credit reporting firm Equifax, and you know how that turned out: attackers exploiting a vulnerability in Struts made off with data on some 150 million consumers.

The example of the Struts vulnerability prompted me to call another expert on this: Chris Wysopal, the CTO and co-founder of Veracode, a unit of CA Technologies that operates a cloud-based service that scans software code for vulnerabilities.

The security industry is only now coming to grips with the scale of the potential for widespread problems, Wysopal says. Long before the Equifax breach made headlines, Veracode estimated in a report that the Struts vulnerability exposed as many as 35 million sites to remote code execution attacks.

“Open source is so popular and pervasive now that vulnerabilities are emerging as a different class of threat,” he says. “When you find a vulnerability in an open source component, you’re likely to find it in all the applications that use that component.”

Another example was KRACK, a vulnerability that emerged two months ago. Open source code used on numerous different models of Wi-Fi routers exposed those products, and the traffic from devices on their networks, to eavesdropping.

A third example was Apache Commons Collections (ACC), a component that’s widely used in Java applications. Veracode found that more than half of Java applications relied on versions of the component containing a vulnerability that left San Francisco’s Muni rail system open to a ransomware attack.

Veracode did a deep dive on that case, tracing through five generations of reuse one particular version of ACC containing the vulnerability. It showed up in more than 80,000 different components, which were then reused in millions of applications. (Wysopal presented the findings in a talk at WebSummit in Lisbon last month.)
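
To see how one flawed component fans out across generations of reuse, here is a minimal Python sketch. It is not Veracode's method, and the component names and the dependency map are invented for illustration. It walks a "who uses what" map breadth-first and records how many generations each affected component sits from the vulnerable one.

```python
from collections import deque

# Hypothetical reverse-dependency map (illustrative names only):
# component -> components that use it directly.
USED_BY = {
    "commons-collections": ["lib-a", "lib-b"],
    "lib-a": ["framework-x"],
    "lib-b": ["framework-x", "app-1"],
    "framework-x": ["app-2", "app-3"],
}

def affected_by_generation(vulnerable, used_by):
    """Breadth-first walk returning {component: generation} for everything
    that depends, directly or indirectly, on the vulnerable component."""
    generation = {}
    queue = deque([(vulnerable, 0)])
    while queue:
        name, depth = queue.popleft()
        for dependent in used_by.get(name, []):
            if dependent not in generation:  # keep the shortest path only
                generation[dependent] = depth + 1
                queue.append((dependent, depth + 1))
    return generation

affected = affected_by_generation("commons-collections", USED_BY)
for component, gen in sorted(affected.items(), key=lambda item: item[1]):
    print(f"generation {gen}: {component}")
```

Even in this toy map, a single vulnerable library reaches six other components within three generations, none of which necessarily lists it as a direct dependency.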

Those vulnerabilities are only the symptom of a much more fundamental problem that Curphey sees with open source software: people assume that if there’s a problem in an open source application, someone, somewhere will fix it.

“There’s this romantic notion that because you can see the source code, and that if something doesn’t work or there’s a problem with it, then someone in the community is going to fix it,” he says. “But most of the time that doesn’t happen.”

There’s no accountability in the open-source world. “The Apache Software Foundation is probably the most important software company in the world,” he said. “All of the life sciences work that’s being done right now, a lot of the financial companies in the world, and a lot of other important things are being done on open source software at their core. And many of the tools and libraries in use have security issues. Faith in the open source community is no defense.”

And it’s increasingly clear that the potential for more calamitous results is growing. Consider the following: In a study of 1,000 commercial applications by Black Duck Software, nearly all — 96 percent — contained open source components. And more than two-thirds of those applications — 67 percent — contained components with documented vulnerabilities, some of which have been known for four years or more.

So what to do? You could, as a consumer of open source software, examine the code for vulnerabilities and patch them yourself. Some big companies can afford to do this. Most consider it expensive and time-consuming: the costs in time and money erase much of the financial incentive that drove them into the open-source camp in the first place.

With no central tool chain or even a central set of policies from the Apache Foundation, the onus is on the community itself (developers and consumers of open source software) to fix these vulnerabilities when they’re found. The results are not encouraging. By SourceClear’s reckoning, maybe 10 percent are ever fixed.

It’s the kind of problem that might be fixed by a company, and that’s exactly what SourceClear is setting out to do. Curphey ran application security for the investment house Charles Schwab and then did similar stints at Microsoft and a division of McAfee. He started SourceClear in 2014 and was later joined by Alex Ethier, a veteran of the DevOps software company Chef, who serves as Chief Product Officer.

SourceClear’s cloud-based service scans software code for vulnerabilities, and along the way it uses machine learning and data-science to determine where developers have been quietly fixing security problems in their code without telling anyone, by watching commits uploaded to open-source libraries, parsing change-logs, and keeping a close eye on bug-trackers.

Curphey estimates that as many as 90 percent of the vulnerabilities SourceClear finds are not listed in the National Vulnerability Database, the U.S. government-operated repository of software vulnerabilities.

SourceClear then integrates the results of those scans with modern agile development practices. Customers use the intelligence gathered from the scans to set policies: Which libraries are banned, and which ones are approved? What should happen when a banned library has been used? It supports several continuous integration tools, including Jenkins, Travis and Atlassian’s Bamboo. It also integrates with Jira, GitHub and GitLab for tracking issues that must be fixed.
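
What such a policy might look like in practice can be sketched in a few lines of Python. This is not SourceClear's API or policy format; the library names, versions, and function names are all hypothetical. The sketch sorts a build's dependencies into banned, approved, and unreviewed, then decides whether a continuous integration job should fail.

```python
# Hypothetical policy lists; names and versions are invented for illustration.
BANNED = {("example-web-framework", "1.0.0")}
APPROVED = {("example-web-framework", "1.0.1"), ("example-collections", "3.2.0")}

def evaluate(dependencies, banned, approved):
    """Sort each (name, version) pair into 'banned', 'approved' or 'unreviewed'."""
    report = {"banned": [], "approved": [], "unreviewed": []}
    for dep in dependencies:
        if dep in banned:
            report["banned"].append(dep)
        elif dep in approved:
            report["approved"].append(dep)
        else:
            report["unreviewed"].append(dep)
    return report

def build_should_fail(report):
    """One possible rule: any banned library breaks the build."""
    return bool(report["banned"])

# Dependencies a build tool might report for one application (made up).
deps = [
    ("example-web-framework", "1.0.0"),
    ("example-collections", "3.2.0"),
    ("example-logging", "0.9.4"),
]
report = evaluate(deps, BANNED, APPROVED)
print(report)
print("fail build:", build_should_fail(report))
```

Whether an unreviewed library should also stop the build, or merely open a ticket in an issue tracker, is exactly the kind of choice a policy has to spell out.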

The issue is becoming dire, Curphey says. “The bad guys have figured out that they can poison the open source software well,” he said. “We’ve seen numerous instances of malware being pushed directly into the open source ecosystem. The software supply chain itself is under attack.”

Veracode’s Wysopal has seen it too. Attackers are “typo squatting,” essentially creating packages with names that are only slightly different from popular ones found in the npm repository. In another case, authorities in Slovakia found tainted packages in the official repository for the Python programming language.
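
One simple way to spot names that are "only slightly different" is to measure edit distance, the number of single-character insertions, deletions, or substitutions needed to turn one name into another. The Python sketch below is a basic heuristic, not how npm or any vendor screens packages; the short list of popular names is an illustrative sample and the threshold of two edits is an assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]

# Illustrative sample of well-known package names, not a complete list.
POPULAR = ["express", "lodash", "request"]

def possible_typosquats(candidate, popular=POPULAR, max_distance=2):
    """Return popular names the candidate nearly, but not exactly, matches."""
    return [name for name in popular
            if 0 < edit_distance(candidate, name) <= max_distance]

print(possible_typosquats("expres"))
print(possible_typosquats("lodahs"))
print(possible_typosquats("lodash"))
```

A check like this will also flag legitimate packages with similar names, so it works better as a prompt for human review than as an automatic block.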

“There have been cases where people have caught backdoors being inserted into packages,” Wysopal said. “The potential for this growing into a bigger problem is definitely there.”

Large companies with considerable software smarts — think Google and Facebook — have the resources to analyze the open source libraries they use for vulnerabilities before reusing them. Most companies aren’t so lucky.

That’s where Curphey hopes to make a mark with SourceClear. Based in San Francisco, it raised $10 million in a 2015 Series A led by Index Ventures and Storm Ventures. With tools built specifically for devops, he hopes to give developers a new edge in building applications with fewer inherent vulnerabilities.

However, there’s more than just a commercial motivation behind Curphey’s intentions.

“Remember, I’m the guy who loves open source.”

This article is published as part of the IDG Contributor Network.