by Scott Berinato

The Problems with Patching Software

Nov 01, 2003
19 mins
IT Strategy

Early one Saturday morning last January, from a computer located somewhere within the seven continents, or possibly on the four oceans, someone sent 376 bytes of code inside a single data packet to a SQL Server. That packet—which would come to be known as the Slammer worm—infected the server by sneaking in through UDP port 1434. From there it generated a set of random IP addresses and scanned them. When it found a vulnerable host, Slammer infected it, and from its new host invented more random addresses that hungrily scanned for more vulnerable hosts.

Slammer was a nasty bugger. In the first minute of its life, it doubled the number of machines it infected every 8.5 seconds. (Just to put that in perspective, in July 2001 the famous Code Red worm doubled its infections every 37 minutes.) Slammer peaked in just three minutes, at which point it was scanning 55 million targets per second.
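To see how far apart those two doubling times are, here is a back-of-the-envelope sketch in Python. It assumes pure exponential growth, which by this account held only during Slammer's first minute, and the function and variable names are illustrative rather than taken from any published analysis.

```python
import math

SLAMMER_DOUBLING_S = 8.5        # seconds, the figure quoted above
CODE_RED_DOUBLING_S = 37 * 60   # 37 minutes, expressed in seconds


def growth_factor(elapsed_s, doubling_s):
    """Multiple by which infections grow in elapsed_s under steady doubling."""
    return 2 ** (elapsed_s / doubling_s)


def time_to_grow(factor, doubling_s):
    """Seconds needed to multiply infections by factor under steady doubling."""
    return doubling_s * math.log2(factor)


# How many times shorter Slammer's doubling period is.
speed_ratio = CODE_RED_DOUBLING_S / SLAMMER_DOUBLING_S

# Growth each worm would achieve in its first 60 seconds (assumed exponential).
slammer_one_minute = growth_factor(60, SLAMMER_DOUBLING_S)
code_red_one_minute = growth_factor(60, CODE_RED_DOUBLING_S)

# Time Code Red would need to match Slammer's first-minute growth.
code_red_catch_up_s = time_to_grow(slammer_one_minute, CODE_RED_DOUBLING_S)

if __name__ == "__main__":
    print(f"Doubling-time ratio: {speed_ratio:.0f}x")
    print(f"Slammer growth in one minute: {slammer_one_minute:.0f}x")
    print(f"Code Red growth in one minute: {code_red_one_minute:.3f}x")
    print(f"Code Red time to match: {code_red_catch_up_s / 3600:.1f} hours")
```

Under these assumptions, Slammer's doubling period is roughly 260 times shorter than Code Red's, and the more-than-hundredfold growth Slammer managed in one minute would have taken Code Red over four hours.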

Then, Slammer started to decelerate, a victim of its own startling efficiency as it bumped into its own scanning traffic. Still, by the 10-minute mark, 90 percent of all vulnerable machines on the planet were infected. But when Slammer subsided, talk focused on how much worse it would have been had Slammer hit on a weekday or, worse, carried a destructive payload.

Slammer’s maniacal binge occurred a full six months after Microsoft had released a patch to prevent it. Those looking to cast blame—and there were many—cried a familiar refrain: If everyone had just patched his system in the first place, Slammer wouldn’t have happened.

But that’s not true. And therein lies our story.

Slammer was unstoppable. Which points to a bigger issue: Patching no longer works.

Partly, it’s a volume problem. There are simply too many vulnerabilities requiring too many combinations of patches coming too fast. Picture Lucy and Ethel in the chocolate factory—just take out the humor.

But perhaps more important and less well understood, it’s a process problem. The current manufacturing process for patches—from disclosure of a vulnerability to the creation and distribution of the updated code—makes patching untenable. At the same time, the only way to fix insecure post-release software (in other words, all software) is with patches.

This Hobson’s choice has taken patching and the newly minted discipline associated with it, patch management, into the realm of the absurd.

Hardly surprising, then, that philosophies on what to do next have bifurcated. Depending on whom you ask, it’s either time to patch less—replacing the process with vigorous best practices and a little bit of risk analysis—or it’s time to patch more—by automating the process with, yes, more software.

“We’re between a rock and a hard place,” says Bob Wynn, former CISO of the state of Georgia. “No one can manage this effectively. I can’t just automatically deploy a patch. And because the time it takes for a virus to spread is so compressed now, I don’t have time to test them before I patch either.”

How to Build a Monster

Patching is, by most accounts, as old as software itself. Unique among engineered artifacts, software is not beholden to the laws of physics; it can endure fundamental change relatively easily even after it’s been “built.” Automobile engines, by contrast, don’t take to piston redesigns once they roll off the assembly line nearly so well.

This unique characteristic of software has contributed to a software engineering culture that generally regards quality and security as obstacles. An adage among programmers suggests that when it comes to software, you can pick only two of three: speed to market, number of features, level of quality. Programmers’ egos are wrapped up in the first two; rarely do they pick the third (since, of course, software is so easily repaired later, by someone else).

Such an approach has never been more dangerous. Software today is massive (Windows XP contains 45 million lines of code) and the rate of sloppy coding (10 to 20 errors per 1,000 lines of code) has led to thousands of vulnerabilities. CERT published 4,200 new vulnerabilities last year, 3,000 more than it published three years ago. Meanwhile, software continues to find itself running ever more critical business functions, where its failure carries profound implications. In other words, right when quality should be getting better, it’s getting exponentially worse.
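The two figures quoted for Windows XP can be multiplied out to see the scale involved. The short Python sketch below does only that arithmetic; the names are illustrative, and the result counts coding errors in general, since the figures above do not say what share of them are security vulnerabilities.

```python
LINES_OF_CODE = 45_000_000      # Windows XP size quoted above
ERRORS_PER_KLOC = (10, 20)      # quoted range of errors per 1,000 lines


def estimated_errors(lines, errors_per_kloc):
    """Expected number of coding errors at a given rate per 1,000 lines."""
    return lines * errors_per_kloc // 1000


low_estimate, high_estimate = (
    estimated_errors(LINES_OF_CODE, rate) for rate in ERRORS_PER_KLOC
)

if __name__ == "__main__":
    print(f"Estimated errors: {low_estimate:,} to {high_estimate:,}")
```

At those rates, a 45-million-line code base would carry somewhere between 450,000 and 900,000 errors.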

Patch and Pray

Stitching patches into these complex systems, which sit within labyrinthine networks of similarly complex systems, makes it impossible to know if a patch will solve the problem it’s meant to without creating unintended consequences. One patch, for example, worked fine for everyone except those unlucky users who happened to have a certain Compaq system connected to a certain RAID array without certain updated drivers. In that case, the patch knocked out the storage array.

Tim Rice, network systems analyst at Duke University, was one of the unlucky ones. “If you just jump in and apply patches, you get nailed,” he says. “You can set up six different systems the same way, apply the same patch to each, and get one system behaving differently.”

Raleigh Burns, former security administrator at St. Elizabeth’s Medical Center, agrees. “Executives think this stuff has a Mickey Mouse GUI, but even chintzy patches are complicated.”

The conventional wisdom is that when you implement a patch, you improve things. But Wynn isn’t convinced. “We’ve all applied patches that put us out of service. Plenty of patches actually create more problems—they just shift you from one vulnerability cycle to another,” Wynn says. “It’s still consumer beware.”

Yet for many who haven’t dealt directly with patches, there’s a sense that patches are simply click-and-fix. In reality, they’re often patch-and-pray. At the very least, they require testing. Some financial institutions, says Shawn Hernan, team leader for vulnerability handling in the CERT Coordination Center at the Software Engineering Institute (SEI), mandate six weeks of regression testing before a patch goes live. Third-party vendors often take months after a patch is released to certify that it won’t break their applications.

All of which makes the post-outbreak admonishing to “Patch more vigilantly” farcical and, probably to some, offensive. It’s the complexity and fragility—not some inherent laziness or sloppy management—that explains why Slammer could wreak such havoc 185 days after Microsoft released a patch for it.

“We get hot fixes every day, and we’re loath to put them in,” says Frank Clark, former senior vice president and CIO of Covenant Health, whose six-hospital network was knocked out when Slammer hit, causing doctors to revert to paper-based care. “We believe it’s safer to wait until the vendor certifies the hot fixes in a service pack.”

On the other hand, if Clark had deployed every patch he was supposed to, nothing would have been different. He would have been knocked out just the same.

Attention Hackers: Weakness Here

Slammer neatly demonstrates everything that’s wrong with manufacturing software patches. It begins with disclosure of the vulnerability, which happened in the case of Slammer in July 2002, when Microsoft issued patch MS02-039. The patch steeled a file called ssnetlib.dll against buffer overflows.

“Disclosure basically gives hackers an attack map,” says Gary McGraw, CTO of Cigital and the author of Building Secure Software. “Suddenly they know exactly where to go. If it’s true that people don’t patch—and they don’t—disclosure helps mostly the hackers.”

Essentially, disclosure’s a starter’s gun. Once it goes off, it’s a footrace between hackers (who now know what file to exploit) and everyone else (who must all patch their systems successfully). And the good guys never win. Someone probably started working on a worm to attack ssnetlib.dll as soon as Microsoft released MS02-039.

In the case of Slammer, Microsoft built three more patches in 2002—MS02-043 in August, MS02-056 in early October and MS02-061 in mid-October—for related SQL Server vulnerabilities. MS02-056 updated ssnetlib.dll to a newer version; otherwise, all of the patches played together nicely.

Then, on October 30, Microsoft released Q317748, a nonsecurity hot fix for SQL Server.

Danger: Patch Under Construction

Q317748 repaired a performance-degrading memory leak. But the team that built it had used an old, vulnerable version of ssnetlib.dll. When Q317748 was installed, it could overwrite the secure version of the file and thus make that server as vulnerable to a worm like Slammer as one that had never been patched.
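The failure described here is a file downgrade: an installer copying an older build over a newer one. As a minimal sketch of the kind of guard that would catch it, the Python below compares version numbers before allowing an overwrite. It is not a description of how Microsoft's installers worked; the function names, the second file name and all version numbers are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.0.10' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def should_replace(installed_version, packaged_version):
    """Allow the overwrite only when the packaged file is strictly newer."""
    return parse_version(packaged_version) > parse_version(installed_version)


def plan_install(installed_files, package_files):
    """Split a package into files to copy and files to skip.

    A file is skipped when the installed copy is the same version or newer.
    """
    to_copy, skipped = [], []
    for name, packaged in package_files.items():
        installed = installed_files.get(name)
        if installed is None or should_replace(installed, packaged):
            to_copy.append(name)
        else:
            skipped.append(name)
    return sorted(to_copy), sorted(skipped)


# Hypothetical version numbers, for illustration only.
installed = {"ssnetlib.dll": "2.0.5", "engine.dll": "1.4.0"}
hotfix = {"ssnetlib.dll": "2.0.3", "engine.dll": "1.4.1"}
to_copy, skipped = plan_install(installed, hotfix)

if __name__ == "__main__":
    print("copy:", to_copy)
    print("skip:", skipped)
```

Parsing the versions into integer tuples matters: compared as plain strings, "2.0.10" would sort before "2.0.9", and the guard would misjudge which file is newer.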

“As bad as software can be, at least when a company develops a product, it looks at it holistically,” says SEI’s Hernan. “It’s given the attention of senior developers and architects, and if quality metrics exist, that’s when they’re used.”

Which is not the case with patches.

Patch writing is usually assigned to entry-level maintenance programmers, says Hernan. They fix problems where they’re found. They have no authority to look for recurrences or to audit code. And the patch coders face severe time constraints—remember there’s a footrace on. They don’t have time to communicate with other groups writing other patches that might conflict with theirs. (Not that they’re set up to communicate. Russ Cooper, who manages NTBugtraq, the Windows vulnerability mailing list, says companies often divide maintenance by product group and let them develop their own tools and strategies for patching.) There’s little, if any, testing of patches by the vendors that create them.

Ironically, maintenance programmers write patches using the same software development methodologies employed to create the insecure, buggy code that they are supposed to be fixing. It’s no surprise then that these Dr. FrankenPatches produce poorly written products that can break as much as they fix. For example, an esoteric flaw found last summer in an encryption program—one so arcane it might never have been exploited—was patched. The patch itself had a gaping buffer overflow written into it, and that was quickly exploited, says Hernan. In another case last April, Microsoft released patch MS03-013 to fix a serious vulnerability in Windows XP. On some systems, it also degraded performance by roughly 90 percent. The performance degradation required another patch, which wasn’t released for a month.

Slammer feasted on such methodological deficiencies. It infected both servers made vulnerable by conflicting patches and servers that were never patched at all because the SQL patching scheme was kludgy. These particular patches required scripting, file moves, and registry and permission changes to install. (After the Slammer outbreak, even Microsoft engineers struggled with the patches.) Many avoided the patch because they feared breaking SQL Server, one of their critical platforms. It was as if their car had been recalled and the automaker mailed them a transmission with installation instructions.

Background Vulnerabilities Come to the Fore

The initial reaction to Slammer was confusion on a Keystone Kops scale. “It was difficult to know just what patch applied to what and where,” says NTBugtraq’s Cooper, who’s also the “surgeon general” at vendor TruSecure.

Slammer hit at a particularly dynamic moment: Microsoft had released Service Pack 3 for SQL Server days earlier. It wasn’t immediately clear if SP3 would need to be patched (it wouldn’t), and Microsoft early on told customers to upgrade their SQL Server to SP3 to escape the mess.

Meanwhile, those trying to use MS02-061 were struggling mightily with its kludginess, and those who had patched—but got infected and watched their bandwidth sucked down to nothing—were baffled. At the same time, a derivative SQL application called MSDE (Microsoft Desktop Engine) was causing significant consternation. MSDE runs in client apps and connects them back to the SQL Server. Experts assumed MSDE would be vulnerable to Slammer since all of the patches had applied to both SQL and MSDE users.

That turned out to be true, and Cooper remembers a sense of dread as he realized MSDE could be found in about 130 third-party applications. It runs in the background; many corporate administrators wouldn’t even know it’s there. Cooper estimated it could be found in half of all corporate desktop clients. In fact, at Beth Israel Deaconess Hospital in Boston, MSDE had caused an infestation although the network SQL Servers had been patched.

When customers arrived at work on Monday and booted up their clients, which in turn loaded MSDE, Cooper worried that Slammer would start a reinfestation, or maybe it would spawn a variant. No one knew what would happen. And while patching thousands of SQL Servers is one thing, finding and patching millions of clients with MSDE running is another entirely. Still, Microsoft insisted, if you installed SQL Server SP3, your MSDE applications would be protected.

It seemed like reasonable advice.

Then again, companies take more than a week to stick a service pack into a network. After all, single patches require regression testing, and service packs are hundreds of security patches, quality fixes and feature upgrades rolled together. In a crisis, upgrading a service pack that was days old wasn’t reasonable. Cooper soon learned that Best Software’s MAS 500 accounting software wouldn’t run with Service Pack 3. MAS 500 users who installed SP3 to defend against Slammer had their applications fall over. They would have to start over and reformat their machines. All the while everyone was trying to beat Slammer to the workweek to avoid a severe uptick in Slammer infections when millions of machines worldwide were turned on or otherwise exposed to the worm that, over the weekend, remained blissfully dormant.

“By late Sunday afternoon, Microsoft had two rooms set up on campus,” says Cooper. “Services guys are in one room figuring out what to say to customers. A security response team is in the other room trying to figure out how to repackage the patches and do technical damage control.

“I’m on a cell phone, and there’s a guy there running me between the two rooms.” Cooper laughs at the thought of it.

Why Every Patch Starts from Zero

As the volume and complexity of software increases, so do the volume and complexity of patches. The problem with this, says SEI’s Hernan, is that there’s nothing standard about the patch infrastructure or managing the onslaught of patches.

There are no standard naming conventions for patches; vulnerability disclosure comes from whatever competitive vendor can get the news out first. Distribution might be automated or manual; and installation could be a double-click .exe file or a manual process.

Microsoft alone uses a hierarchy of eight different patching mechanisms (the company says it wants to reduce that number), which only adds to customer confusion.

“How do I know when I need to reapply a security rollup patch? Do I then need to reapply Win2K Service Pack 2? Do I need to reinstall hot fixes after more recent SPs?” Similar questions were posed to a third-party services company in a security newsletter. The answer was a page-and-a-half long.

There’s also little record-keeping or archiving around patches, leaving vendors to make the same mistakes over and over without building up knowledge about when and where vulnerabilities arise and how to avoid them. For example, Apple’s Safari Web browser contained a significant security flaw in the way it validated certificates using SSL encryption, which required a patch. Every browser ever built before Safari, Hernan says, had contained the same flaw.

“I’d like to think there’s a way to improve the process here,” says Mykolas Rambus, CIO of financial services company W.P. Carey. “It would take an industry body—a nonprofit consortium-type setup—to create standard naming conventions, to production test an insane number of these things, and to keep a database of knowledge on the patches so I could look up what other companies like mine did with their patching and what happened.”

Rambus doesn’t sound hopeful.

Slammer Dopeslaps the Software Industry

Slammer has become something of a turning point. The fury of its 10-minute conflagration and the ensuing comedy of a gaggle of firefighters untangling their hoses, rushing to the scene and finding that the building has already burnt down, left enough of an impression to convince many that patching, as it is currently practiced, doesn’t work.

“Something has to happen,” says Rambus. “There’s going to be a backlash if it doesn’t improve. I’d suggest that this patching problem is the responsibility of the vendors, and the costs are being taken on by the customers.”

There’s good news and bad news for Rambus. The good news is that vendors are motivated to try and fix the patch process. And they’re earnest—one might say even religious—about their competing approaches. And the fervent search for a cure has intensified markedly since Slammer.

The bad news is that none of what’s happening changes the economics of patching. Customers still pay.

Patch More or Patch Less: A Hobson’s Choice

There are two emerging and opposite patching philosophies: Patch more, or patch less.

Vendors in the Patch More school have, almost overnight, created an entirely new class of software called patch management software. The term means different things to different people (already one vendor has concocted a spinoff, “virtual patch management”), but in general, PM automates the process of finding, downloading and applying patches. Patch More adherents believe patching isn’t the problem, but manual patching is. Routine checks for updates, automated deployment, checks for conflicts and rollback capabilities (in case there is a conflict) will, under the Patch More school of thought, fix patching. PM software can keep machines as up-to-date as possible without the possibility of human error.

The CISO at a major convenience store chain says it’s already working. “Patching was spiraling out of control until recently,” he says. “Before, we knew we had a problem because of the sheer volume of patches. We knew we were exposed in a handful of places. The update services coming now from Microsoft, though, have made the situation an order of magnitude better.”

Duke University’s Rice tested patch management software on 550 machines. When the application told him he needed 10,000 patches, he wasn’t sure if that was a good thing. “Obviously, it’s powerful, but automation leaves you open to automatically putting in buggy patches.” Rice might be thinking of the patch that crashed his storage array on a Compaq server. “I need automation to deploy patches,” he says. “I do not want automatic patch distribution.”

The Patch Less constituency is best represented by Peter Tippett, vice chairman and CTO of TruSecure. Based on 12 years of actuarial data, he says that only about 2 percent of vulnerabilities result in attacks. Therefore, most patches aren’t worth applying. In risk management terms, they’re at best superfluous and, at worst, a significant additional risk.

Instead, Tippett says, improve your security policy—lock down ports such as 1434 that really had no reason to be open—and pay third parties to figure out which patches are necessary and which ones you can ignore. “More than half of Microsoft’s 72 major vulnerabilities last year will never affect anyone ever,” says Tippett. “With patching, we’re picking the worst possible risk-reduction model there is.”

Tippett is at once professorial and constantly selling his own company’s ability to provide the services that make patching less viable. But many thoughtful security leaders think Tippett’s approach is as flawed and dangerous as automated patch management.

“He’s using old-school risk analysis,” says Burns. “How can you come up with an accurate probability matrix on blended threat viruses using 12 years of data when they’ve only been around for two years?”

An additional problem with the Patch Less school is the feeling of insecurity it engenders. Not patching is sort of like forgetting to put on your watch and feeling naked all day. Several information executives described an illogical pull to patch, even if the risk equation determined that less patching is equally or even more effective.

There’s also an emerging hybrid approach—which combines the patch management software with expertise and policy management. It also combines the costs of paying smart people to know your risks while also investing in new software.

Hernan says, “I can understand the frustration that can lead to the attitude of, ’Forget it, I can’t patch everything,’ but that person’s taking a big chance. On the other hand, he’s also taking a big chance applying a patch.”

“I don’t have much faith in automated patching schemes,” says Rambus. “But I could be convinced.”

Wynn is ambivalent too. “If you think patch management is a cure, you’re mistaken. Think of it as an incremental improvement. I have to take a theory of the middle range,” he says vaguely.

It’s Alive! The Persistence of Slammer

On the Monday after Slammer hit, Microsoft rereleased MS02-061 to include the memory leak fix and update ssnetlib.dll, and it was much easier to install. Of course, by then, Slammer was already pandemic. Microsoft itself was infected badly, prompting a moment of schadenfreude for many. ISP networks had collapsed; several root DNS servers were overwhelmed; airlines had canceled flights; ATMs refused to hand out money. In Canada, a national election was delayed.

The patches had, at best, a minuscule effect. What ended up preventing Slammer from worming its way into the workweek and causing even more damage, it turns out, was a rare gesture by ISPs. That same Monday, they agreed to block Internet traffic on UDP port 1434, the one Slammer used to propagate itself. “That’s what allowed us to survive,” says Cooper.

And surely, with ISPs blocking the door, companies would seize the opportunity to update, test and deploy the new patches. Or they could upgrade to Service Pack 3. They could locate and patch all their MSDE clients and, finally, kill Slammer dead.

But 10 days later, when ISPs opened port 1434 again, there was a spike in Slammer infections of SQL Servers. Six months later, in mid-July, the listener service showed Slammer remained the most prevalent worm in the wild, twice as common as any other worm. It was still trolling for, and finding, unpatched systems to infect.