by Bill Bulkeley

How Cloud Computing Rose From Lehman Brothers’ Ashes

Apr 11, 2011
Cloud Computing, IT Strategy

The inside story of how Lehman Brothers is being dismantled in the cloud

James Johnson, an IT veteran with 25 years’ experience running Wall Street technology operations, walked into Lehman Brothers’ packing-box-strewn office high in the Time-Life building in Rockefeller Center. It was November 2008. Johnson had just been named Lehman’s CTO and had been given the job of operating the IT infrastructure needed to wind down the firm.

Just two months earlier, Lehman helped set off a worldwide financial panic by filing the largest Chapter 11 bankruptcy in U.S. history. By the time Johnson found himself picking his way around the cardboard boxes, most of Lehman’s brokerage and money-management operations had already been sold to international banks at fire-sale prices. But Lehman still owned over $600 billion worth of global assets, including real estate, those infamous mortgage-backed securities, derivatives and other hard-to-value items. The professional services and restructuring firm Alvarez and Marsal won the contract to wind down operations and turn those assets into as much cash as possible for Lehman’s creditors.

Maximizing the return to creditors required minimizing costs through efficient processes. Meeting the transparency requirements of the bankruptcy court overseeing the case necessitated bulletproof record-keeping. All that demanded high-performance IT systems.

But, in the middle of a massive bankruptcy, one was far more likely to find chaos than efficiency, reliability or performance. Indeed, Lehman could no longer call much of its IT operations its own. They had been sold off.

“My responsibility,” says Johnson, “was to find a solution to support the current business and at the same time support the wind-down.”

A Very Different Approach for a Very Different Job

Johnson anticipated that running the IT shop at Lehman would be very different from running IT at most new businesses. Most CIOs come into new jobs at new enterprises with the business leaders urging them to prepare for runaway growth. Then the CIOs procure resources to be sure they can accommodate that optimism. That often leads to racks of underused (and expensive) storage and servers because the predicted business surge never occurs.

Lehman, however, had a different problem. Johnson knew that the demands on its IT infrastructure would peak in the first months of operation and then decline steadily as the business sold off its assets under court supervision. What Johnson needed was a way to scale up rapidly, and then scale back down, without making a significant (or, ideally, any) capital investment. Even a few years earlier, that would have presented an almost insurmountable challenge. Now, however, Johnson knew he could do it by finding a cloud to host Lehman’s shrinking infrastructure. It turned out to be the perfect solution for a failed business. (For expert advice about cloud-vendor contracts, see “How the Cloud Can Turn Toxic.”)
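The logic of that choice can be made concrete with a rough sketch. It compares buying enough capacity for the early peak and keeping it for the whole wind-down against renting only what each period requires. The demand curve, the unit cost and the 50 percent rental premium are invented for illustration; they are not Lehman's or BlueLock's figures.

```python
def fixed_capacity_cost(demand, unit_cost):
    """Cost of owning enough capacity for the peak in every period."""
    return max(demand) * unit_cost * len(demand)


def elastic_cost(demand, unit_cost, premium=1.0):
    """Cost of renting exactly the capacity each period needs.

    `premium` is a hypothetical markup the hosting provider charges
    over the cost of owning the same capacity.
    """
    return sum(d * unit_cost * premium for d in demand)


# Hypothetical demand per quarter, in arbitrary server units:
# it peaks at the start and falls as assets are sold off.
demand = [100, 90, 70, 50, 30, 20, 10, 5]

owned = fixed_capacity_cost(demand, unit_cost=1.0)
rented = elastic_cost(demand, unit_cost=1.0, premium=1.5)

print(f"Own for the peak: {owned}")
print(f"Rent as needed:   {rented}")
```

Under these assumed numbers, renting comes out cheaper even at a sizable markup, because the owned hardware sits mostly idle once demand falls. A flatter demand curve or a higher premium would tip the comparison the other way.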

It may have been perfect, but it was undeniably unorthodox. Lehman was used to fine-tuning every aspect of its five data centers for maximum performance and absolute security. But the old Lehman was in its death throes and it had become clear, Johnson says, that the cloud is appropriate for many IT operations precisely because it provides flexibility at a time when business conditions are unpredictable.

“It wasn’t like I was looking for cloud computing,” Johnson says. But as he evaluated different business models, “cloud matched exactly what we wanted to do.”

Managing Chaos

The biggest issue for Lehman and Johnson was uncertainty. As Jeffrey Donaldson, a managing director of Alvarez and Marsal, who worked closely with Johnson, says, “It was a free-fall bankruptcy. We were faced with chaos from day one.” However, Donaldson recalls, “We quickly turned away from saying, ‘Oh my God, this is the biggest bankruptcy ever.’ We realized everyone was facing the same level of uncertainty” about the future of their business.

During the six months after Lehman Brothers filed for bankruptcy, the worldwide financial system teetered on the brink. “The crisis wasn’t just where we were standing,” says Johnson. “It was a global issue. There was no place to hide.”

Johnson was hired for the Lehman job after a long career at Cantor Fitzgerald, where he had run its IT operations in the United Kingdom. He started out at Lehman by examining what IT assets the company had and what the department’s responsibilities were now that the firm was bankrupt.

When Lehman first filed for bankruptcy, its North American brokerage business was quickly sold to the United Kingdom’s third-largest bank, Barclays, for $1.78 billion. Its European and Asian operations were acquired by Nomura Holdings of Japan. Those firms took over its IT operations and two of its data centers in the New York City area, using Lehman’s IT staff to run them.

Barclays and Nomura agreed to keep running Lehman’s IT operations under a costly month-to-month contract. Johnson was under immediate pressure to find a lower-cost alternative.

Even after the sale of the major brokerage businesses, which employed most of Lehman’s 10,000 workers, the operating company had a lot of responsibilities. Many assets required ongoing services. Lehman was obligated to continue collecting commercial mortgage principal and interest payments. It managed many syndicated loans, making it responsible to the rest of the syndicate for reporting monthly results and valuations. There was still a huge commodities and derivatives portfolio that had to be valued regularly. There were financial hedges that preserved the value of some of the financial instruments and had to be tracked daily.

Alvarez and Marsal had named new managers to run most of Lehman’s departments. Johnson sat down with each to determine what their IT needs would be. As Donaldson recalls, “There was a fair amount of uncertainty as they discovered their contractual obligations.”

Like most Wall Street firms, Lehman had run a huge IT operation. It deployed 2,700 software applications running on 27,000 servers that supported 100,000 different devices ranging from traders’ terminals to smartphones.

And now that vast operation needed to be supported while simultaneously being dismantled.

Implementing the Cloud Solution

In January 2009, Johnson and his team began the vendor-selection process. They sent RFPs to what he describes as a predictable list of hardware vendors and outsourcing specialists. (Johnson declined to name the eventual bidders.) Donaldson says, “We certainly looked at keeping some data centers and how we could leverage them. We looked at outsourcing and various buy-and-build scenarios. We hit on a hybrid of in-house capability and outside.”

Lehman decided that the cost of continuing to contract with Barclays and Nomura for IT was untenable, says Donaldson.

Johnson’s team’s highest priority was to figure out a strategy that would maximize flexibility. There was no telling what sort of infrastructure would be needed in five months, let alone five years. Indeed, it wasn’t even clear how long the operating company would continue to exist. Creditors were clamoring for their money. Claims had to be paid rapidly, but maximizing payouts was likely to take years.

As they looked deeper into the company’s needs and its constraints, Johnson and his team concluded that the best solution would be to provide most of the needed IT infrastructure through a little-known hosting company called BlueLock, which had a highly virtualized data center that maximized the kind of efficiency and flexibility Johnson required. BlueLock had been founded only in 2006, and with the deal Lehman immediately became one of its biggest clients.

“The economic model that cloud offered was perfectly aligned with what we wanted to follow,” Johnson says. Lehman needed a flexible solution because “there was no certainty as to what would happen next.” He says he knew many of his forecasts of needs would be wrong, and they were. Ultimately, he had to migrate 30 million files to the BlueLock cloud, six times his initial estimate.

“All our estimates were wrong, one way or another,” says Johnson. “What we were right about was understanding that they would have to be revised on the fly.”

One reason so many files had to be moved was the potential for litigation—a fact of life in bankruptcies, especially one as large as Lehman’s.

“You have to preserve every piece of data for discovery reasons,” says Ravi Kalakota, a managing director at Alvarez and Marsal.

Once the team concluded that a cloud solution would be the best way to achieve flexibility, it looked at cost. Cloud solutions are designed to be cheaper than in-house data management. The cloud host, with many different clients, can balance loads to maximize the capacity of each server and disk drive.

Pat O’Day, BlueLock’s CTO, says that normally he can’t buy equipment or build a data center more cheaply than a giant enterprise can. But those enterprises, he asserts, even when they virtualize their servers, still operate at less than 50 percent of capacity. O’Day declines to disclose BlueLock’s capacity ratings, but he says that with multiple clients and workloads, it has a much better opportunity to maximize use of its hardware and networking equipment.
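O’Day’s point about multiple clients can be illustrated with a small sketch. When each client provisions for its own peak, total capacity is the sum of those peaks; a shared host needs capacity only for the peak of the combined load. The three workloads below are hypothetical, chosen so that their peaks fall at different times, and do not represent BlueLock’s actual clients or ratings.

```python
def dedicated_capacity(workloads):
    """Capacity needed if every client provisions for its own peak."""
    return sum(max(load) for load in workloads)


def pooled_capacity(workloads):
    """Capacity needed if clients share hardware: peak of the combined load."""
    return max(sum(period) for period in zip(*workloads))


def utilization(workloads, capacity):
    """Average share of the given capacity that is actually in use."""
    total_load = sum(sum(load) for load in workloads)
    periods = len(workloads[0])
    return total_load / (capacity * periods)


# Hypothetical load per period for three clients whose peaks do not coincide.
workloads = [
    [8, 2, 2, 2],
    [2, 8, 2, 2],
    [2, 2, 8, 2],
]

separate = dedicated_capacity(workloads)
shared = pooled_capacity(workloads)

print(f"Dedicated: capacity {separate}, utilization {utilization(workloads, separate):.1%}")
print(f"Pooled:    capacity {shared}, utilization {utilization(workloads, shared):.1%}")
```

The gain depends entirely on the peaks not lining up. If every client peaked in the same period, pooled and dedicated capacity would be identical.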

Lehman declines to specify its cost savings compared to buying its own infrastructure. But Donaldson says that looking at IT spending as a percentage of total assets, its costs using BlueLock are lower by between 25 and 50 basis points, or up to half a percentage point. He says that puts it in the lowest quartile of financial-industry IT shops.
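For readers less used to basis points, the conversion is simple arithmetic: 100 basis points make one percentage point. The sketch below converts Donaldson’s range and applies it to a purely hypothetical $1 billion asset base. The article does not say which asset figure or time period the ratio is measured against, so the dollar amounts are illustrative only.

```python
BASIS_POINTS_PER_PERCENT = 100  # 1 percentage point = 100 basis points


def basis_points_to_percent(bps):
    """Convert basis points to percentage points."""
    return bps / BASIS_POINTS_PER_PERCENT


def saving_on_assets(total_assets, bps):
    """Amount represented by `bps` basis points of `total_assets`."""
    return total_assets * bps / 10_000


# Hypothetical asset base, not a figure from the article.
hypothetical_assets = 1_000_000_000

for bps in (25, 50):
    percent = basis_points_to_percent(bps)
    amount = saving_on_assets(hypothetical_assets, bps)
    print(f"{bps} bp = {percent}% -> ${amount:,.0f} on ${hypothetical_assets:,}")
```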

Speed was another key consideration. Building a new state-of-the-art data center would have delayed the migration by at least six months, Donaldson says, which was, of course, out of the question.

Lehman’s special situation also played into the decision to go into the cloud. “Knowing this was a bankruptcy, we made some decisions based on the understanding that this was a relatively short-term event,” says Johnson. Ideally, Lehman would have liked to have hardware that would depreciate to coincide with the day that the unwinding process was complete. But when that day would come was uncertain. Enron’s bankruptcy process was taking over 10 years, and it was clear that Lehman—which was dealing with over 70 administrators and thousands of counterparties—could take a similar amount of time to complete its process.

Alvarez and Marsal was also anxious to avoid capital spending that would create another asset that would ultimately have to be liquidated. “Capital preservation was critical,” Kalakota says.

At the same time, Johnson had to be satisfied that the cloud infrastructure would fill the needs of a very demanding set of users. “Lehman’s IT set a very high bar,” says Kalakota. Johnson’s commitment to the users was that even under these special circumstances, even with all the concomitant unpredictability and uncertainty, he would meet or exceed the performance to which they were accustomed.

In some cases, that meant that BlueLock installed servers close to users to avoid any possibility of lag. “Not all applications could tolerate the latency it would take” to have central hosting, says BlueLock’s O’Day. He says BlueLock installed what it calls “microclouds” in New York and Hong Kong to be sure that power users and traders didn’t experience network latency or slowdowns when working with large files. Print and file servers are also located in microclouds on site, he says. “To our clients, it seems like one large infrastructure. They don’t have to worry about what service they’re on,” he adds.

Johnson says that while cloud computing is still evolving, “We had faith that it would continue to mature. We knew that the version we were looking at today would be better tomorrow.”

The Long Goodbye

By July 2009, Johnson had a plan for moving Lehman’s infrastructure into BlueLock’s cloud. User migration began in August. Lehman moved a group of users every other week. IT engineers set up parallel environments on different floors of the building, with the users’ applications running on BlueLock. “We were testing to make sure we achieved the service levels we needed,” Johnson says. In all departments, users were brought in to try out the new system. In general, says Johnson, the applications ran exactly as they were running in the old Lehman data centers.

Johnson says that IT engineers would come into a department Friday at 5 p.m. and start the migration of the whole department. By Sunday morning, it would be complete. In some critical departments, he says, “We’d have some people running in the old environment and some running in the new environment” for a few days. But it quickly became clear to him that it was better to minimize such backstop measures and get everyone running on BlueLock to ensure consistency and simplify support issues.

By December, the migration was complete.

Today, Alvarez and Marsal is still unwinding Lehman’s assets. Lehman’s remaining real estate holdings are being managed by a spin-off company called Lamco, and its executives are considering trying to raise new capital to keep Lamco going as a separate financial-management firm even after the dismantling of Lehman is completed. As a result, says Johnson, the IT function may continue even after Lehman has vanished completely, making flexibility even more of a core need as Lamco’s IT requirements are, and for an indeterminate time will remain, uncertain.

Kalakota says, “There’s a great lesson here for any CIO who may have to deal with separating businesses. Cloud is the only way to do it effectively.”

Whether an enterprise is growing or shrinking, IT management will always require dealing with uncertainty, Johnson says. And, as Johnson has discovered, cloud computing can “give us maximum flexibility in being able to guess wrong and still meet all our objectives.”

Bill Bulkeley, formerly of the Wall Street Journal, is a freelance writer based in Massachusetts.