Halamka on Beth Israel's Health-Care IT Disaster
Saturday night, with the redundant core in place, Halamka turned on the network. It hummed. There was clapping and cheering and backslapping among the team, which had grown to 100. Halamka passed around bottles of Domaine Chandon champagne that his wife had bought at Costco. Then he went home.
At 1 a.m., his pager woke him.
Another CPU spike.
Sunday: And on the Fifth Day, Halamka Rested
The problem was simple: a bad network card in RCB, one of the core switches. They replaced the card. Halamka went back to sleep.
Beep. 6 a.m. This time, it was a memory leak in one of the core switches. The CAP team quickly determined the cause: buggy firmware, an arcane VLAN configuration issue. They fixed it.
All day, the team documented changes. Halamka refused to say the network was back, even though it was performing well. "Let us not trust anyone’s opinion on this," he recalls thinking. "Let us trust the network to tell us it’s fine by going 24 hours without a CPU spike."
Monday: Back in Business
Halamka arrived at his office at 4 a.m., nervous. He launched an application that let him watch the CPU load on the network. It reads like a seismograph. Steep, spiky lines are bad, and the closer together they are, the nastier the congestion. At one point on Thursday, the network had been so burdened that the lines had congealed into thick bars.
Around 7:30 a.m., as the hospital swung into gear, Halamka stared at the graph, half expecting to see the steep, spiky lines.
They never came. At noon, Halamka declared "business as usual." The crisis was over. It ended without fanfare, Halamka alone in his office. The same way it had started.





