Halamka on Beth Israel's Health-Care IT Disaster
Spanning tree protocol is like a traffic cop. Data arrives at a switch and asks spanning tree for directions. Say, from John’s server to Mary’s desktop. Spanning tree calculates the shortest route. It then blocks off every other possible route so that the data will go straight to its destination without having to make decisions at other crossroads along the way.
But spanning tree will look only as far out as seven intersections. Should data reach an eighth intersection, called a hop in networking, it will lose its way. Often, it will drive itself into a loop. This clogs the network in two ways. First, the looped traffic itself gums up the works. Then, other switches start to use their computing horsepower to recalculate their spanning trees (to make up for the switch that is directing traffic in a loop) instead of directing their own traffic.
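For readers who think in code, the traffic-cop analogy can be sketched roughly as follows. This is a simplified model, not the actual protocol: it walks outward from one root switch, keeps the first link that reaches each switch, treats every other link as blocked, and flags any switch that sits more than seven hops from the root. The switch names and the topology are invented for illustration, and `build_spanning_tree` and `switches_beyond_limit` are hypothetical helper names.

```python
from collections import deque

MAX_HOPS = 7  # the seven-hop limit described in the article


def build_spanning_tree(links, root):
    """Breadth-first walk outward from the root switch.

    Returns (hops, forwarding, blocked): the hop count to each switch,
    the links kept in the tree, and the redundant links left blocked.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    hops = {root: 0}
    forwarding = set()
    queue = deque([root])
    while queue:
        switch = queue.popleft()
        for nxt in neighbors.get(switch, []):
            if nxt not in hops:  # first (shortest) way to reach this switch
                hops[nxt] = hops[switch] + 1
                forwarding.add(frozenset((switch, nxt)))
                queue.append(nxt)

    blocked = {frozenset(link) for link in links} - forwarding
    return hops, forwarding, blocked


def switches_beyond_limit(hops, limit=MAX_HOPS):
    """Switches that sit more hops from the root than the limit allows."""
    return sorted(s for s, h in hops.items() if h > limit)


# Hypothetical topology: a chain of ten switches plus one redundant link.
links = [(f"s{i}", f"s{i + 1}") for i in range(9)] + [("s0", "s2")]
hops, forwarding, blocked = build_spanning_tree(links, root="s0")

print("blocked links:", [tuple(sorted(link)) for link in blocked])
print("beyond the hop limit:", switches_beyond_limit(hops))
```

In this made-up network the redundant link between `s1` and `s2` ends up blocked, and the last switch in the chain lands eight hops from the root, one past the limit.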
That’s what happened at Beth Israel Deaconess. On Wednesday, a researcher uploaded data into a medical file-sharing application, and it looped. The data was several gigabytes, so it clogged the pipes. Then, when Halamka’s team turned off a switch at 1:45 p.m., it was as if one cop closed an intersection and every other cop stopped traffic in all directions to figure out alternate routes.
Halamka’s team now knew what happened, if not where it happened. Standard troubleshooting protocol for spanning tree loops calls for cutting off redundant links on the network. "What you’re doing is eliminating potential spots where there are too many hops, and creating one path from every source to every destination," Callisma’s Rusch says. "It might make for a slower environment" (without backup) "but it should make for a stable environment."
"We cut the links," Halamka says. "It seemed to work. We went home feeling great. We had figured it out."
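The idea of cutting redundant links until only one path remains between any two points can be illustrated with a short sketch. It is not the procedure Halamka's team followed, only one common way to spot links that close a loop: a union-find pass that keeps a link when it joins two previously separate groups of switches and marks it redundant otherwise. The switch names and the function `find_redundant_links` are hypothetical.

```python
def find_redundant_links(links):
    """Split links into those that form a loop-free tree and those that
    would close a loop (candidates for cutting)."""
    parent = {}

    def find(x):
        # Follow parent pointers to the group's representative,
        # shortening the chain along the way (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept, redundant = [], []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append((a, b))  # both ends already connected: a loop
        else:
            parent[root_a] = root_b   # merge the two groups
            kept.append((a, b))
    return kept, redundant


# Hypothetical topology with two backup links.
links = [
    ("core1", "core2"),
    ("core1", "dist1"),
    ("core2", "dist1"),
    ("dist1", "access1"),
    ("core2", "access1"),
]
kept, redundant = find_redundant_links(links)
print("keep:", kept)
print("cut:", redundant)
```

Which links count as redundant depends on the order in which they are examined; any result leaves exactly one path between every pair of switches, at the cost of the backup routes.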
Thursday: Clogged Arteries
Hospitals come alive early. By 7 a.m., doctors and nurses started to send some of Beth Israel Deaconess’s 100,000 daily e-mails. The pharmacy began filling prescriptions, transferring the first bits of the 40 terabytes that traverse the network daily. Some of the 3,000 daily lab reports were beginning to move.
By 8 a.m., the network again started acting as if it were flying into a headwind. Halamka realized the network had settled down the night before only because hardly anyone was using it. When the workday began in earnest, CPU usage spiked. The network started flapping. The problem hadn’t been fixed.
Halamka’s team scrambled to find other possible sources of the trouble. One suspect was CareGroup’s network of outlying hospitals in Cambridge, Needham, Ayer and elsewhere in Massachusetts. They operated as a distinct network that plugged into Beth Israel Deaconess. The community hospitals’ network was sluggish, and a billing application wasn’t working, according to Jeanette Clough, CEO of Mount Auburn Hospital in Cambridge, which serves as the hub for the outlying hospitals’ network.





