Halamka on Beth Israel's Health-Care IT Disaster
CAP was declared shortly after 4 p.m. By 6 p.m., a local CAP team from nearby Chelmsford, Mass., had set up a command center at the hospital and initiated "follow the sun" support?meaning additional staff at Cisco’s technical assistance centers would be plugged in to the crisis until their workday ended, when they’d hand off support to a similar group a few time zones behind them.
First, the CAP team wanted an instant network audit to locate CareGroup’s spanning tree loop. The team needed to examine 25,000 ports on the network. Normally, this is done by querying the ports. But the network was so listless, queries wouldn’t go through.
As a workaround, they decided to dial in to the core switches by modem. All hands went searching for modems, and they found some old US Robotics 28.8Kbps models buried in a closet. Like musty yearbooks pulled from an attic, they blew the dust off them. They ran them to the core switches around Boston’s Longwood medical area and plugged them in. CAP was in business.
By 9 p.m., they had pinpointed the problematic spanning tree loop. The Picture Archive Communication System (PACS) network, for sharing high-bandwidth visual files and other clinical data, was 10 hops away from the closest core network switch, three too many for spanning tree to handle.
And that’s when the dimensions of the problem fully dawned on the team members: They were struggling with an outmoded network. In September 2002, Halamka had hired Callisma’s Rusch to audit CareGroup’s infrastructure. When Rusch finished, he told Halamka, "You have a state-of-the-art network?for 1996."
Halamka’s network was all Layer 2 switches with no Layer 3 routing. Switching is fast, inexpensive and relatively dumb, and it relies on spanning tree protocol. Routing is more expensive but smarter. Routers have quality-of-service throttles to control bandwidth and to isolate heavy traffic before it overwhelms the network. State-of-the-art networks in 2002 have routing at their core.
In 1996, CareGroup’s network was Beth Israel Hospital, and at its core was a switch called Libby030. In October of that year, the hospital merged with Deaconess Hospital. Deaconess’s network was plugged into Libby030.
Other systems were tacked on in the same way. In 1998, CareGroup connected PACS to what used to be Deaconess Hospital. A year later, CareGroup linked a new data center and its two core switches (RCA and RCB) to Libby030. There would be a fourth core switch added and a skein of redundant links, but Libby030 remained the main outlet. Halamka now understands that this was a "network of extension cords to extension cords. It was very fragile," he says.





