by Ken Terry

Big data makes a difference at Penn Medicine

Oct 08, 2015
Analytics, Big Data, Electronic Health Records

Here’s how one healthcare organization is making use of the massive amount of information – measurable in petabytes – it now has at its disposal to save lives.

The team of clinicians and medical informatics experts led by Mike Draugelis, chief data scientist at Penn Medicine in Philadelphia, is busy these days. Using insights from a massively parallel computer cluster that stores a huge volume of data, the team is building prototypes of new care pathways, testing them out with patients and feeding the results back into algorithms so that the computer can learn from its mistakes.

This big data approach to improving the quality of care has already produced one significant success: The Penn team has improved the ability of clinicians to predict which patients are at risk of developing sepsis, a highly dangerous condition, and it can now identify these patients 24 hours earlier than it could before the algorithm was introduced. 

Draugelis and his colleagues work in the hospital of the University of Pennsylvania. On the academic research side, the university’s medical school has launched an Institute of Biomedical Informatics (IBI) to do basic research using big data techniques. Announced in 2013, IBI is now coalescing a few months after naming Jason Moore, Ph.D., who founded a similar institute at Dartmouth, as its director. IBI will focus its efforts on precision medicine, a hot field that is starting to take off as genomic sequencing costs drop. 

The effort to link genomic differences with “phenotypes” – the variations in patients’ characteristics and diseases – has been underway for five years, says C. William Hanson III, M.D., chief medical information officer and vice president of Penn Medicine and a member of IBI. But he sees this kind of research quickly accelerating. 

Steven Steinhubl, M.D., director of digital medicine at the Scripps Translational Science Institute in La Jolla, Calif., agrees. “We’re still on the rising part of the curve of what we’re going to learn from big data,” he says. “It’s rapidly growing, but it will accelerate even more as large medical centers like UPenn take advantage of the data they’re already collecting and add genomics on top of that.” 

Changing clinical pathways

Draugelis’ team at Penn Medicine is using algorithms to tweak the guidelines that doctors and nurses follow in diagnosing and treating particular conditions. When a protocol changes, he explains, the clinical team must develop a new care pathway that specifies each step in the workflow of clinicians. It is very intensive work, and so is coding the changes that must be made in the algorithm to adjust to the feedback from the frontline of patient care.

“We’re working in two-week sprints, where the clinicians adjust their pathways, and we adjust the algorithms to their needs,” Draugelis notes.

The team builds a prototype of a new pathway for a particular condition about once every six months. Currently, it is focusing on finding a better way to predict which patients have congestive heart failure and which are likely to be readmitted after discharge from the hospital. In addition, the team is working on acute conditions such as maternal deterioration after delivery and severe sepsis. 

“We’re creating machine learning predictive models based on thousands of variables,” Draugelis says. “We look at them in real time, but we train them up over millions of patient records.” 

In the case of sepsis, the team started with an expert model known as SIRS (systemic inflammatory response syndrome), which uses specific thresholds of temperature, heart rate, respiratory rate, and white blood cell count as key indicators of sepsis risk. After loading in all of the available data on a patient, including electronic health record (EHR) data, the computer uses the algorithm to determine how closely a patient’s characteristics match those of patients who developed sepsis in the past. When a patient matches that profile, the clinician caring for the patient receives an alert, acts on it or doesn’t, and feeds his or her reaction back to the algorithm to improve it.
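
To make the starting point concrete, here is a minimal sketch of a SIRS-style threshold check. It is not Penn Medicine's algorithm. The cutoffs are the commonly cited textbook SIRS values, simplified to the four indicators named above (the full criteria also allow a low PaCO2 or a high share of immature white cells), and the `Vitals` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vitals:
    """One set of readings for a patient (hypothetical structure)."""
    temp_c: float        # body temperature, degrees Celsius
    heart_rate: int      # beats per minute
    resp_rate: int       # breaths per minute
    wbc_per_mm3: float   # white blood cell count, cells per cubic millimeter


def sirs_criteria_met(v: Vitals) -> int:
    """Count how many of the four simplified SIRS criteria are met.

    Thresholds are the commonly cited textbook cutoffs, assumed here
    for illustration only.
    """
    return sum([
        v.temp_c > 38.0 or v.temp_c < 36.0,
        v.heart_rate > 90,
        v.resp_rate > 20,
        v.wbc_per_mm3 > 12_000 or v.wbc_per_mm3 < 4_000,
    ])


def sirs_positive(v: Vitals) -> bool:
    """SIRS is conventionally flagged when two or more criteria are met."""
    return sirs_criteria_met(v) >= 2


# Fever plus a fast heart rate meets two criteria and triggers the flag.
patient = Vitals(temp_c=38.6, heart_rate=104, resp_rate=18, wbc_per_mm3=9_000)
print(sirs_criteria_met(patient), sirs_positive(patient))
```

A fixed rule like this looks at only a handful of readings, which is why the team treats it as a starting point and layers models trained on far more variables on top of it.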

Penn Medicine’s bedside monitors continuously track vital signs and document them in the EHR. This automated documentation of vital signs didn’t occur five years ago, Hanson notes. It is still not widespread outside of intensive care units, says Steinhubl, but when it does become routine, he adds, it will provide a major boost to the kind of work that Draugelis’ team does. 

Dean Sittig, Ph.D., a professor at the University of Texas Health School of Biomedical Informatics in Houston, likes the idea of continuous monitoring and feeding data into computer algorithms. The average floor nurse can watch each patient only 20 percent of the time if she has five patients. By contrast, “the computer can be looking at every minute, and the idea of continuous monitoring and surveillance is very powerful,” he says. “If you can teach the computer what the nurse would be looking for, the computer can be much more vigilant [than the nurse].”

To make the decision support alerts useful, however, the staff has to be ready to spring into action, especially with a condition like sepsis, Sittig says. In addition, the alerts that the algorithm triggers must be fairly accurate. “As a rule of thumb, if the computer is right more than half the time – especially with something serious like sepsis – clinicians will pay attention to it. But if it’s only right 10 percent of the time, it starts to be a bother.”
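
Sittig's rule of thumb corresponds to what statisticians call positive predictive value: the share of fired alerts that turn out to be correct. The short sketch below shows the arithmetic. The alert counts are made up for illustration, and the function names and the 0.5 cutoff (taken from his "more than half the time" remark) are assumptions, not a published standard.

```python
def positive_predictive_value(true_alerts: int, false_alerts: int) -> float:
    """Fraction of fired alerts that were correct."""
    total = true_alerts + false_alerts
    if total == 0:
        raise ValueError("no alerts fired, so the value is undefined")
    return true_alerts / total


def clinicians_likely_to_heed(ppv: float, threshold: float = 0.5) -> bool:
    """Apply the 'right more than half the time' rule of thumb."""
    return ppv > threshold


# Hypothetical counts: (correct alerts, false alarms) for two alert designs.
for true_alerts, false_alerts in [(60, 40), (10, 90)]:
    ppv = positive_predictive_value(true_alerts, false_alerts)
    print(f"{ppv:.0%} correct, heeded: {clinicians_likely_to_heed(ppv)}")
```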

Precision medicine

Two important developments have come together to make possible the kind of precision medicine research that Penn Medicine’s IBI is doing. First, EHRs have become widespread in the past few years: most hospitals and more than 80 percent of physicians now have these systems. Second, the cost of genomic sequencing has dropped to around $1,000 for a complete genome. The cost of partial genome or exome sequencing is less than that. As a result of these trends, the idea of correlating genotypic and phenotypic variants to discover individual responses to diseases and drugs is now feasible.

To perform this kind of research, Penn Medicine has created a specialized “bio-bank” that, so far, has stored about 20,000 genomic samples with patients’ permission, says Brian Wells, associate vice president of health technology and academic computing for the healthcare system. A separate center for personalized diagnostics has sequenced tumor genomes for more than 5,000 patients, he notes.

The sheer volume of genomic data is staggering. For example, Penn Medicine has two petabytes of disk space in its high-performance computing cluster, and it plans to expand that, says Wells.

“One researcher told us that in the next few years, he might go from five to 30 petabytes of space related to neuroscience sequencing. So we’re prepared to add to that as we need to,” he notes. 

Challenges for CMIOs and CIOs

The biggest challenges that Hanson faces as Penn Medicine grapples with its big data projects, he says, are the lack of interoperability among EHRs and the need for good, clean, structured data. Currently, Penn has different EHRs in its hospital, ER, ICU and ambulatory practices, but it is moving to a single system. Structured clinical data is harder to deliver, however, because “clinicians tend to document in an unstructured way,” he says.

Penn intends to use natural language processing (NLP) to mine unstructured data in EHRs and convert it into structured information, Wells notes. “That’s for retrospective analysis rather than clinical decision support, because you can’t rely on it 100 percent of the time,” he adds.

Current big data methods are adequate for processing the huge flood of genomic data, but bio-informaticians who know how to work with this data are in short supply, Steinhubl says. He predicts that a bottleneck will develop in data processing and storage when healthcare providers begin to review the physiologic data that is expected to flow in from mobile devices and wearable sensors.

Nevertheless, Steinhubl is very excited about the promise of big data in fields like precision medicine and clinical quality improvement. “Eventually, it’s going to completely change medicine and the way we treat common chronic conditions,” he says.

For example, he notes, most cases of hypertension are defined as a single disease. “So we put them all in one basket and treat them the same way. With these tools, we’ll be able to refine their phenotype and their genotype and better treat these individuals. Right now, it’s mostly trial and error.” 

Hanson leavens the great expectations of big data with a few sober reflections. First, he notes, it will be some time before most providers are ready to pull in remote monitoring data, because it has to be prescreened to be usable in patient care. Second, while precision medicine is a great idea, most people haven’t yet been sequenced, and “we don’t have a consistent way of interpreting their genotypic data and making it actionable.”

While oncologists are increasingly using information about the genetic differences among individual cancer patients, it will be a while, Hanson says, before this approach filters down to primary care physicians. However, precision medicine research is moving fast at Penn Medicine and other leading academic medical centers. “We’re on the verge of an explosive development,” he says.