by Rebecca Merrett

Dynamic model predicts a patient’s mortality from electronic medical records

Aug 14, 2015
Healthcare Industry

Electronic medical records are not only useful as a central repository for storing and updating information but also provide rich sources of data that can be analysed and mined to improve patient care.

Karla Caballero and Ram Akella, from the University of California's Santa Cruz and Berkeley campuses, have used patient data to create a dynamic model that predicts an individual’s mortality using both numerical and textual data from electronic medical records and vital signs readings.

Their study – Dynamically Modeling Patient’s Health State from Electronic Medical Records: A Time Series Approach – was released as part of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining in Sydney.

“The timely and accurate estimation of patient’s probability of mortality allows us to successfully trigger a medical alarm,” the authors said.

“This estimation also permits the early identification of patients with elevated clinical risk. As a result, health care providers can differentiate those patients from the ones who are stable and improving in order to assign medical resources more effectively.”

The study points out the shortfalls it aims to address in some current methods for predicting patients’ health risk in hospitals. The Apache III and SAPS II systems, for example, focus only on the worst-case values recorded during the first 24 hours after a patient is admitted to intensive care.

That could overestimate a patient’s likelihood of mortality, the authors said.

Other data mining approaches to predicting mortality also have their shortfalls, the authors said. For example, fitting a logistic regression model to the most recent values observed for a patient in the first 48 hours of intensive care is not timely enough to help prevent mortality.

Alternatively, collapsing blood pressure, heart rate and other time series data into static features for classification does not necessarily reveal whether a change is good or bad for a particular patient. An increase in blood pressure, for instance, can be a good sign if the patient suffers from low blood pressure.

Other models for predicting mortality do not draw on much textual data because unstructured data is complex and difficult to deal with. Lab reports, doctors’ and nurses’ notes, admission and discharge information and procedure reports offer context that numerical values cannot give.

“Text data contains key information that is potentially useful to better predict the presence of an increase in the probability of mortality.”

The Bayesian time series model represents the probability of a patient’s mortality as an aggregated latent state, and it updates that probability every time it is given new features such as lab results and vital signs.

“We incorporate the user features into an aggregated patient state that evolves over time, in contrast to static classification models. This approach allows us to predict future values of the state as more readings become available.

“By using a dynamic model, we predict the probability of mortality before the 24-hour window is complete. As a result, medical alarms can be triggered earlier as opposed to static methods,” the authors said.
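To illustrate the general idea of a latent state that is revised as each reading arrives, here is a minimal Python sketch. It is not the authors' model: it swaps in a simple scalar Kalman filter with a random-walk state and maps that state to a probability with a logistic function. The function names, noise variances and input values are all assumptions made for illustration.

```python
import math

def kalman_update(mean, var, obs, obs_var, process_var):
    """One predict-then-update step of a scalar random-walk Kalman filter."""
    # Predict: the state is assumed to drift, so uncertainty grows.
    prior_var = var + process_var
    # Update: blend the prior with the new observation.
    gain = prior_var / (prior_var + obs_var)
    post_mean = mean + gain * (obs - mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var

def risk_probability(state):
    """Map the latent state to a value between 0 and 1 (logistic link, an assumption)."""
    return 1.0 / (1.0 + math.exp(-state))

def track_risk(observations, obs_var=1.0, process_var=0.1,
               init_mean=0.0, init_var=1.0):
    """Return the running risk estimate after each time step.

    A value of None stands for a time step with no reading; the estimate
    is carried forward and only its uncertainty grows.
    """
    mean, var = init_mean, init_var
    probs = []
    for obs in observations:
        if obs is None:
            var += process_var
        else:
            mean, var = kalman_update(mean, var, obs, obs_var, process_var)
        probs.append(risk_probability(mean))
    return probs

if __name__ == "__main__":
    # Hypothetical severity readings that worsen over time, with one gap.
    for step, p in enumerate(track_risk([0.5, 1.0, None, 2.5])):
        print(step, round(p, 3))
```

Because an estimate is available after every reading, an alarm threshold could in principle be checked at each step rather than at the end of a fixed window, which is the contrast with static methods that the authors draw.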

For the textual data, noun phrases were extracted using the Clinical Text Analysis and Knowledge Extraction System (cTAKES) and MetaMap tools. Each text entry from a doctor, GP or nurse also carried a timestamp, which was used to build a time series for each text feature in the model.

The authors then selected the noun phrases that described a disease, procedure or medication using the SNOMED clinical terms or medical ontologies.

“By means of tf-idf [term frequency – inverse document frequency] term selection, we select the most important noun phrases and remove those with low score.

“Once the phrase selection is completed, we perform standard stop words removal and stemming before indexing the daily text entries using the extracted noun phrases and the single terms.”
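The tf-idf selection step described above can be sketched in a few lines of Python. This is an illustrative sketch only: the article does not say how scores were aggregated across documents or what cut-off was used, so taking each phrase's maximum score and the `threshold` parameter are assumptions, as are the sample phrases.

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """Return {phrase: highest tf-idf score across all documents}.

    docs is a list of documents, each a list of already-extracted phrases.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each phrase occur?
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    best = {}
    for doc in docs:
        counts = Counter(doc)
        for phrase, count in counts.items():
            tf = count / len(doc)                       # term frequency
            idf = math.log(n_docs / doc_freq[phrase])   # inverse document frequency
            score = tf * idf
            best[phrase] = max(best.get(phrase, 0.0), score)
    return best

def select_phrases(docs, threshold):
    """Keep phrases whose best tf-idf score reaches the (assumed) threshold."""
    return {p for p, s in tfidf_scores(docs).items() if s >= threshold}

if __name__ == "__main__":
    # Hypothetical noun phrases from three clinical notes.
    notes = [
        ["chest pain", "sepsis", "chest pain"],
        ["chest pain", "intubation"],
        ["chest pain", "sepsis"],
    ]
    print(sorted(select_phrases(notes, threshold=0.1)))
```

A phrase that appears in every note, such as "chest pain" in the sample data, gets an inverse document frequency of zero and is dropped, while rarer phrases are kept.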

The authors were able to create term and topic type features to incorporate into the model.

One of the biggest challenges when working with e-medical records and health data is the large volume of missing values, as the data isn’t collected all the time. Standard methods that fill in missing values with the mean could add noise to the model and do not take into account that some features are highly dependent on previous values.

The authors used a regularised Expectation Maximisation algorithm to help accurately impute missing values.

“This method uses a regularisation parameter to ensure the existence of positive definite matrices needed to impute the missing data accurately,” the authors said.
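The role of the regularisation parameter can be shown with a deliberately small Python sketch. It is not the authors' algorithm: it assumes just two features, one always observed (`xs`) and one with gaps (`ys`), and alternates between re-estimating a Gaussian fit and re-imputing the gaps. The ridge term `lam` is a scalar stand-in for the matrix regularisation described in the quote, and all names and values are hypothetical.

```python
def regularised_em_impute(xs, ys, lam=0.1, n_iter=50):
    """Fill None entries in ys using xs, with a ridge-regularised EM-style loop."""
    observed = [y for y in ys if y is not None]
    if not observed:
        raise ValueError("ys needs at least one observed value")

    n = len(xs)
    mu_x = sum(xs) / n
    var_x = sum((x - mu_x) ** 2 for x in xs) / n

    # Start by filling the gaps with the mean of the observed values.
    start = sum(observed) / len(observed)
    filled = [start if y is None else y for y in ys]

    for _ in range(n_iter):
        # M-step: re-estimate the mean of y and its covariance with x
        # from the currently completed data.
        mu_y = sum(filled) / n
        cov_xy = sum((x - mu_x) * (y - mu_y)
                     for x, y in zip(xs, filled)) / n
        # The ridge term keeps the denominator strictly positive even when
        # x barely varies; in the matrix case it keeps the covariance
        # block positive definite so that it can be inverted.
        slope = cov_xy / (var_x + lam)
        # E-step: replace each gap with its conditional expectation given x.
        filled = [mu_y + slope * (x - mu_x) if y is None else y
                  for x, y in zip(xs, ys)]
    return filled

if __name__ == "__main__":
    # Hypothetical readings: the third y value is missing.
    print(regularised_em_impute([1, 2, 3, 4], [2, 4, None, 8], lam=0.1))
```

With `lam=0` and a constant `xs` the division would fail, which is the scalar analogue of a covariance matrix that cannot be inverted. A larger `lam` pulls the imputed values towards the plain mean, so the parameter trades stability against fidelity to the fitted relationship.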

The authors tested the model on 11,648 medical records of patients admitted to intensive care from the MIMIC II clinical database.

Looking at a 24-hour window, the authors’ model – which uses numerical, term based and topic based features – outperformed the other models it was compared against on the prediction task. It achieved a specificity of 0.7905. Specificity measures the proportion of patients who did not die that the model correctly left unflagged, so a higher value means fewer false alarms.

Apache III received a specificity of only 0.1090, while SAPS II received 0.1393. Random Forest, which is hailed in the industry as an effective model for many different prediction tasks, received a specificity of 0.1084.

On the F score (the harmonic mean of sensitivity and positive predictive value), the authors’ model received 0.5229, compared to Apache III at 0.4662, SAPS II at 0.3863 and Random Forest at 0.3929.
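For readers unfamiliar with these measures, the short Python sketch below shows how specificity and the F score are computed from the four cells of a confusion matrix, using the standard definitions. The counts in the example are made up for illustration and are not taken from the study.

```python
def specificity(tn, fp):
    """Share of actual negatives (survivors) that were correctly not flagged."""
    return tn / (tn + fp)

def sensitivity(tp, fn):
    """Share of actual positives (deaths) that were correctly flagged."""
    return tp / (tp + fn)

def positive_predictive_value(tp, fp):
    """Share of raised alarms that turned out to be true alarms."""
    return tp / (tp + fp)

def f_score(tp, fp, fn):
    """Harmonic mean of sensitivity and positive predictive value (F1)."""
    sens = sensitivity(tp, fn)
    ppv = positive_predictive_value(tp, fp)
    return 2 * sens * ppv / (sens + ppv)

if __name__ == "__main__":
    # Hypothetical counts: true/false positives and negatives.
    tp, fp, fn, tn = 40, 20, 10, 130
    print("specificity:", round(specificity(tn, fp), 4))
    print("F score:", round(f_score(tp, fp, fn), 4))
```

A model that raises an alarm for almost every patient scores close to zero on specificity however many true alarms it catches, which is why the two measures are reported side by side.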

“A low specificity value implies the existence of a large number of false alarms,” the authors explained. “Physicians do not want to be overloaded with false alarms at the time a true alarm arrives.

“The proposed approach has higher F-score than other reported methods of the literature. This measure, which shows the ratio between the sensitivity and the positive predictive value, is very important in the correct detection of true alarms.

“Detecting all true alarms correctly is desirable since the cost of not detecting a patient who is very ill and dies is very high.”