GSK accelerates data analytics for clinical trials

Oct 27, 2017
CDO Mark Ramsey is helping the pharmaceutical giant turn decades of data into a drug-discovery asset, thanks to a homegrown big data analytics platform that processes petabytes.

GlaxoSmithKline is dreaming big with its big data. By tapping decades’ worth of clinical trial data, the pharmaceutical giant aims to deliver drugs to market more quickly. If it succeeds, it could seize an advantage in an industry oft-maligned for its plodding pace.

That’s the chief goal and challenge for GSK Chief Data Officer (CDO) Mark Ramsey, who admits that GSK was a laggard within a lagging industry in its approach to leveraging data. GSK hired Ramsey in 2015 to turn that trend around. “Pharmaceuticals, in contrast to financial services, telecommunications or retail, has not progressed in using data as a strategic asset,” says Ramsey, who joined GSK after a stint as Samsung Mobile CDO and several data analytics roles during his 18 years at IBM. “Our No. 1 goal is how to execute clinical trials more efficiently and effectively to accelerate drug discovery.”

Big Pharma is not alone in letting potentially rich data lie fallow in vast silos. Companies are looking for leaders to help unlock advantages and operational efficiencies from these troves, and Gartner expects 90 percent of large companies to have a CDO by the end of 2019. By 2020, 50 percent of leading organizations will have a CDO with strategic influence and authority similar to that of their CIO, according to Gartner analyst Doug Laney. CDOs can establish a leadership role by aligning their priorities with those of their organization. That’s what Ramsey is working toward.

Following is a look at Ramsey’s efforts to overhaul GSK’s data strategy.

Laying the data foundation

Pharmaceutical companies, many of which are decades or even centuries old, collect and store vast troves of data from their clinical trials. However, most simply sock the data away in various repositories, which accumulate more information with each clinical trial. GSK, which is more than 300 years old, maintains petabytes of such data in more than 2,100 silos, many of which can potentially be mined for pharmaceutical insights, says Ramsey.

When he arrived at GSK, Ramsey assessed the company’s data profile and quickly learned that data analytics wasn’t used holistically across the organization. Rather, it was relegated to one-off clinical trials intended to bring new medicine to market. He saw ample opportunity to share data across trials, but it wasn’t going to happen without a comprehensive data platform: The GSK Big Data Information Platform.

The foundation of this platform is a Cloudera Hadoop data lake, into which automated bot technology from StreamSets ingests data from thousands of operational systems. GSK then uses Trifacta software to clean up messy, complex data sets and render them into the views business users want to analyze. GSK also taps machine learning software from Tamr to map data to industry ontologies, and AtScale software to virtualize the data. Business users view the data through Zoomdata visualization software. Google’s TensorFlow, Tibco Spotfire and Anaconda are among the other tools in the platform. Ramsey says the various technologies are integrated so that they can share data, which will make clinical trials easier.
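The article names the products but not how they are wired together. As a rough, hypothetical sketch of the flow it describes (ingest from many systems, clean, map to a shared vocabulary, then hand curated records to analysts), the Python below chains plain functions. The names `ingest`, `cleanse`, `map_to_ontology`, `run_pipeline` and the `ONTOLOGY` table are invented for illustration; none of this uses the StreamSets, Trifacta or Tamr APIs.

```python
from typing import Callable

Record = dict[str, str]
Stage = Callable[[list[Record]], list[Record]]

# Hypothetical mapping from source-specific field names to one shared vocabulary.
ONTOLOGY = {
    "subj": "subject_id",
    "patient": "subject_id",
    "bloodtype": "blood_type",
    "blood_grp": "blood_type",
}


def ingest(sources: dict[str, list[Record]]) -> list[Record]:
    """Land raw records from every source system in one store, tagging provenance."""
    return [{**rec, "source_system": name} for name, recs in sources.items() for rec in recs]


def cleanse(records: list[Record]) -> list[Record]:
    """Normalize fields: trim whitespace from keys and values, lowercase keys."""
    return [{k.strip().lower(): v.strip() for k, v in rec.items()} for rec in records]


def map_to_ontology(records: list[Record]) -> list[Record]:
    """Rename known fields to their canonical names; keep unknown fields as-is."""
    return [{ONTOLOGY.get(k, k): v for k, v in rec.items()} for rec in records]


def run_pipeline(sources: dict[str, list[Record]], stages: list[Stage]) -> list[Record]:
    """Ingest once, then apply each curation stage in order."""
    data = ingest(sources)
    for stage in stages:
        data = stage(data)
    return data


if __name__ == "__main__":
    sources = {
        "trial_a": [{" Subj ": "A-001", "BloodType": " O+ "}],
        "trial_b": [{"patient": "B-042", "blood_grp": "A-"}],
    }
    for rec in run_pipeline(sources, [cleanse, map_to_ontology]):
        print(rec["source_system"], rec["subject_id"], rec["blood_type"])
```

Because each stage takes and returns the same record type, stages can be added, removed or reordered independently of one another.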

As part of this project, the company has moved roughly 12 terabytes of structured data and nearly 8 petabytes of unstructured information into the platform in 11 months, which is fast for any enterprise, let alone a pharmaceutical company. “Even though GSK is over 300 years old, we’re trying to operate like a startup,” Ramsey explains.

Shrinking data discovery windows

The GSK Big Data Information Platform is already paying dividends, shrinking the time it takes to curate data for a clinical trial. Whereas it once took a year for researchers to mine clinical trials for links between blood types and the effectiveness of respiratory medicines, today it takes 30 minutes. “It had a huge impact on productivity of the researchers,” Ramsey says.
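To make the idea of mining trials for such links concrete: once records from separate trials share one schema, a cross-trial question becomes a simple aggregation. The Python below is a hypothetical illustration, not GSK’s analysis. The field names (`trial`, `blood_type`, `responded`) and the records are invented toy data, and a response rate per group is only a descriptive first look, not a statistical test of an association.

```python
from collections import defaultdict

# Invented toy records for illustration only: one row per participant,
# already curated into a shared (hypothetical) schema across two trials.
TOY_RECORDS = [
    {"trial": "T1", "blood_type": "O", "responded": True},
    {"trial": "T1", "blood_type": "A", "responded": False},
    {"trial": "T2", "blood_type": "O", "responded": False},
    {"trial": "T2", "blood_type": "A", "responded": True},
    {"trial": "T2", "blood_type": "O", "responded": True},
]


def response_rate_by_blood_type(records: list[dict]) -> dict[str, float]:
    """Pool participants from all trials and compute the share who responded, per blood type."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # blood_type -> [responders, total]
    for rec in records:
        tally = counts[rec["blood_type"]]
        tally[0] += int(rec["responded"])
        tally[1] += 1
    return {bt: responders / total for bt, (responders, total) in counts.items()}


if __name__ == "__main__":
    for blood_type, rate in sorted(response_rate_by_blood_type(TOY_RECORDS).items()):
        print(f"{blood_type}: {rate:.2f}")
```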

GSK also recently inked a collaboration with UK Biobank under which GSK will use its platform for exome sequencing of 500,000 patients, helping researchers analyze DNA traits linked to patient characteristics. “It’s driving huge value as it relates to the R&D process,” Ramsey says.

Ultimately, GSK hopes computer simulations conducted with its platform will help the company reduce the drug discovery period from five to seven years down to two, Ramsey says.

Ramsey offered the following tips for companies seeking to get their analytics in order.

Conduct a holistic assessment: You first have to learn where the data is, what it is and how to use it. When he joined GSK, Ramsey found an IT department that didn’t have a handle on the data. He had the team build crawler technology to find every source of data across R&D. Once you have assessed your data environment, you can build a data analytics team to support those tasks and begin to think about a platform to ingest, process and analyze the data.
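The article doesn’t describe how GSK’s crawler works. As a minimal, hypothetical sketch of the idea, the Python below inventories only the files under one directory tree and summarizes them by file extension. The function name `inventory` is invented, and a real discovery crawler would also need connectors for databases, document stores and application APIs.

```python
import sys
from collections import defaultdict
from pathlib import Path


def inventory(root: str) -> dict[str, dict[str, int]]:
    """Walk every file under `root` and summarize file count and total bytes per extension."""
    summary: dict[str, dict[str, int]] = defaultdict(lambda: {"files": 0, "bytes": 0})
    for path in Path(root).rglob("*"):
        if path.is_file():
            kind = path.suffix.lower() or "<none>"  # group files that have no extension together
            summary[kind]["files"] += 1
            summary[kind]["bytes"] += path.stat().st_size
    return dict(summary)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for kind, stats in sorted(inventory(root).items()):
        print(f"{kind:10} {stats['files']:6} files {stats['bytes']:12} bytes")
```

An inventory like this answers the "where is it and what is it" questions first; deciding how to use each source still takes people who know the data.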

Apply analytics technologies against the data first: Too often, companies lose sight of carefully rendering the data for analysis because they’re laser-focused on solving the business problem. Ramsey encourages his peers to focus on data curation, and to use the machine learning tools available to understand the data they have. “Make sure you apply big data to the data itself to make this happen,” he says. “You don’t start driving value until you can curate it and make it available for business users.”
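The article says GSK uses machine learning software from Tamr to map data to industry ontologies. As a much simpler, hypothetical stand-in for that kind of curation, the Python below proposes canonical names for raw column headers using string similarity from the standard library’s `difflib` rather than machine learning. The vocabulary in `CANONICAL` and the `cutoff` value are illustrative assumptions, not any vendor’s method.

```python
import difflib
import re
from typing import Optional

# Hypothetical canonical vocabulary; in practice this would come from an industry ontology.
CANONICAL = ["subject_id", "blood_type", "visit_date", "dose_mg", "adverse_event"]


def normalize(name: str) -> str:
    """Lowercase a raw column name and collapse runs of non-alphanumerics to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def suggest_mapping(columns: list[str], cutoff: float = 0.8) -> dict[str, Optional[str]]:
    """Propose the closest canonical name for each column, or None if nothing clears the cutoff."""
    mapping: dict[str, Optional[str]] = {}
    for col in columns:
        matches = difflib.get_close_matches(normalize(col), CANONICAL, n=1, cutoff=cutoff)
        mapping[col] = matches[0] if matches else None
    return mapping


if __name__ == "__main__":
    raw_columns = ["Subject ID", "BLOOD-TYPE", "DoseMG", "Site Notes"]
    for col, canonical in suggest_mapping(raw_columns).items():
        print(f"{col!r:14} -> {canonical}")
```

Columns that come back as `None`, or with a doubtful match, would go to a person for review; the tooling only does the first pass over what may be thousands of fields. Lowering `cutoff` catches more variants but also produces more false matches.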

Bring the business with you: C-suite buy-in is critical, says Ramsey, who reports to the president of GSK’s R&D division. “Having a great platform is only half the equation; you have to have the people that are pushing the envelope and wanting to change the way they make decisions to drive value for the business,” he says.