When you’re dealing with petabytes of scientific data, getting it all into the cloud can take time. But once it’s there, running experiments on your data from anywhere in the world is simple — and all you need to download are your results. Convincing researchers to embrace this paradigm shift, though, can take the support of partners across the organization and beyond, as Keith Perry, senior vice president and CIO at St. Jude Children’s Research Hospital in Memphis, Tennessee, has learned.
St. Jude is working to map the genetics of childhood cancers as part of its search for cures. It uses genetic sequencing to determine the whole genome sequences — essentially all of the DNA — of its patients’ healthy cells, and also of their tumors. Comparing the two, and also the genomes of other patients with similar cancers, can provide vital clues to a cure.
That’s very much a big data problem. St. Jude has recorded the whole genome sequences of over 5,000 patients, and each sequence is around 100 gigabytes. Researchers may be hunting for mutations or other genetic markers in just a few bytes of that.
Because of the rarity of some of these childhood cancers, researchers around the world want to compare data — but it’s not easy given the volumes involved. Perry cites the example of a researcher who took six months to download a huge dataset and check it for quality, then just a couple of days to run the analysis.
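A rough calculation shows why downloads on this scale are slow. The sketch below uses only the article's figures for dataset size (over 5,000 sequences of about 100 gigabytes each, treated here as one sequence per patient). The link speeds are assumptions chosen for illustration, not numbers from St. Jude, and the calculation ignores quality checking, protocol overhead and shared bandwidth, all of which would add time.

```python
# Back-of-envelope sketch; values marked ASSUMED are not from the article.

GB = 10**9  # decimal gigabyte, in bytes


def total_bytes(genomes: int, bytes_per_genome: int) -> int:
    """Total storage for a collection of whole genome sequences."""
    return genomes * bytes_per_genome


def transfer_days(size_bytes: int, link_bits_per_second: float) -> float:
    """Days needed to move size_bytes over a link used at full capacity."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 86_400


# From the article: over 5,000 sequences of roughly 100 GB each.
dataset = total_bytes(5_000, 100 * GB)
print(f"dataset size: {dataset / 10**12:.0f} TB")

# ASSUMED sustained link speeds, for illustration only.
for label, bps in [("1 Gbit/s", 1e9), ("10 Gbit/s", 1e10)]:
    print(f"{label}: {transfer_days(dataset, bps):.1f} days")
```

Even under these idealized assumptions the transfer is measured in days to weeks, before any quality checks begin. Results, by contrast, are typically a tiny fraction of that size.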
“The industry still struggles with the downloading of data. It’s not reached the tipping point where people are just running to the cloud to do compute,” he says. “There’s still that mentality of ‘I’ve got to download the data,’ though we’re slowly chipping away at that perception.”
That’s one reason the hospital has built the St. Jude Cloud, a platform where its researchers — and others — can host their data and run experiments.
Another stems from the hospital’s funding model. “We’re a charity,” Perry says. “Our No. 1 focus is finding cures and saving children, and we take that very seriously. … We cannot think of this data that we’re producing or the concepts that we’re producing as our own. We’re just stewards of knowledge that’s being generated.”
Rather than build a data sharing platform from scratch, the hospital chose to work with DNAnexus, which specializes in the field. It then went hunting for a cloud provider, eventually settling on Microsoft’s Azure platform.
“What we wanted to do in terms of leveraging cloud technology [was] to create an ecosystem of sharing that wasn’t just built upon a file-sharing mechanism, which we already see in the research industry. It’s more around building tools and compute power on top of the data that’s sitting there,” Perry says.
The cloud project, which earned St. Jude a 2019 Digital Edge 50 Award for digital innovation, would not have happened without Jinghui Zhang, chair of the hospital’s department of computational biology, and her team, Perry says.
Zhang’s department writes software that takes the data from the genomic sequencer and processes it to help researchers understand the characteristics of the genome and any mutations it contains. “The tools she has developed and that her research team has developed are really being used across the globe,” he says.
The IT and computational biology departments worked together to build St. Jude Cloud.
“They could have gone and developed a framework for cloud computing without IT, and we would have been frustrated at that,” Perry says. However, the IT department was able to demonstrate the value of working together. “We were able to help them port all their data to the Azure cloud; that would have taken them a significant amount of time.” Even so, it took Perry’s networking team several months to upload and quality-check the data.
And then there’s the code optimization: Once the computational biology team has developed a tool, IT staffers are on hand to speed it up. For those roles, Perry typically looks for candidates with a PhD in computational sciences and a focus on high-performance computing.
“These are people that have really dedicated their career to understanding how a high-performance computer works, and what’s the best way to optimize code on one,” he says. “They’re not easy to find, either.”
The coders initially target their software at St. Jude’s in-house research cluster, which has 6,400 compute cores. Once it’s ready for production, or to be shared with others, it’s ported to the cloud.
Given the rarity of PhDs in HPC to start with, internal training has been a key part of the move to the cloud. “As we’ve created positions we’ve brought in a few people from the outside, but mostly we want people that have been engaged in our mission first, and then we can teach them how to [make] the shift in the industry in terms of cloud computing,” Perry says. “It’s a different mindset.”
Other IT contributions include working information security into the project at an early stage to protect the patient data, and bringing the web design team to the table to help design the St. Jude Cloud portal.
“We’ve integrated the cloud computing paradigm, if you will, into our normal information security program,” he says. “We have an external company that does penetration testing for us, and we’ve turned them loose on the cloud infrastructure.”
How you deal with the findings of such tests is important, he says: “It’s not a punitive exercise. It’s, ‘Can we uncover something, a gap in coverage, whether that be a process or a technology that we need to shore up?’ So it’s a really useful exercise for us.”
Spreading the vision
On the design side, one of the more interesting collaborations was with St. Jude’s own marketing and communications team.
“I’ve learned in my career that IS or IT is really not good at marketing what they do,” Perry says. But for St. Jude Cloud to be a success, the team had to convince not just internal users, but users at other research institutions to trust it with their data.
Perry made sure the marketing team was involved in the project early on. “They helped us coming up with concise messages, and getting people engaged at the various conferences the teams went to,” he says. “They also had the opportunity to weigh in on some of the design characteristics and flow. Our marketing communications group is fantastic because they understand the research community as well.”
One way in which St. Jude Cloud won over external users was by focusing on delivering value, not just on delivering a message, he says. “When we opened it up last year, there was already very rich data, a very large sequencing dataset associated with pediatrics, and we had plans to grow that. What we didn’t do was come out and say, ‘We’re going to build this.’ We came out and said, ‘Hey, we built this and it’s ready.’”
While Perry wants recognition for the value that the IT team brings, he sees the flip side too: “We just have to recognize as an industry that all the value is not within IT, there’s value outside and it’s a partnership.”