Genomic Science Breakthroughs Are Happening Faster Than Ever Thanks to HPC

Oct 05, 2022
High-Performance Computing

Scientists at the Sanger Institute rely on cutting-edge high-performance computing systems to power genome research.

Credit: Dell

Since the premiere of the wildly popular 1993 dinosaur-cloning film Jurassic Park, the sciences it featured, genetic engineering and genomics, have advanced at breathtaking rates. When the film was released, the Human Genome Project was already working on sequencing the entire human genome for the first time. The project was completed in 2003, after 13 years and at a cost of $1 billion. Today, a human genome can be sequenced in less than a day for less than $1,000.

One leading genomics research organization, the Wellcome Sanger Institute in England, is on a mission to improve human health by developing a comprehensive understanding of the 23 pairs of chromosomes in the human genome. The Institute relies on cutting-edge technology to operate at incredible speed and scale, reading and analyzing an average of 40 trillion DNA base pairs a day.
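
A quick back-of-the-envelope calculation helps put that daily volume in perspective. The sketch below assumes a human genome of roughly 3 billion base pairs and treats the 40 trillion figure as a flat daily total. The constant and function names are illustrative only. Because sequencers read each position many times over, the "genome equivalents" figure describes raw volume, not the number of finished genomes.

```python
# Back-of-the-envelope scale check for the daily sequencing volume.
# Assumption: ~3 billion base pairs per human genome (approximate).

BASE_PAIRS_PER_DAY = 40e12        # average daily volume cited in the article
HUMAN_GENOME_BASE_PAIRS = 3e9     # approximate genome size, an assumption here
SECONDS_PER_DAY = 24 * 60 * 60


def throughput_summary(bp_per_day: float = BASE_PAIRS_PER_DAY) -> dict:
    """Express a daily base-pair volume in more intuitive units."""
    return {
        "base_pairs_per_second": bp_per_day / SECONDS_PER_DAY,
        "genome_equivalents_per_day": bp_per_day / HUMAN_GENOME_BASE_PAIRS,
    }


if __name__ == "__main__":
    for name, value in throughput_summary().items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions, the daily volume works out to several hundred million base pairs per second, sustained around the clock.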

Alongside advances in DNA sequencing techniques and computational biology, high-performance computing (HPC) is at the heart of the advances in genomic research. Powerful HPC helps researchers process large-scale sequencing data to solve complex computing problems and perform intensive computing operations across massive resources.

Genomics at Scale

Genomics is the study of an organism’s genes or genome. From curing cancer and combatting COVID-19 to better understanding human, parasite, and microbe evolution and cellular growth, the science of genomics is booming. The global genomics market is projected to grow to $94.65 billion by 2028 from $27.81 billion in 2021, according to Fortune Business Insights. Enabling this growth is an HPC environment that contributes daily to a greater understanding of our biology, helping to accelerate the production of vaccines and other approaches to health around the world.

Using HPC resources and the computational techniques known as bioinformatics, genomics researchers analyze enormous amounts of DNA sequence data to find variations and mutations that affect health, disease, and drug response. Searching through the approximately 3 billion base pairs of DNA and roughly 23,000 genes in a human genome, for example, requires massive amounts of compute, storage, and networking resources.
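
As a rough illustration of what finding variations means computationally, the sketch below compares a sample sequence against a reference, position by position, and reports single-base substitutions. It assumes the two sequences are already aligned and of equal length, which real pipelines achieve with dedicated alignment and variant-calling tools. The function name and the toy sequences are hypothetical.

```python
from typing import List, Tuple


def find_substitutions(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each mismatch.

    Assumes both sequences are pre-aligned and equal in length.
    Positions are 1-based, as is conventional in genomics.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(reference.upper(), sample.upper())
    return [
        (position, ref_base, alt_base)
        for position, (ref_base, alt_base) in enumerate(pairs, start=1)
        if ref_base != alt_base
    ]


if __name__ == "__main__":
    reference = "ATGCGTACGTTAGC"  # toy data, not a real gene
    sample = "ATGCGTACCTTAGA"
    for position, ref_base, alt_base in find_substitutions(reference, sample):
        print(f"position {position}: {ref_base} -> {alt_base}")
```

Even this simple linear scan touches every position once, so repeating it over billions of base pairs and many samples hints at why the workload calls for parallel compute and fast storage.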

After sequencing, billions of data points must be analyzed to look for things like mutations and variations in viruses. Computational biologists use pattern-matching algorithms, mathematical models, image processing, and other techniques to obtain meaning from this genomic data.
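
One simple form of pattern matching is looking up every position where a short motif occurs in a longer sequence. The sketch below builds a k-mer index (a dictionary from each length-k substring to its start positions) and then queries it. It is a teaching example that assumes a small in-memory sequence, not the indexing scheme of any particular tool or of the Sanger Institute, and all names in it are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def build_kmer_index(sequence: str, k: int) -> Dict[str, List[int]]:
    """Map every length-k substring to the 0-based positions where it starts."""
    if k <= 0:
        raise ValueError("k must be positive")
    index: Dict[str, List[int]] = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        index[sequence[start:start + k]].append(start)
    return dict(index)


def find_motif(index: Dict[str, List[int]], motif: str) -> List[int]:
    """Look up a motif whose length equals the k used to build the index."""
    return index.get(motif, [])


if __name__ == "__main__":
    genome = "GATTACAGATTACATTAC"  # toy sequence
    index = build_kmer_index(genome, k=5)
    print(find_motif(index, "TTACA"))  # prints [2, 9]
```

Building the index costs one pass over the sequence, after which each lookup is a single dictionary access. The trade-off is memory, since the index holds one entry per position in the sequence.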

A Genomic Powerhouse

At the Sanger Institute, scientific research is happening at the intersection of genomics and HPC informatics. Scientists at the Institute tackle some of the most difficult challenges in genomic research to fuel scientific discoveries and push the boundaries of our understanding of human biology and pathogens. Among many other projects, the Institute’s Tree of Life program explores the diversity of complex organisms found in the UK through sequencing and cellular technologies. Scientists are also creating a reference map of the different types of human cells.

Science on the scale of that conducted at the Sanger Institute requires access to massive amounts of data processing power. The Institute’s Informatics Support Group (ISG) helps meet this need by providing high-performance computing environments for Sanger’s scientific research teams. The ISG team provides support, architecture design, and development services for the Sanger Institute’s traditional HPC environment and an expansive OpenStack private cloud compute infrastructure, among other HPC resources.

Responding to a Global Health Crisis

During the COVID-19 pandemic, the Institute started working closely with public health agencies in the UK and academic partners to sequence and analyze the SARS-CoV-2 virus as it evolved and spread. The work has been used to inform public health measures and to help save lives.

As of September 2022, more than 2.2 million coronavirus genomes had been sequenced at Wellcome Sanger. They are immediately made available to researchers around the world for analysis. Of particular interest are mutations that affect the virus’s spike protein, which the virus uses to bind to and enter human cells and which current vaccines target. Scientists combine genomic data with other information to ascertain which mutations may affect the virus’s ability to transmit, cause disease, or evade the immune response.

Society’s greater understanding of genomics, and the informatics that goes with it, has accelerated the development of vaccines and our ability to respond to disease in a way that’s never been possible before. Along the way, the world is witnessing firsthand the amazing power of genomic science.

Read more about genomics, informatics, and HPC in this white paper and case study of the Wellcome Sanger Institute.

