Breaking through I/O Bottlenecks with Cumulus Data Acceleration

BrandPost By Janet Morss
Dec 04, 2018
Analytics, Big Data, Hadoop

For years, techies have wrestled with persistent storage I/O challenges. While data-processing power has raced forward at Moore’s Law speed, storage I/O capabilities have often lagged behind, creating a bottleneck that slows time to insight. This problem has made I/O performance a top technical concern for those running data-intensive workflows.

Now a team has an answer to this long-running storage I/O problem: the Dell Data Accelerator, which combines technologies from Dell EMC, Intel and the University of Cambridge. The Data Accelerator is a key component of the University’s new Cumulus supercomputer.

Cumulus provides more than 2 petaflops of performance, powered by Dell EMC PowerEdge™ servers, Intel® Xeon® processors and the Intel® Omni-Path Architecture (Intel® OPA). The system uses OpenStack® software to control pools of compute, storage and networking resources and make them readily accessible to users via a cloud interface.

In addition to OpenStack-provisioned x86 and GPU bare-metal and virtualized hosts, Cumulus incorporates the Data Accelerator with an innovative orchestrator built by the University of Cambridge and StackHPC. This accelerator provides more than 500 GB/s of I/O read performance, which makes it the UK’s fastest HPC I/O platform, according to the Research Computing Service. The result is a single heterogeneous x86/GPU platform that provides the UK’s most advanced supercomputing cloud.

Image: The network-attached Data Accelerator at the University of Cambridge (credit: Dell EMC)

The University of Cambridge is now using the Data Accelerator together with the Distributed Namespace Environment (DNE) feature of the Lustre file system, which spreads file system metadata across multiple metadata servers, to continue optimizing the Cumulus cluster for top I/O performance. This optimization work has led to a huge leap forward in storage performance, according to Dr. Paul Calleja, the University’s Director of Research Computing Services.

In benchmark testing, the Cumulus system achieved a score of 158.7, which placed it third on the November 2018 IO-500 list. For system users, these numbers translate into significant improvements in I/O performance for HPC workloads, and faster results.
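For readers wondering how a single number like this is derived, here is a rough sketch in Python. It assumes the IO-500's published approach of combining a bandwidth score (in GiB/s) and a metadata score (in kIOPS) through geometric means. The function names and sample values are hypothetical and chosen only to make the arithmetic easy to follow; they are not Cumulus measurements.

```python
from math import prod


def geometric_mean(values):
    """Return the geometric mean of a non-empty list of positive numbers."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty list of positive numbers")
    return prod(values) ** (1.0 / len(values))


def io500_style_score(bandwidth_gib_s, metadata_kiops):
    """Combine bandwidth and metadata results into one score.

    Assumption: each sub-score is the geometric mean of its individual
    test results, and the overall score is the geometric mean of the
    two sub-scores.
    """
    bw_score = geometric_mean(bandwidth_gib_s)
    md_score = geometric_mean(metadata_kiops)
    return (bw_score * md_score) ** 0.5


# Hypothetical results, for illustration only (not Cumulus data).
example_bandwidth = [4.0, 16.0]    # GiB/s from two bandwidth tests
example_metadata = [25.0, 100.0]   # kIOPS from two metadata tests

score = io500_style_score(example_bandwidth, example_metadata)
print(f"Illustrative score: {score:.1f}")
```

Because the score is built from geometric means, a system cannot reach a high ranking on raw bandwidth alone. Metadata (IOPS) performance, the area Dr. Calleja credits DNE with improving, carries equal weight.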

“With DNE, the IOPS performance of this solution is amazing,” Dr. Calleja says. “The guys had to work around bugs and adjust Lustre parameters to get it to run, but now we have stable, repeatable and very high performance runs with no errors and determinant behavior, so I think we have cracked the HPC storage problem.”

For a closer and more technical look at the Data Accelerator, visit the Research Computing Services’ Data Accelerator site. And take a look at the case study.