Architecting for AI: Overcoming System Bottlenecks

To achieve top performance for new data-intensive workloads, IT organizations need to overcome network and storage bottlenecks.

Dell EMC

For today’s IT organizations, data analytics, artificial intelligence and high-performance computing no longer live in separate worlds. These complementary technologies are rapidly converging as private and public enterprises work to gain greater value from the data they capture and store. In this new data-driven world, performance is king, because many applications now need to generate insights and automated responses in real time, within milliseconds in some cases. This need for speed is forcing organizations to take a fresh look at IT architectures with an eye toward removing bottlenecks and reducing latency in system responses.

The need for better IT architectures is particularly important when it comes to storage and networking. While data-processing power and disk speeds have surged ahead in recent years, storage I/O and network bandwidth limitations have often created bottlenecks that slow system responsiveness and time to insight. This is a big problem for AI applications — including machine and deep learning workloads — that interact continuously with data storage systems.

To remove these bottlenecks, and to accelerate system performance, organizations are getting creative in their architectural approaches. This is the case at the University of Cambridge and the University of Pisa, two Dell EMC customers that are perennial leaders in the use of HPC systems.

Cumulus system

The University of Cambridge’s latest supercomputer, called Cumulus, is designed to serve as a single cluster that supports data analytics, machine learning and large-scale data processing. A key objective in the design of the system was to make the infrastructure perform well for diverse, data-intensive research workloads. With that objective in mind, the Cumulus infrastructure was built on Dell EMC™ PowerEdge™ servers and Intel® Xeon® Scalable processors, all connected via the Intel® Omni-Path Architecture (Intel® OPA).

While it has all the right stuff, this robust architectural foundation alone doesn’t necessarily solve today’s persistent I/O cluster bottlenecks. The Cumulus system removes these bottlenecks with a unique solution called the Data Accelerator (aka DAC), which is designed into the network topology. DAC incorporates technologies from Dell EMC, Intel and the University of Cambridge. In this architecture, the DAC nodes work in conjunction with the Distributed Namespace Environment (DNE) feature in the Lustre file system and Intel® Omni-Path switches to accelerate system I/O.

The results of this accelerated architecture have been striking. With DAC under the hood, Cumulus provides more than 500 GB/s of I/O read performance, making it the UK’s fastest HPC I/O platform, according to the university’s Research Computing Service, which operates the cluster.1 In benchmark testing, the Cumulus system achieved an IO-500 score of 158.7, which ranked the system third on the November 2018 IO-500 list. For system users, these numbers equate to big improvements in I/O performance for data-intensive and AI workloads, and to faster time to insight.

Storage Spaces Direct

The IT team at the University of Pisa is leveraging a unique network architecture to improve the performance of its Storage Spaces Direct environment, which incorporates lightning-fast NVMe drives. The challenge is to make the network move data as fast as the NVMe drives.

“The network has become again the bottleneck of a system, mostly because of NVMe drives,” Antonio Cisternino, the university’s chief information officer, notes in a Dell EMC case study. “Four NVMe drives, aggregated, are capable of generating around 11 [gigabytes] per second of bandwidth, which tops a 100-gigabit connection. They may saturate and block I/O with just four drives.”2
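A rough unit check helps put the quoted figures on a common scale. The sketch below assumes the aggregate drive figure is about 11 gigabytes per second, which is an interpretation rather than something the case study excerpt spells out. It also assumes decimal units, 8 bits per byte and no protocol overhead. The function names are illustrative only.

```python
BITS_PER_BYTE = 8


def gigabytes_to_gigabits(gbytes_per_s: float) -> float:
    """Convert a throughput in GB/s to Gb/s (decimal units)."""
    return gbytes_per_s * BITS_PER_BYTE


def link_utilization(drive_gbytes_per_s: float, link_gbits_per_s: float) -> float:
    """Fraction of a network link that the drive throughput would occupy."""
    return gigabytes_to_gigabits(drive_gbytes_per_s) / link_gbits_per_s


# Assumption: four NVMe drives deliver about 11 GB/s in aggregate.
aggregate_drive_gbytes = 11.0
link_gbits = 100.0

demand_gbits = gigabytes_to_gigabits(aggregate_drive_gbytes)
share = link_utilization(aggregate_drive_gbytes, link_gbits)
print(f"Drive demand: {demand_gbits:.0f} Gb/s, or {share:.0%} of a {link_gbits:.0f} Gb/s link")
```

Under these assumptions, four drives alone would consume most of a 100-gigabit link before any protocol overhead is counted, which is consistent with the point that the network, not the drives, becomes the limiting factor.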

To get around this bottleneck, the IT pros at the University of Pisa used Dell EMC S5048-ON switches to build what amounts to a bigger highway in their Storage Spaces Direct environment. A spine-leaf network design gives every server access to two lanes of 25Gb RoCE (RDMA over Converged Ethernet) to move data in and out of the NVMe drives. This design results in an aggregate bandwidth of 50 Gb/s per server, which helps ensure that the network won’t be much of a bottleneck in the system.
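The per-server arithmetic can be sketched in a few lines. The lane count and lane speed come from the description above. The 1,000 GB payload is a hypothetical figure chosen only to show the conversion, and the calculation ignores protocol overhead and contention on the fabric.

```python
BITS_PER_BYTE = 8


def aggregate_gbits(lanes: int, lane_gbits_per_s: float) -> float:
    """Total bandwidth in Gb/s across identical lanes."""
    return lanes * lane_gbits_per_s


def gbits_to_gbytes(gbits_per_s: float) -> float:
    """Convert Gb/s to GB/s (decimal units)."""
    return gbits_per_s / BITS_PER_BYTE


def transfer_seconds(payload_gbytes: float, link_gbits_per_s: float) -> float:
    """Ideal time to move a payload over a link, ignoring overhead."""
    return payload_gbytes / gbits_to_gbytes(link_gbits_per_s)


# Two 25 Gb/s RoCE lanes per server, as described in the article.
per_server_gbits = aggregate_gbits(lanes=2, lane_gbits_per_s=25.0)
per_server_gbytes = gbits_to_gbytes(per_server_gbits)

# Hypothetical payload, used only to illustrate the calculation.
payload_gbytes = 1000.0
seconds = transfer_seconds(payload_gbytes, per_server_gbits)

print(f"Per-server bandwidth: {per_server_gbits:.0f} Gb/s ({per_server_gbytes:.2f} GB/s)")
print(f"Ideal time to move {payload_gbytes:.0f} GB: {seconds:.0f} s")
```

Real throughput will be lower than this ideal figure once RDMA framing, congestion and storage-stack overhead are taken into account.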

Key takeaways

The rise of artificial intelligence and real-time data analytics creates unprecedented opportunities for today’s enterprises. To fully capitalize on these opportunities, IT organizations need a scalable architecture that incorporates the latest processor and fabric technologies, accommodates massive amounts of data, and removes storage I/O and network bottlenecks.

To learn more

For a broader look at the capabilities of the University of Cambridge’s Cumulus cluster, read the Dell EMC case study “UK Science Cloud” and visit the Data Accelerator site. For a closer look at the University of Pisa’s Storage Spaces Direct environment, read the Dell EMC case study “Storage Success.”


1 Dell EMC case study, “UK Science Cloud,” November 2018.

2 Dell EMC case study, “Storage Success,” June 2018.

Copyright © 2019 IDG Communications, Inc.