Years ago, people in IT shops talked a lot about server sprawl stemming from the continual deployment of servers dedicated to single applications. While in many cases these servers were woefully under-utilized, IT teams couldn’t easily share the resources with other applications that were hungry for more processing power. And then came server virtualization, which made it easy to share server capacity among multiple applications, and that helped solve the problem of server sprawl.
Today, organizations are wrestling with a similar problem but on a larger scale. That problem is cluster sprawl, which stems from the deployment of high-performance computing (HPC) systems dedicated to particular compute-intensive applications, such as those for data analytics, machine learning and engineering simulations. Many organizations now find themselves with multiple islands of HPC that are under-utilized and difficult to manage in a unified manner.
The Gabriel Consulting Group recently weighed in on this issue in a report on a new software toolkit designed to make HPC resources easier to share and manage.
“We find many organizations suffering from ‘cluster sprawl’ with smaller clusters dedicated to one or only a few workloads,” the firm notes. “Today, many organizations are purchasing AI-centric clusters because they think that these applications need dedicated hardware. However, these clusters often end up with lower utilization and become compute silos in the infrastructure. Why not combine these systems into a larger whole that can be used by all?”
And this brings us to the new Omnia software suite, a toolkit that gives data centers a way to pull all of their compute into a single, highly usable, and more easily managed pool of resources.
Omnia at a glance
Omnia was developed at the Dell Technologies HPC & AI Innovation Lab, in collaboration with Intel and with support from the HPC community. This open-source software is designed to automate the provisioning and management of HPC, AI and data analytics workloads to create a single pool of flexible resources to meet growing and diverse demands.
The Omnia software stack includes an open-source set of Ansible playbooks that speed the deployment of converged workloads with Kubernetes and Slurm, along with library frameworks, services and applications. Omnia automatically imprints a software solution onto each server based on the use case (for example, HPC simulations, neural networks for AI, or in‑memory graphics processing for data analytics) to reduce deployment time from weeks to minutes.
Community involvement and contribution are important for Omnia’s advancement. To that end, Arizona State University Research Computing has worked closely with the Dell Technologies HPC & AI Innovation Lab on Omnia development to better support mixed workloads, including simulation, high throughput computing and machine learning.
A third-party review
In its evaluation of the Omnia platform, the Gabriel Consulting Group called out a wide range of capabilities and benefits in the software. These include support for:
Resource pooling — “Customers using Omnia can rapidly deploy HPC or AI clusters that are ready for users to populate with the application stacks they need,” the firm notes. “With Omnia, all of an organization’s compute resources can be pulled together to create a single infrastructure pool that can then be parceled up to run workloads.”
Custom clusters — “Omnia allows customers to dynamically split up large clusters into logical clusters that are tailor-made for workload specific configurations,” Gabriel says. “Clusters can be built, torn down, merged with other hardware resources, quickly and easily.”
Large numbers of clusters — “Omnia can be used to create and manage server groups that range from a single system all the way up to all of the systems in the data center,” the firm points out. “To us, it’s most valuable when it’s used to deploy large numbers of uniquely configured clusters and stacks for lots of users.”
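To make the pooling idea concrete, here is a minimal Python sketch of a shared pool that can be carved into logical clusters, merged and torn down. It is only a toy model of the concept described in the report. It is not Omnia's code or API, and the names ResourcePool, carve, merge and tear_down are hypothetical, chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Toy model of a shared server pool (hypothetical, not Omnia's data model)."""

    free_nodes: set[str]
    clusters: dict[str, set[str]] = field(default_factory=dict)

    def carve(self, name: str, count: int) -> set[str]:
        """Build a logical cluster by taking `count` nodes out of the free pool."""
        if name in self.clusters:
            raise ValueError(f"cluster {name!r} already exists")
        if count < 1 or count > len(self.free_nodes):
            raise ValueError("requested node count is not available")
        # Pick nodes in name order so the result is deterministic.
        chosen = set(sorted(self.free_nodes)[:count])
        self.free_nodes -= chosen
        self.clusters[name] = chosen
        return set(chosen)

    def merge(self, target: str, source: str) -> None:
        """Fold the nodes of `source` into `target` and drop `source`."""
        if target == source:
            raise ValueError("cannot merge a cluster with itself")
        if target not in self.clusters or source not in self.clusters:
            raise KeyError("both clusters must exist")
        self.clusters[target] |= self.clusters.pop(source)

    def tear_down(self, name: str) -> None:
        """Dissolve a logical cluster and return its nodes to the free pool."""
        self.free_nodes |= self.clusters.pop(name)


if __name__ == "__main__":
    pool = ResourcePool(free_nodes={f"node{i:02d}" for i in range(1, 9)})
    pool.carve("simulation", 4)
    pool.carve("ml-training", 2)
    pool.merge("simulation", "ml-training")
    pool.tear_down("simulation")
    print(len(pool.free_nodes), "nodes back in the shared pool")
```

The point of the sketch is that no server belongs to a workload permanently: nodes move between the free pool and logical clusters as demand changes, which is the opposite of the dedicated silos that create cluster sprawl.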
The bottom line
In the wrap-up to its report, Gabriel Consulting Group says that Omnia just might be the answer to the challenges of creating and managing ever larger and more complex compute infrastructures and the growing problem of cluster sprawl.
“It checks all the boxes when it comes to rationalizing and virtualizing HPC infrastructures,” the firm notes. “It makes building and getting real use out of compute clusters quicker and easier, and it reduces workload on administrators and users alike. These are all big wins in our book.”
For a deeper dive into the Omnia platform, see the Gabriel Consulting Group report “Omnia Fights Cluster Sprawl.” And to get started with Omnia, download the software from GitHub at https://github.com/dellhpc/omnia — and then join the community that is helping to guide the design and development of the next generation of open-source consolidated cluster deployment tools.