Even though they have been around for years, the phrase “MLPerf benchmarks” holds little meaning for most people outside the AI developer community. However, this community-driven benchmark suite, which measures performance across a broad range of machine learning (ML) tasks, is quickly becoming the gold standard for fair, unbiased assessment of accelerated computing solutions for machine learning training, inference, and high performance computing (HPC).
The era of MLPerf is here, and everyone should be paying attention.
Organizations across every industry are racing to take advantage of AI and machine learning to improve their businesses. According to Karl Freund, founder and principal analyst at Cambrian AI Research, businesses should expect that customer demand for AI-accelerated outcomes will continue to grow.
“We foresee AI becoming endemic, present in every digital application in data centers, the edge, and consumer devices,” said Freund. “AI acceleration will soon no longer be an option. It will be required in every server, desktop, laptop, and mobile device.”
But selecting the right solutions, ones that maximize energy efficiency, longevity, and scalability, can be difficult in the face of hundreds, if not thousands, of hardware, software, and networking options for accelerated computing systems.
With this rapid industry growth, coupled with the complexity of building a modern AI/ML workflow, leaders from both industry and academia have come together to create a fair, unbiased way to measure the performance of AI systems: MLPerf.
Administered by MLCommons, an industry consortium with over 100 members, MLPerf is used by hardware and software vendors to measure the performance of AI systems. And, because MLPerf’s mission is “to build fair and useful benchmarks” that provide unbiased evaluations of training and inference performance under prescribed conditions, end customers can rely on these results to inform architectural choices for their AI systems.
MLPerf is also constantly evolving to represent the state of the art in AI, with regular updates to the networks and datasets, and a regular cadence of result publication.
MLPerf Benchmarks Deconstructed
Despite the numerous benefits, the results of the MLPerf benchmarking rounds have not garnered the attention that one might expect given the rapid industry-wide adoption of AI solutions. The reason for this is simple: Interpreting MLPerf results is difficult, requiring significant technical expertise to parse.
The results of each MLPerf round are reported in multi-page spreadsheets that include a deluge of hardware configuration information, such as CPU type, the number of CPU sockets, accelerator type and count, and system memory capacity.
Yet, despite the complexity, the results contain critical insights that can help executives navigate the purchasing decisions that come with executing or growing an organization’s AI infrastructure.
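As a concrete illustration of the kind of slicing involved, here is a minimal sketch of how an analyst might compare training times from a results table. The CSV excerpt, column names, vendors, and figures below are all hypothetical, invented for illustration; the official MLPerf result files use a different schema.

```python
import pandas as pd
from io import StringIO

# Hypothetical excerpt of a training-results export.
# All column names, vendors, and numbers are illustrative only.
csv = StringIO("""\
submitter,accelerator,accelerator_count,benchmark,train_minutes
VendorA,GPU-X,8,bert,25.1
VendorA,GPU-X,64,bert,4.7
VendorB,GPU-Y,8,bert,31.8
""")

df = pd.read_csv(csv)

# Total chip-minutes gives a rough per-accelerator efficiency view,
# while raw train_minutes shows absolute time-to-train.
df["chip_minutes"] = df["train_minutes"] * df["accelerator_count"]

# The fastest configuration by time-to-train appears first.
fastest = df.sort_values("train_minutes").iloc[0]
print(fastest["submitter"], fastest["accelerator_count"], fastest["train_minutes"])
```

Even this toy example shows why expertise is needed: the "fastest" system by wall-clock time may simply be the one with the most accelerators, so comparisons need to be normalized against system scale.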
To start, there are five distinct MLPerf benchmark suites: MLPerf Training, MLPerf Inference, and MLPerf HPC, along with the recently introduced MLPerf Mobile and MLPerf Tiny. Each year, there are two submission rounds for MLPerf Training and MLPerf Inference, and a single round for MLPerf HPC.
The latest edition of MLPerf Training – MLPerf Training v1.1 – consists of eight benchmarks that represent many of the most common AI workloads, including recommender systems, natural language processing, reinforcement learning, and computer vision. The suite measures the time required to train these AI models; the faster a new model can be trained, the sooner it can be deployed to deliver business value.
After an AI model is trained, it needs to be put to work making useful predictions. That is the role of inference, and MLPerf Inference v1.1 consists of seven benchmarks that measure inference performance across a range of popular use cases, including natural language processing, speech-to-text, medical imaging, and object detection, among others. The overall goal is to deliver performance insights for two common deployment scenarios: data center and edge.
And, finally, as HPC and AI are rapidly converging, MLPerf HPC is a suite of three use cases designed to measure AI training performance for models with applicability to scientific workloads, specifically astrophysics, climate science, and molecular dynamics.
Making Data-Driven Decisions
When making big-ticket technology investments, reliable data is critical to arriving at a good decision. That can be challenging when many hardware vendors make performance claims without sufficient detail about the workload, hardware, and software they used. MLPerf applies benchmarking best practices to present peer-reviewed, vetted, and documented performance data on a wide variety of industry-standard workloads, so systems can be compared directly to see how they really stack up. MLPerf results should be part of any platform evaluation process, removing the guesswork about performance and versatility from deployment decisions.
Learn More About AI and HPC From the Experts at NVIDIA GTC
Many MLPerf-related topics will be discussed at NVIDIA’s free, virtual GTC event, which takes place March 21-24 and features more than 900 sessions with 1,400 speakers. NVIDIA partners involved in the benchmarks will also participate.
Top sessions include:
Accelerate Your AI and HPC Journey on Google Cloud (Presented by Google Cloud) [session S42583]
Setting HPC and Deep-learning Records in the Cloud with Azure [session S41640]
Overhauling NVIDIA nnU-Net for Top Performance on Medical Image Segmentation [session S41109]
Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference [session S41352]
How to Achieve Million-fold Speedups in Data Center Performance [session S41886]
A Deep Dive into the Latest HPC Software [session S41494]