It’s hard to name another area of technology that is evolving as rapidly as machine learning (ML) is today. In a previous blog, I theorized that artificial intelligence (AI) was at an inflection point in its maturity, where it could benefit from many of the lessons learned in both the hardware and software realms of high performance computing (HPC). Since then, AI frameworks and systems architecture have come a long way, and much of what I mentioned in my earlier blog is now almost a given.
So, on the surface at least, it does appear that these two disciplines are coming together. Or are they? What I would like to explore in this post is whether there is indeed any substance to the argument and why, for the most part, this is inevitable.
It always helps to examine HPC and AI convergence in the context of current trends we are seeing in the marketplace and with our customers. Yet, more importantly, what are the drivers of this convergence? Is it a natural fit? Are today’s users truly striving for a happy medium that incorporates elements from both technologies to make their lives easier?
Another thing to consider is that HPC was mostly born in academic circles where economic considerations are not primary factors. I have many colleagues in academia who will spend several months optimizing code to see a modest gain in performance. Usually, this is worth it because, once optimized, this code will be executed many millions of times, and modest gains become quite substantial. For enterprise users, however, time is money – literally – so they tend to favor ease-of-use, and they simply don’t have the luxury to fine-tune anything except the most critical of codes. The model here is geared more toward flexibility.
Let’s take a look at some general trends to help us form a better picture:
Trend 1: HPC systems and applications are designed to scale out.
We could argue that even the pinnacle of technology, supercomputers, were not immune to the advances of commoditization. Vector-based monolithic scale-up machines began their slow march toward obsolescence as commodity processors from the likes of Intel and AMD, assembled into clustered machines, started offering decent performance at a fraction of the cost of their bigger brethren. When this trend started in the 1990s, the software for parallelism, like MPI for clusters and OpenMP within a node, still had a way to go. However, users quickly caught on to the benefits of this modular approach and, once many of the popular codes were ported to clusters and running reliably, there was no turning back. So, it should come as no surprise that HPC users have amassed a wealth of knowledge when it comes to taking a workload and converting it to run on very large scale-out clusters. ML software is very much where HPC clusters were in the 90s; the need to scale is apparent, but the software isn’t at a stage where this is an easy process. More on this in a moment.
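The scale-out recipe HPC users perfected is simple to sketch: partition the work, let each worker compute a partial result, then combine. Here is a minimal, hypothetical illustration in Python, with threads standing in for cluster nodes (the names and numbers are mine, not from any particular MPI code):

```python
# Toy illustration of the scale-out pattern: partition the work,
# compute partial results in parallel "workers" (threads standing in
# for cluster nodes), then reduce them into a final answer.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each worker handles only its own slice of the problem.
    return sum(x * x for x in chunk)

def scale_out_sum(data, workers=4):
    # Partition the data into roughly equal chunks, one per worker.
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_sum, chunks)
    # The "reduce" step; an MPI code would use a collective here.
    return sum(partials)

data = list(range(1_000))
result = scale_out_sum(data)
```

In a real cluster, the partial results would be combined with a collective operation such as MPI_Reduce rather than a local sum, but the shape of the solution is the same.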
Trend 2: HPC systems are generating more data, and AI is helping us make sense of the data.
Traditionally, many of the workloads that run on HPC systems replicate an identical mathematical model many thousands or millions of times, hence the need for the speed of supercomputers. Think of weather forecasting as simulating the behavior of millions of 3D cells representing the atmosphere. Even though the mathematical equations representing the model required painstaking work, the input data required to kick off the simulation was relatively small, because each iteration of the model would generate intermediate data that would then flow through the simulation.
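To make the small-input, big-intermediate-data point concrete, here is a greatly simplified toy sketch of my own: a one-dimensional diffusion update in plain Python, where a tiny initial condition kicks off a run that regenerates the entire grid at every timestep. A real atmospheric model does something like this over millions of 3D cells.

```python
# Toy stencil update: each cell's next value depends on its neighbors,
# loosely mimicking one timestep of a gridded simulation with
# periodic boundary conditions.
def step(cells, alpha=0.1):
    n = len(cells)
    return [
        cells[i] + alpha * (cells[(i - 1) % n] - 2 * cells[i] + cells[(i + 1) % n])
        for i in range(n)
    ]

# A tiny initial condition (one hot cell) starts the run; every
# timestep then produces a full new grid of intermediate data.
grid = [0.0] * 10
grid[5] = 100.0
for _ in range(50):
    grid = step(grid)
```

The input here is a handful of numbers; the intermediate data is a complete new grid per timestep, which is exactly the ratio that balloons at supercomputer scale.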
The intermediate data can be so enormous that many folks in HPC circles joke about this being the original “big data” from decades ago, long before the term became common parlance. What’s becoming evident is that, as supercomputers get bigger, so do the simulation models and, consequently, the data they generate – so much so that it’s humanly impossible to understand all the data in a meaningful way.
AI techniques are increasingly being used to digest this enormous volume of data and transform it into more human-friendly formats. In theory, you could do this AI processing as a separate operation on a separate system custom-built for AI workloads. However, moving tens of petabytes of data is no trivial matter, and takes a considerable amount of time. So, it’s a matter of practicality for the AI to run on the same system that generated the data, which in this case happens to be an HPC system.
Trend 3: AI-based models can sometimes replace computationally intensive tasks in simulation.
Beyond analyzing simulation data, some scientists are taking it a step further and using AI-based models to replace parts of a simulation that are otherwise too computationally expensive. In an example directly from high energy physics, scientists are trying to replace Monte Carlo methods with generative adversarial networks (GANs), an unsupervised learning technique in which two neural networks compete to produce more accurate results, in order to accelerate their ability to study collision data from high-granularity calorimeters that measure particle energy. In this case, the AI code produces results similar to those of the numerically intensive code while using only a fraction of the compute cycles, enabling both bigger and faster models.
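For a rough sense of why this substitution pays off (my own toy example, not the physicists’ actual code): plain Monte Carlo needs a great many evaluations of an expensive function, while a surrogate model, once trained, answers in constant time.

```python
import math
import random

# Stand-in "expensive" physics function; in the real case this would be
# a detailed detector simulation invoked millions of times.
def shower_response(x):
    return math.sin(math.pi * x) ** 2

def monte_carlo_mean(f, n_samples, seed=0):
    # Classic Monte Carlo: average many random evaluations of f on [0, 1].
    rng = random.Random(seed)
    return sum(f(rng.random()) for _ in range(n_samples)) / n_samples

# A surrogate (in the GAN setting, a trained generator) returns a
# comparable answer without the sampling loop. Here we cheat and use the
# known analytic mean of sin^2(pi*x) over [0, 1] to stand in for it.
surrogate_mean = 0.5

estimate = monte_carlo_mean(shower_response, 100_000)
```

The hundred thousand function calls on the Monte Carlo side versus one lookup on the surrogate side is the whole economic argument, in miniature.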
Trend 4: Enterprises are struggling with a tsunami of data, and IoT is going to make it worse.
Even enterprises aren’t immune to this data deluge. Many are struggling to keep up with the mountain of data generated daily. After many years of ignoring that data, enterprise data scientists began to unlock some of its secrets with the help of data analytics techniques. However, those methods relied heavily on human interpretation of the data, which quickly became the bottleneck.
With modern state-of-the-art AI techniques, human judgement is still indispensable. However, these techniques are markedly superior at summarizing ever-larger quantities of data, thereby improving the efficiency of the human decision maker. This AI-enhanced efficiency will become even more critical with the influx of data from the Internet of Things (IoT), which will exponentially exacerbate the problem as AI-enabled smart things supply new ways to unlock the economic potential of consumer data.
Trend 5: Momentum from HPC and data analytics convergence is leading the way for AI convergence.
Not too long ago, folks were asking whether HPC and data analytics were on a path of convergence and, arguably, we’ve seen synergies from the coming together of these two different compute-centric and data-centric paradigms. In my opinion, many of the barriers to HPC becoming mainstream were brought down when this happened. Now with AI, even though we have practitioners on either side of the fence, there is more overlap than ever in terms of the tools, workflows and, most interestingly, outcomes.
For example, image recognition relies on similar techniques regardless of use case, be it detecting tumors in X-rays or merchandise matching at a retailer. Common use cases like these borrow techniques from both legacy HPC and AI, and they are bringing the two disciplines together.
Trend 6: IT managers are moving away from disparate, custom architectures to general-purpose ones that can do it all.
Just as virtualization changed the way datacenters were created and managed, IT managers are shying away from custom architectures that specialize in a narrow range of workloads in favor of architectures that handle a diverse, complex workflow. Think hyper-converged infrastructure (HCI), where the system architecture is flexible enough to allow you to run HPC, data analytics, and machine and deep learning (DL) on the same system, all as part of a complex workflow that minimizes data movement and, hence, speeds up processing. Even though no one likes to mention it, more than 80 percent of processing time is spent in preparing the data, irrespective of the workflow. So, ideally, having a general-purpose platform that can be used for data preparation is a huge plus.
And anytime you’re thinking of data, data gravity should also be a consideration. You want data to be processed as close as possible to where it was produced or last stored.
Trend 7: AI systems architectures are beginning to scale out.
AI today mostly comprises two major areas: machine learning, which for the most part is statistical models on steroids, and deep learning, which is roughly based on the model of a biological neuron and its learning capability. It’s somewhat debatable, but ML has had greater adoption with enterprise users who are already familiar with the analytical techniques ML is based upon. DL, on the other hand – due to its reliance on large sparse matrices and numerical algorithms – has always had more in common with traditional HPC users. Regardless of the provenance of these techniques, almost all AI implementations consist of two steps:
- Training – where the model uses a learning technique to capture the “signal” in the input data set. Think of this as either tagged images of cats (the signal) in supervised learning, or just images of cats in unsupervised learning. Either way, the result is a highly optimized model that recognizes cats. This step is extremely data-intensive – think thousands to millions of images of cats to improve model accuracy. Each one of the images requires lots of intermediate computations on very large matrices. No surprise here that it requires lots of computational horsepower and fast memory. Even then, this step can take days to complete.
- Inferencing – taking the model output from the earlier training stage and using it to do useful work, like analyzing CCTV images to alert you when your neighbor’s cat (or any cat for that matter) is rummaging in your garden. As opposed to training, inferencing requires orders of magnitude fewer compute resources, and many mobile devices today are powerful enough to do inferencing.
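The asymmetry between the two steps shows up even in a toy model (an illustrative sketch of my own, not any production framework): training loops over the data many times, while inferencing is a single cheap evaluation.

```python
# Toy "cat detector": a one-parameter linear model. Training iterates
# over the data set many times (compute-heavy); inferencing is a single
# multiply-add (cheap enough for a phone).
def train(samples, labels, epochs=200, lr=0.1):
    w, b = 0.0, 0.0
    for _ in range(epochs):                  # many full passes over the data
        for x, y in zip(samples, labels):
            err = (w * x + b) - y
            w -= lr * err * x                # gradient step on the weight
            b -= lr * err                    # gradient step on the bias
    return w, b

def infer(model, x):
    w, b = model
    return w * x + b                         # one multiply-add

# Fit the (noise-free) relationship y = 2x + 1.
model = train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
prediction = infer(model, 4.0)
```

Even at this scale, training does hundreds of passes over the data while inferencing does constant work per query; scale both up by many orders of magnitude and you get the imbalance described above.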
Although inferencing is increasingly where all the interesting decision making happens, it’s the training part that gets the most attention today. And rightly so, since it’s the part that takes the longest. Training got a huge bump in performance when training codes were first ported to graphics processing units (GPUs). The codes initially could only run on one GPU in a system. With time, users managed to get the codes running on multiple GPUs in a single system. However, you quickly run into the limit of how many GPUs a single node has. Currently, the limit is 16 GPUs (although some systems do have 32, they’re not very common). If you want to run bigger training jobs, you have two options:
- Use nodes with 4 or 8 GPUs as building blocks to build a scale-out system, just like what HPC users did in the 90s with compute
- Do away with GPUs altogether, and use a very large cluster of CPUs (like a very large HPC cluster)
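The first option boils down to data parallelism: every node holds a replica of the model, computes a gradient on its own shard of the data, and the gradients are averaged across nodes. Here is a hedged, toy rendition (the function names and data are mine, not from any real framework):

```python
# Toy data-parallel training step for a 1-D linear model y = w*x:
# each "node" computes a gradient on its own shard, then the gradients
# are averaged, the same pattern real distributed trainers use.
def local_gradient(w, shard):
    # Mean-squared-error gradient over this node's shard of the data.
    g = 0.0
    for x, y in shard:
        g += 2 * (w * x - y) * x
    return g / len(shard)

def allreduce_mean(grads):
    # Stand-in for an allreduce collective: average across all nodes.
    return sum(grads) / len(grads)

def distributed_step(w, shards, lr=0.05):
    grads = [local_gradient(w, s) for s in shards]  # in parallel in practice
    return w - lr * allreduce_mean(grads)

# Two shards of data generated by y = 3x; w should converge toward 3.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(200):
    w = distributed_step(w, shards)
```

In a real system, `allreduce_mean` would be a network collective (e.g. an MPI or NCCL allreduce), and the shards would live on different nodes; the arithmetic is the same.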
We’re assuming here that the training frameworks scale nicely in a scale-out architecture. This, sadly, isn’t the case today. Training is a complex operation, and the order of the training data can have a substantial effect on the output. Normally, this isn’t a problem when all the training data is processed sequentially by the same system.
However, distributed processing will break that order in favor of speed. Parallelizing in general is not a trivial task, and it is even more complicated when it comes to neural codes. Lots of research is being carried out on distributed training, and many researchers are finding ways of circumventing this problem – a challenge akin to many of the early hurdles in parallelizing HPC applications, which were eventually overcome with better software frameworks. The same will be true for parallel AI; we’ll get there eventually.
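The order sensitivity is easy to demonstrate in miniature (an illustrative sketch of plain SGD, not of any specific framework): training on the same noisy data in two different orders leaves you with two measurably different models.

```python
# Plain stochastic gradient descent on a 1-D linear model y = w*x.
# With a fixed learning rate and noisy data, the samples seen last
# leave their mark, so data order changes the final weight.
def sgd_fit(data, lr=0.05, epochs=5):
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x   # per-sample gradient step
    return w

# Noisy observations of roughly y = 2x.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
w_forward = sgd_fit(data)
w_reversed = sgd_fit(list(reversed(data)))
```

Both runs land near the true slope of 2, but not at the same value, which is exactly why naively sharding and reordering training data across nodes changes results.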
These seven trends give some indication of how HPC and AI are being used together today, and how both disciplines have much to offer and learn from each other. Each side has its strengths, which in many cases are complementary and are giving users unprecedented capabilities.
As any new technology like AI evolves and matures, innovative minds start to use it in ways previously unthought of. Clearly, this is what we see today, especially with users on the HPC side. Even enterprise users are looking toward complex numerical algorithms, because in some cases the problems themselves bear little distinction from those that have been addressed by HPC users in the past.
In my mind, there is no doubt that HPC and AI are converging, and I can only imagine that the combination of the two will be greater than the sum of its parts.
To learn more
You can find more information on topics referenced in this blog at the following links:
To learn more about unlocking the value of data with artificial intelligence systems, explore Dell EMC AI Solutions and Dell EMC HPC Solutions.