Deep Learning stands to benefit from data analytics and High Performance Computing (HPC) expertise

Image: highway traffic lights. Courtesy of Dell EMC

As I noted in a February blog post, many enterprises today need solutions that couple high-performance computing with data analytics. This convergence of technologies is blurring the boundaries between HPC and big data, and clearing the way forward for the advent of high-performance data analytics (HPDA). In a parallel trend, enterprises increasingly need solutions that merge technologies for machine learning and deep learning — a need I will explore more deeply in today’s post.

The rise of machine learning

Machine learning was born from pattern recognition and the theory that computers can learn without being programmed to perform specific tasks. Researchers interested in artificial intelligence (AI) wanted to see if computers could learn from data through a process of iterative training on new data sets. This aspect of machine learning is important because as models are exposed to new data, they are able to independently adapt. They learn from previous computations to produce reliable, repeatable decisions and results. It’s a science that’s not new, but one that is gaining fresh momentum.

Another area where we lack clear-cut distinctions is AI’s relationship to machine learning and deep learning. To be clear, deep learning is a type of machine learning. However, while both are part of the same family, they have some fundamental differences that make each technique better suited for certain applications. A question I frequently hear is: Is machine learning restricted to big data applications, whereas deep learning is mostly for HPC users? Before we answer this question, let’s consider how we got here.

Back in the 1980s, we were excited about the prospect of using neural networks with many layers of hidden neurons. Adding layers is natural from a biological “human neural net” point of view, and it is also clear that multiple layers are needed to represent complex compositional concepts joined together in a reasonable way (e.g., “a chair is made of several legs, a seat and a back”). But it was hard to get networks with many layers to learn well using standard backpropagation techniques.

In the 1990s and 2000s, neural nets fell somewhat out of favor, as other approaches, stemming from statistical machine learning, started having better results. Then, in the past few years, a method of training networks with multiple hidden layers, or “deep nets,” was developed. With this method, individual hidden layers are added one-by-one, using an auto-encoder approach to do unsupervised “pre-training” of the weights of the connections between successive hidden layers. After this initialization, a regular backpropagation step is run for “fine-tuning.” The point of the pre-training step is that the subsequent backpropagation typically converges much more quickly.
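To make the backpropagation step concrete, here is a minimal sketch in plain Python of a tiny two-layer network trained on XOR, the classic problem a single-layer network cannot solve. This is an illustrative toy, not any production method, and the layer-wise auto-encoder pre-training described above is omitted for brevity; only the "fine-tuning" backpropagation pass is shown.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyNet:
    """A 2-input, 4-hidden, 1-output net trained with plain backpropagation."""

    def __init__(self):
        # Small random initial weights: w1 maps inputs->hidden, w2 hidden->output.
        self.w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(4)]
        self.b1 = [0.0] * 4
        self.w2 = [random.uniform(-1, 1) for _ in range(4)]
        self.b2 = 0.0

    def forward(self, x):
        self.h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        self.y = sigmoid(sum(w * hi for w, hi in zip(self.w2, self.h)) + self.b2)
        return self.y

    def train_step(self, x, target, lr=0.5):
        y = self.forward(x)
        # Output delta: derivative of squared error passed through the sigmoid.
        d_out = (y - target) * y * (1 - y)
        # Hidden deltas: the error propagated backward through w2.
        d_hid = [d_out * w * h * (1 - h) for w, h in zip(self.w2, self.h)]
        for i in range(4):
            self.w2[i] -= lr * d_out * self.h[i]
            self.b1[i] -= lr * d_hid[i]
            for j in range(2):
                self.w1[i][j] -= lr * d_hid[i] * x[j]
        self.b2 -= lr * d_out

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def mean_err(net):
    return sum(abs(net.forward(x) - t) for x, t in data) / len(data)

net = TinyNet()
err_before = mean_err(net)
for _ in range(5000):
    for x, t in data:
        net.train_step(x, t)
err_after = mean_err(net)
print(f"mean error: {err_before:.3f} -> {err_after:.3f}")
```

With only one hidden layer this converges readily; the historical difficulty described above appears when many such layers are stacked and the error signal weakens as it propagates backward, which is exactly what pre-training and, later, GPUs helped overcome.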

Finally, in the 2010s, we now have much higher computing power than in the 1980s, primarily due to the emergence of GPU computing. GPUs deliver many orders of magnitude more computing power, and the matrix operations they excel at happen to be exactly what is needed to run backpropagation computations in neural nets.

Where we are today

These days, “deep learning” isn’t actually very different from the back-propagation neural nets of the early days. However, combining good weight regularization/dropout with the raw compute power of a GPU makes neural networks with many hidden layers converge in a reasonable amount of time, something that wouldn’t have been possible with the technologies of the 1980s.

Figure 1: Courtesy of NVIDIA®

While many machine learning algorithms have been around for a long time, the ability to automatically apply complex mathematical calculations to big data — over and over, faster and faster — is a recent development. Here are a few widely publicized examples of machine learning applications that you may be familiar with:

  • The heavily hyped self-driving cars? The current epitome of machine learning
  • Online recommendation offers, such as those from Amazon and Netflix? Machine learning applications for everyday life
  • Knowing what customers are saying about you on Twitter? Machine learning combined with linguistic rule creation
  • Fraud detection? One of the more obvious and most important uses in our world today

Machine learning vs. deep learning

Let’s take a closer look at the differences between machine learning and deep learning. Machine learning refers to the development and use of algorithms that improve their performance at some task based on experience, or previous iterations with a dataset. Deep learning is a subset of machine learning. It refers to algorithms where multiple layers of neurons learn successively more complex representations.

At a high level, both machine learning and deep learning appear to produce the same outcome, i.e. generating actionable insights from vast mountains of data; however, the way they do this is very different. One difference, historically speaking at least, is that machine learning has its roots in the reporting and analytics world in the enterprise, whereas neural networks, which form the basis of deep learning, were originally developed in academia over three decades ago.

Figure 2: Courtesy of Intel®

Machine learning in the enterprise has its roots in reporting and advanced analytics. It delivers on the promise of extracting value from big and disparate data sources with far less reliance on human direction. It is data-driven and runs at machine scale. It is well suited to the complexity of dealing with disparate data sources and the huge variety of variables and amounts of data involved. And unlike traditional analysis, machine learning thrives on growing datasets. The more data fed into a machine learning system, the more it can learn and apply the results for higher quality insights.

Machine learning is now used widely in the enterprise space in conjunction with data analytics platforms like Apache™ Hadoop® and Splunk® Enterprise. The algorithms used in machine learning span the range of statistical techniques — from regression and decision trees to Bayesian data analysis and clustering, to name a few — and also include artificial neural networks.
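Of the statistical techniques just mentioned, clustering is the easiest to illustrate in a few lines. Below is a minimal k-means sketch in plain Python on synthetic 2-D data; it is purely illustrative and not tied to Hadoop or Splunk, which the text names only as platforms that host such algorithms at scale.

```python
import random

random.seed(1)

def kmeans(points, k, iters=20):
    """Minimal k-means: alternate between assigning each point to its
    nearest centroid and moving each centroid to its cluster's mean."""
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign the point to the nearest centroid (squared distance).
            i = min(range(k), key=lambda i: (p[0] - centroids[i][0]) ** 2
                                          + (p[1] - centroids[i][1]) ** 2)
            clusters[i].append(p)
        for i, c in enumerate(clusters):
            if c:  # Leave an empty cluster's centroid where it was.
                centroids[i] = (sum(p[0] for p in c) / len(c),
                                sum(p[1] for p in c) / len(c))
    return centroids, clusters

# Two well-separated blobs of synthetic 2-D data, centered at (0,0) and (5,5).
blob_a = [(random.gauss(0, 0.3), random.gauss(0, 0.3)) for _ in range(50)]
blob_b = [(random.gauss(5, 0.3), random.gauss(5, 0.3)) for _ in range(50)]
centroids, clusters = kmeans(blob_a + blob_b, k=2)
print(sorted(round(c[0]) for c in centroids))
```

The same alternating assign-and-update idea scales out naturally over partitioned data, which is why clustering is a staple workload on the big data platforms named above.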

The applications are many and varied. For example, today’s enterprises use machine learning to:

  • Extract value from massive amounts of data from connected devices and the Internet of Things
  • Drive predictive analytics applications that suggest other products that might be of interest to a particular customer
  • Extend the reach of the Splunk platform with the addition of predictive analytics capabilities

Deep learning — initially used widely in academic research initiatives and now quickly gaining commercial applications — leverages neural networks and a great deal of computational power for activities like natural language processing and image object recognition. Deep learning methods are a modern update to artificial neural networks that exploit abundant, inexpensive computation. They build much larger and more complex neural networks, and many methods target semi-supervised learning problems, where large datasets contain very little labeled data.

Deep learning emphasizes the kind of model you might want to use (e.g., a deep convolutional multi-layer neural network) where you can use data to fill in the missing parameters. But with deep learning comes great responsibility. Because you are starting with a model of the world that has a high dimensionality, you need a tremendous amount of data (big data) and a great deal of crunching power (GPUs). Convolutions are used extensively in deep learning (especially in computer vision applications), and the architectures are far from shallow.
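To show what a convolution actually does, here is a minimal sketch in plain Python of the core primitive: sliding a small filter over a 2-D input and taking a dot product at each position. As in most deep learning frameworks, the kernel is not flipped (strictly, this is cross-correlation); the example edge-detector filter and toy image are illustrative inventions, not drawn from any library.

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (no padding, stride 1): slide the kernel
    over the image and take a dot product at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# A vertical-edge detector applied to an image with a hard left/right edge.
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1],
          [-1, 1]]
result = conv2d(image, kernel)
print(result[0])  # -> [0.0, 2.0, 0.0]: strongest response where the 0->1 edge sits
```

A convolutional network learns the values in many such kernels from data, stacking layers of them — which is where the appetite for GPUs and large labeled datasets comes from.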

The enterprise perspective

It shouldn’t be surprising that machine learning and deep learning could have very similar use cases; however, what does this mean from the perspective of the enterprise that has just adopted big data?

Today, machine learning and deep learning are on the path to convergence in enterprise environments because, increasingly, enterprises need to enhance machine learning applications with the deeper, richer capabilities of deep learning.

For example, a global retailer put itself on the path to convergence when it recognized the need to enrich its machine learning shopping assistance applications with deep learning image recognition capabilities. It wanted to give its shoppers the ability to take a picture of an object then ask questions about its availability and its features, and that takes the capabilities of deep learning algorithms.

Machine learning technologies are good at understanding what the particular shopper has bought in the past, what the shopper is looking at right now, and what other products the shopper might be interested in, but they can’t recognize what’s in a photo. Image recognition requires a great deal of compute and a large amount of training of models. The algorithms aren’t written to understand everything in an image. They must be trained to do that, and this is where deep learning can enable deeper insights.

Back to the big question

This brings us back to a question posed earlier: Is there any merit to the argument that machine learning is the domain of big data, whereas deep learning is the domain of HPC? The answer to that question is a simple no. Machine learning and deep learning are simply different techniques, though they do have different platform requirements and excel at different tasks.

Machine learning on big data can scale out thanks to the maturity of data handling platforms. Today’s deep learning codes address only the computational portion with GPUs; they run best within a single compute node and struggle to scale across nodes. However, as deep learning algorithms and computational models mature, this will change. Even though GPUs are a step forward for deep learning, they are still insufficient for what’s needed in real-world deployments.

Computationally speaking, today’s platforms are only beginning to scratch the surface of what’s needed for practical solutions. The next generation of hardware will need faster interconnects to scale across multiple processing units, faster and larger memory to deal with the ever-increasing terabytes of data used for training, and the ability to work with low or flexible floating point precision.

The software will have to be redesigned to scale out more efficiently, with even better matrix multiplication algorithms. We will also need more efficient distributed storage and data access file systems because, ultimately, that’s where all the value lies. And lastly, to be truly useful, a complete solution would be resilient against component failures and easy to program and maintain. When you look at all of these requirements, it reads like a typical laundry list of technologies where HPC has a long history and a lot to offer.

Looking ahead

Today, we’re not far away from the point at which machine learning and deep learning will take place on the same big data platform. The enabling technologies are converging, and the business need is growing. New platforms will include all the data management features of big data, highly specialized processors optimized for deep learning, and high-speed intra- and inter-node interconnects.

This convergence will be further enabled by the ongoing evolution of big data platforms like Apache Hadoop and data processing engines like Apache Spark™. A prime example of this is the creation of platforms like Deeplearning4j, the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J is designed to be used in business environments on distributed GPUs and CPUs.

Ultimately, it’s going to be the creativity of the community, coupled with innovative business use cases, that drives the convergence of the various automated learning platforms, including those of big data and HPC. But then again, I like to think of all of these techniques as enriching the library of options that future analysts will have at their disposal — to enrich the lives of everyday people in ways that, until now, were all but unimaginable.

Adnan Khaleel is the Global Sales Strategist for HPC and Data Analytics at Dell EMC.