By Adnan Khaleel
The convergence of high performance computing (HPC) and big data has been under way for years. As I noted in an earlier blog, HPC and big data grew up in different worlds and are now coming together—due to necessity. People using HPC applications often work with big data, and people working with big data often need the processing power of HPC systems. This convergence is giving rise to the era of high performance data analytics (HPDA) in the enterprise.
Let’s take a step back. For enterprises, data is coming at much faster rates than anyone had expected. Whether it’s from the Internet of Things, webpages, commercial transactions or other sources, the amount of data pouring into enterprise data centers exceeds current storage capacity. This flood of data creates a new class of data consolidation, data handling and data management challenges. Organizations can’t just let the data pile up. They now need to make deliberate decisions about what data to store, what data to analyze and what data to discard.
Above all, enterprises need to find ways to turn the flood of data into meaningful insights. This process increasingly requires HPC capabilities that make applications run as fast as possible. In many cases, enterprises need to generate insights in real time—whether they need to optimize the performance of remote equipment, respond faster to a customer’s needs or put the brakes on a potentially fraudulent transaction.
Let’s take the example of the many enterprises that are getting hit with an ever-growing wave of data from our world of connected devices, the Internet of Things (IoT). To capitalize on this data, whether in real time or over a period of time, enterprises need to apply sophisticated machine learning and deep learning techniques, and these techniques require HPC systems paired with big data platforms and data analytics tools.
With HPDA, enterprises use HPC technologies to analyze big data for rapid insights, real-time results and predictive analytics. One study found that 67 percent of HPC users are already doing HPDA, in addition to or instead of traditional HPC.
While HPDA is needed in traditional research-driven applications of HPC, it is becoming a must-have in enterprise environments. Depending on the industry, an enterprise might need to leverage data-centric HPC platforms for more traditional HPC applications like genomics, financial modeling and signal processing, as well as new and emerging HPDA applications like personalized medicine, fraud detection and machine learning.
The rise of new tools and technologies
For organizations that need HPDA, there is good news on the technology front: The tools and technologies for merging HPC with data analytics are maturing rapidly. Better still, HPC and big data platforms are converging in a manner that reduces the need to move data back and forth between HPC and storage environments. This convergence helps organizations avoid a great deal of overhead and latency that comes with disparate systems.
Today, organizations can choose from a rapidly growing range of tools and technologies like streaming analytics, graph analytics, and exploratory data analysis in HPC environments. Let’s take a brief look at these tools.
- Streaming analytics offers new algorithms and approaches to help organizations rapidly analyze high-bandwidth, high-throughput streaming data. These advances enable solutions for emerging graph patterns, data fusion and compression, and massive-scale network analysis.
- Graph analytics technologies enable graph modeling, visualization, and evaluation for understanding large, complex networks. Specific applications include semantic data analysis, big data visualization, data sets for graph analytics research, activity-based analytics, performance analysis of big graph data tools and anti-evasive anomaly detection.
- Exploratory data analysis provides mechanisms to explore and analyze massive streaming data sources to gain new insights and inform decisions. Applications include exploratory graph analysis, geo-inspired parallel simulation and cyber analytic data sets.
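To make the streaming-analytics idea above concrete — analyzing high-throughput data on the fly rather than storing it all — here is a minimal Python sketch of a sliding-window aggregator. The window size and the simulated readings are hypothetical, not taken from any particular product:

```python
from collections import deque

class SlidingWindowStats:
    """Maintain mean and max over the last `size` readings of a stream
    without retaining the full history -- the core streaming-analytics idea."""

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def add(self, value):
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # subtract the value about to be evicted
        self.window.append(value)         # deque drops the oldest automatically
        self.total += value

    def mean(self):
        return self.total / len(self.window)

    def max(self):
        return max(self.window)

# Feed a simulated high-rate stream; stats are available at any point
stats = SlidingWindowStats(size=3)
for reading in [10.0, 12.0, 11.0, 50.0]:
    stats.add(reading)

print(stats.mean())  # mean of the last 3 readings: (12 + 11 + 50) / 3
print(stats.max())
```

Each `add` is constant-time work, which is what lets this pattern keep up with high-bandwidth streams where batch storage and reprocessing would not.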
HPDA in action: case studies
Let’s consider a couple of real-life examples of HPDA in action. These examples show how companies are capitalizing on the convergence of technologies for HPC and big data.
To help fight cancer and other diseases, TGen needed extremely scalable, reliable and available HPC nodes to develop personalized medical treatments. To meet this need, TGen optimized its infrastructure, scaling its existing Dell EMC HPC cluster with Dell EMC™ PowerEdge™ blades. The system incorporates powerful big data and analytics tools, leveraging a Dell EMC Hadoop platform and Statistica software. The increased performance helps TGen accelerate results, enabling researchers to expand treatments to a larger number of patients. Watch the video.
Another Dell EMC customer, Sensus, needed to increase its data set size to be able to more easily visualize meter sensor performance problems. To meet this need, the company implemented a data cluster and a data lake—based on a Hadoop platform and technologies from Dell EMC and Intel—that consolidates manufacturing, testing and other data streams. With this consolidated platform, Sensus can quickly analyze data from 17 million gas, electric and water meter sensors, and proactively identify device problems, helping to predict and prevent future device failures. Read the case study.
Enabling proactive maintenance with HPDA
On the IoT front, HPDA technologies are enabling predictive maintenance of assets to help prevent equipment failures, extend machine life and improve the return organizations gain on their assets. These technologies go beyond condition monitoring to enable condition understanding. Condition monitoring on its own tells you that something is wrong and gives you time to act. When the data is dynamically fed into a device-specific predictive model, you achieve condition understanding: users not only have time to act on maintenance events, but a clear picture of the actions they need to take.
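The difference between monitoring and understanding can be sketched in a few lines of Python. The vibration threshold and the linear trend fit below are hypothetical stand-ins for a real device-specific predictive model:

```python
def monitor(vibration_mm_s, threshold=7.0):
    """Condition monitoring: a static alarm that only says *something* is wrong."""
    return "ALARM" if vibration_mm_s > threshold else "OK"

def understand(history, threshold=7.0):
    """Condition 'understanding': fit a (toy) degradation trend to the data
    to estimate *when* the threshold will be crossed, so maintenance can be
    scheduled before the failure occurs."""
    n = len(history)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(history) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history)) / \
            sum((x - x_mean) ** 2 for x in xs)
    if slope <= 0:
        return None  # no degradation trend detected
    # estimated number of readings until the threshold is crossed
    return (threshold - history[-1]) / slope

readings = [3.0, 3.5, 4.0, 4.5, 5.0]  # steadily rising vibration
print(monitor(readings[-1]))           # "OK" -- monitoring sees no problem yet
print(understand(readings))            # ~4.0 readings until the alarm threshold
```

Note how the static alarm reports "OK" right up until failure, while the trend model warns that the threshold is only a few readings away — the extra lead time is what makes maintenance proactive rather than reactive.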
For organizations new to IoT, the challenges are numerous, spanning both hardware and software. For example, they need to:
- Determine the optimal sensor network architecture and the best locations for edge nodes
- Determine what data is needed for early analysis, what data can be discarded and what data is needed for deep analysis at the data centers
- Identify the software stack needed on the edge nodes to perform data analysis and filtering
- Manage the entire end-to-end process, keeping in mind time-to-insight (what good is the data if the failure has already happened?)
- Move forward with a deployment that includes industry best practices for data movement, data security and regulatory compliance
- And, last but not least, keep costs manageable
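The second and third items in the list above — deciding at the edge what to discard, what to analyze immediately, and what to forward for deep analysis — can be sketched as a simple triage function. The field names and thresholds here are hypothetical illustrations, not part of any actual stack:

```python
def triage(reading):
    """Classify a sensor reading at the edge node.

    Returns one of:
      'discard' -- routine reading; not worth the bandwidth or storage
      'alert'   -- anomaly; analyze immediately (time-to-insight matters)
      'forward' -- interesting but not urgent; ship to the data center
                   for deep analysis
    """
    temp = reading["temp_c"]
    if temp > 90.0:                                # hypothetical failure threshold
        return "alert"
    if abs(temp - reading["expected_c"]) > 5.0:    # drift worth a closer look
        return "forward"
    return "discard"                               # normal operation

batch = [
    {"id": "m-001", "temp_c": 41.0, "expected_c": 40.0},
    {"id": "m-002", "temp_c": 95.2, "expected_c": 40.0},
    {"id": "m-003", "temp_c": 48.0, "expected_c": 40.0},
]
decisions = {r["id"]: triage(r) for r in batch}
print(decisions)  # {'m-001': 'discard', 'm-002': 'alert', 'm-003': 'forward'}
```

In practice the triage logic would be driven by the device-specific models mentioned earlier, but the shape is the same: filter aggressively at the edge so that only actionable or analytically valuable data travels to the data center.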
And this is where expertise comes in handy. With that thought in mind, Dell EMC has joined forces with Software AG and Kepware to produce an end-to-end solution for proactive maintenance. It offers a complete hardware and software stack for managing IoT sensors, the data they produce, and the real-time analysis of that data—ultimately easing the deployment of a comprehensive IoT-based solution for infrastructure maintenance.
That’s just one of countless advances made possible by the rise of technologies and solutions for high performance data analytics. For a look at more of these technologies and solutions, visit Dell.com/HPC.
Adnan Khaleel is a Global Sales Strategist for Dell EMC.
Source: IDC, "The Changing Face of HPC," HPCwire, 2015.