How to Get More Bang from Your Big Data Clusters

In a collaborative research effort, Intel® and Dell EMC shed light on opportunities to use Intel® CoFluent™ technology to optimize the design of big data clusters.

Dell EMC

When it comes to the importance of data to an enterprise, the metaphors abound. Data is the lifeblood of an organization. Data is the new gold. Data is the new oil. Data is the fuel for your analytics engine.

All of these characterizations underscore the fact that the competitiveness of an organization now depends on its ability to take advantage of the data it captures from business systems, the Internet of Things, social media and other channels. From optimizing manufacturing processes to building 360-degree views of customers, everything now comes down to the analysis of data.

In this digitally driven world, organizations need to do all they can to optimize their big data clusters to accelerate the analysis of massive amounts of data. This is where a recent collaborative research project by Intel and Dell EMC comes into the picture.

In this project, the research team focused on the use of modeling and simulation to identify optimal parameter values for big data clusters. This collaboration, which took place via the Dell Innovation Center at the University of Pisa, leveraged Intel® CoFluent™ technology for big data.

Intel CoFluent technology is a planning and optimization solution that predicts cluster performance and network behavior for big data challenges. This technology helps organizations address common big data cluster design issues, such as predicting system scalability, sizing the system, determining maximum hardware utilization, minimizing costs and predicting system performance.

Using an industry-standard benchmark, the research team compared the optimized parameter values suggested by Intel CoFluent technology for big data to settings chosen by big data experts. The results showed that the Intel CoFluent settings delivered a 32 percent gain in the benchmark performance score over the parameter choices of expert human developers. In a report on their research, the researchers noted that a 32 percent improvement is equivalent to the performance gain typically seen from a new processor generation.

The team also found that workload performance is sensitive both to the CPU and to the number of nodes in a cluster. A scaled analysis of benchmark workloads showed that upgrading the processor led to an average decrease in execution time of 20 percent. The analysis also showed that queries with a large input size can benefit from having more nodes in the design. For example, scaling the cluster by a factor of 2 resulted in an average 33 percent decrease in execution time for those types of workloads.
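To put those percentages in perspective, a decrease in execution time can be converted into a speedup factor. The short Python sketch below does only that arithmetic on the two figures quoted above (20 percent and 33 percent). The function names are illustrative and are not part of Intel CoFluent technology, and the "efficiency" figure assumes that scaling by a factor of 2 means twice the nodes.

```python
def speedup_from_time_reduction(reduction: float) -> float:
    """Convert a fractional decrease in execution time into a speedup factor.

    A job that takes (1 - reduction) of its original time runs
    1 / (1 - reduction) times faster.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction)


def scaling_efficiency(reduction: float, scale_factor: float) -> float:
    """Share of the ideal (linear) speedup actually achieved.

    Assumption: scale_factor is the multiplier applied to the node count.
    """
    return speedup_from_time_reduction(reduction) / scale_factor


# Figures quoted in the article.
cpu_speedup = speedup_from_time_reduction(0.20)   # processor upgrade
node_speedup = speedup_from_time_reduction(0.33)  # scaling by a factor of 2
node_efficiency = scaling_efficiency(0.33, 2.0)

print(f"Processor upgrade speedup: {cpu_speedup:.2f}x")
print(f"Scale-out speedup:         {node_speedup:.2f}x")
print(f"Scale-out efficiency:      {node_efficiency:.0%}")
```

Read this way, a 33 percent shorter run time from a cluster scaled by 2 is roughly a 1.5x speedup, or about three quarters of the ideal 2x.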

In a higher-level finding, the researchers noted that accurate simulation of workloads provides a development tool for choosing better values for configuration parameters, which in turn helps developers optimize big data clusters. They noted that this tool can help both expert and less experienced developers save development time, improve capacity planning and accurately tune big data clusters for business needs.
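The general idea of simulation-driven tuning can be illustrated with a toy Python sketch: try candidate parameter values against a model of the workload and keep the combination with the lowest predicted run time. Everything here is an assumption made for illustration. The cost model, the parameter names (parallelism, buffer_mb) and the candidate values are invented, and none of it reflects how Intel CoFluent technology works internally or the numbers from the study.

```python
from itertools import product


def simulate_runtime(parallelism: int, buffer_mb: int) -> float:
    """Toy stand-in for a cluster simulator (hypothetical cost model).

    Returns a predicted run time in arbitrary units.
    """
    compute = 1200.0 / parallelism       # work is split across parallel tasks
    coordination = 2.0 * parallelism     # more tasks mean more overhead
    spill = 4000.0 / buffer_mb           # small buffers spill to disk
    memory_pressure = 0.05 * buffer_mb   # large buffers crowd out other memory
    return compute + coordination + spill + memory_pressure


def best_config(parallelism_options, buffer_options):
    """Exhaustively evaluate every combination and return the fastest one."""
    candidates = product(parallelism_options, buffer_options)
    return min(candidates, key=lambda cfg: simulate_runtime(*cfg))


# A hand-picked starting point versus the result of the search.
hand_picked = (8, 128)
tuned = best_config([4, 8, 16, 24, 32, 48], [64, 128, 256, 512])

print("Hand-picked:", hand_picked, simulate_runtime(*hand_picked))
print("Tuned:      ", tuned, simulate_runtime(*tuned))
```

An exhaustive search is used here only because the toy model is cheap to evaluate. With a realistic simulator, the same loop lets a developer compare many configurations without running each one on a physical cluster.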

Those, of course, are among the keys to capitalizing more fully on data, the essence of an organization in today’s digitally driven world.

To learn more, download the team’s white paper, “Optimize configuration parameters faster and more accurately, and speed up analyses of scaling big-data clusters.” In addition, you can explore the broad capabilities of Intel CoFluent technology at cofluent.intel.com.

Copyright © 2018 IDG Communications, Inc.