Moving Toward Real-time Analytics

Dell EMC

In the years since the rise of big data and analytics (Apache™ Hadoop® and associated distributed data systems), we’ve seen a tremendous amount of innovation from teams looking to overcome data pipeline challenges specific to a use case, an industry, or even an infrastructure. At some point, enough people begin creating similar designs and sharing their solutions at conferences or in online forums that best practices form around implementing the same design patterns over and over.

One such challenge that folks have been designing around is the need to take in real-time streams of data and provide both real-time dashboards and analytics. These capabilities enable quick decisions as well as the ability to include important historical data to mine for insights and patterns. The Industrial Internet of Things (IIoT), characterized by voluminous data from sensors streaming at high velocities, has been one of the primary drivers for solutions where you need rapid analytics to react and respond to sudden changes, but also need the historical data to perform data modeling and even reconcile billing.

One key problem that emerged early on was the lack of a tool that could handle both of those use cases well. Hadoop was great at the historical dimension, with its cheap and deep philosophy. Newer technologies like Storm and Spark could deliver the rapid analysis of data necessary for the real-time streams, but had trouble persisting data to disk in an efficient and reliable manner.

To move forward, the industry began to coalesce around design patterns like the Lambda Architecture (and others). In these patterns, data is ingested into a distributed messaging system and then consumed in parallel by two separate layers: a real-time layer and a persistent storage layer. This architecture gave users one system that was fast but approximate (real-time layer) and another system that was accurate but slow (persistent store layer). It was soon recognized that this two-part solution would require a tremendously complex query strategy to knit together the different answers from these two systems and arrive at a single source of truth.
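To make the pattern concrete, here is a minimal single-process sketch in Python. It is an illustration only, not how any particular product implements it: the class name `LambdaSketch`, its methods, and the running-total-per-sensor use case are all assumptions made for this example. An in-memory list stands in for the distributed messaging system, and both views are simple counters.

```python
from collections import Counter


class LambdaSketch:
    """Toy illustration of the Lambda pattern (hypothetical names throughout)."""

    def __init__(self):
        self.log = []                # stands in for the distributed messaging system
        self.batch_view = Counter()  # persistent layer: rebuilt from the full log
        self.speed_view = Counter()  # real-time layer: only events since the last batch run

    def ingest(self, sensor_id, value):
        # Every event lands in the log and is also applied to the real-time view.
        self.log.append((sensor_id, value))
        self.speed_view[sensor_id] += value

    def run_batch(self):
        # Slow path: recompute the persistent view from the entire log,
        # then discard the real-time view it has now caught up with.
        self.batch_view = Counter()
        for sensor_id, value in self.log:
            self.batch_view[sensor_id] += value
        self.speed_view = Counter()

    def query(self, sensor_id):
        # The query has to knit together answers from both layers.
        return self.batch_view[sensor_id] + self.speed_view[sensor_id]
```

Even in this toy version, `query` must know about both views and about when the batch layer last ran. In a real deployment the two layers are separate systems with different latencies and failure modes, which is where the query complexity described above comes from.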

Rest assured, if something keeps popping up as a challenge, the talented and driven big data and analytics community will eventually launch a tool to solve it. Over the last year or two, we’ve started to see tools built specifically to handle this real-time analytics storage problem. Implementations have ranged from new open-source projects to revised storage layers inside commercial Hadoop offerings.

What we see time and time again is that companies that have invested in a top-to-bottom digital transformation strategy are able to adopt and embrace these new tools with relative ease. Technologies like private cloud, containerization, and Platforms-as-a-Service lower the friction from an infrastructure standpoint. However, it is just as important for organizations to be able to rapidly integrate new tools into their work streams, implementing secure, purpose-built, and simplified Ready Solutions that they can quickly adopt and build value on. This isn’t just a “nice-to-have”; it is a requirement.

It’s interesting that many customers have been most worried about being able to adopt solutions for handling “big data” moving at high velocity, but it turns out that building an IT culture and strategy that can handle the velocity of new design patterns and data pipelines is just as critical for success, if not more so.

Kris Applegate is a Solutions Architect in one of the 21 Dell EMC Customer Solutions Centers.