To gain the full value of data, you need to analyze it in real time
Best Practices from Data-Driven Organizations
By Armando Acosta, CIO
As their mastery of big data evolves, enterprises are getting better at using the Hadoop platform for batch processing in analytics applications. This capability is helping business leaders gain rear-view insights into business trends, which is good but not good enough. For customers, knowing and understanding what happened one week ago is a must. However, the business now needs to know what happened one hour ago, one minute ago, and oftentimes within a split second in order to gain the insights that ensure its competitive advantage.
To compete effectively in a world that is increasingly driven by big data, enterprises now need to get better at using data in real-time analytics applications to understand what is happening now and what is likely to happen in the days and weeks ahead. Customers need a core set of big data tools to stream, ingest, process, transform, and analyze data at real-time speed. This evolutionary shift requires advanced tools and technologies that enable organizations to process and analyze data as it flows into the enterprise.
For enterprises that are on this path to real-time data analytics, there is good news on the technology front in the form of several new technology stacks. My colleague Kris Applegate covered some of these stacks in a recent blog post. In this conversation, I will zero in on one of these new stacks: SMACK, an acronym based on its five components.
This big data and analytics toolchain brings together key open source technologies that work together to accelerate the data pipeline—from processing to analysis:
Spark for large-scale data processing
Mesos for orchestrating cluster resource management
Akka as a toolkit for building data-heavy applications
Cassandra as a storage engine
Kafka for event processing
Together, the technologies in the SMACK stack form a set of distributed, low-latency tools that process data at high speed.
The use cases for high-speed processing via the SMACK stack are all over the map—from fraud detection and recommendation engines to predictive analytics and supply chain optimization. For illustrative purposes, let’s look at a straightforward use case from the world of manufacturing and the Internet of Things (IoT).
In this use case, a manufacturer captures enormous amounts of data from devices on the manufacturing floor using an edge device. Lightweight analytics can be run at the edge to filter and aggregate the data before it is sent to the core data center for in-depth analysis. The manufacturer can put the SMACK stack to work to analyze this IoT data in real time to identify performance trends that indicate when devices are heading toward failure. With these immediate insights, the manufacturer can work proactively to address the equipment issues before they bring down the manufacturing line, at a potentially huge cost to the business.
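To make the edge-to-core flow a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under several assumptions: the reading format (device ID, timestamp in seconds, temperature), the function names, the valid range, and the "rising average" rule are all hypothetical and not taken from any particular deployment. In a real SMACK pipeline, the summaries would more likely travel through Kafka and be analyzed in Spark; the standard library is used here only to keep the example self-contained.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical reading format: (device_id, timestamp_seconds, temperature_celsius)


def aggregate_at_edge(readings, window_seconds=60, valid_range=(-40.0, 150.0)):
    """Edge step: drop out-of-range readings, then average the rest
    per device and per fixed time window."""
    low, high = valid_range
    buckets = defaultdict(list)
    for device_id, timestamp, value in readings:
        if low <= value <= high:  # filter obvious sensor glitches
            window_start = int(timestamp // window_seconds) * window_seconds
            buckets[(device_id, window_start)].append(value)
    return sorted(
        (device_id, window_start, mean(values))
        for (device_id, window_start), values in buckets.items()
    )


def devices_trending_up(summaries, min_windows=3):
    """Core step: flag devices whose windowed average is strictly
    increasing across their last `min_windows` windows."""
    series = defaultdict(list)
    for device_id, _window_start, average in sorted(summaries):
        series[device_id].append(average)
    flagged = []
    for device_id, averages in series.items():
        recent = averages[-min_windows:]
        rising = all(a < b for a, b in zip(recent, recent[1:]))
        if len(recent) == min_windows and rising:
            flagged.append(device_id)
    return sorted(flagged)


# Made-up sample data: press-1 is heating up, press-2 is stable,
# and the 999.0 reading is a glitch that the edge filter discards.
readings = [
    ("press-1", 0, 70.0), ("press-1", 30, 72.0), ("press-1", 60, 75.0),
    ("press-1", 90, 999.0), ("press-1", 120, 80.0),
    ("press-2", 0, 65.0), ("press-2", 60, 64.0), ("press-2", 120, 65.0),
]
summaries = aggregate_at_edge(readings)
print(devices_trending_up(summaries))
```

The point of the split is that only the compact per-window summaries need to leave the manufacturing floor, while the trend check runs centrally across all devices. A production system would use a more robust failure indicator than a simple rising average.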
The key takeaway here is the importance of fast processing of big data. In many cases, we lose business value if we tuck data away for analysis at a later point in time. To gain the full value of data, we increasingly need to analyze it in real time. And when it comes to this need, we require combinations of technologies like those in the SMACK stack.
At Dell, we are working actively to help organizations put these new technologies to work in open source Hadoop deployments. You can learn more at Dell.com/Hadoop.
Armando Acosta is the Hadoop planning and product manager at Dell.