By Chris Latimer, vice president, product management, DataStax

There's a lot of talk about the importance of streaming data and event-driven architectures right now. You may have heard of these technologies, but do you know why they matter to so many enterprises? Streaming technologies unlock the ability to capture insights and take instant action on data that's flowing into your organization; they're a critical building block for developing applications that can respond in real time to user actions, security threats, or other events. In other words, they're a key part of building great customer experiences and driving revenue.

Here's a quick breakdown of what streaming technologies do, and why they're so important to enterprises.

Data in motion

Organizations have gotten pretty good at creating a relatively complete view of so-called "data at rest": the kind of information that's often captured in databases, data warehouses, and even data lakes to be used immediately (in "real time") or to fuel applications and analysis later.

Increasingly, though, data driven by activities, actions, and events that happen in real time across an organization pours in from mobile devices, retail systems, sensor networks, and telecommunications call-routing systems.

While this "data in motion" might ultimately get captured in a database or other store, it's extremely valuable while it's still on the move. Data in motion might enable a bank to detect fraud in real time and act on it instantly. Retailers can make product recommendations based on a consumer's search or purchase history the instant someone visits a web page or clicks on a particular item.

Consider Overstock, a U.S. online retailer. It must consistently deliver engaging customer experiences and derive revenue from in-the-moment monetization opportunities.
In other words, Overstock sought the ability to make lightning-fast decisions based on data arriving in real time (generally, brands have about 20 seconds to connect with customers before they move on to another website).

"It's like a self-driving car," says Thor Sigurjonsson, Overstock's head of data engineering. "If you wait for feedback, you're going to drive off the road."

The event-driven architecture

To maximize the value of its data as it's created, instead of waiting hours, days, or even longer to analyze it once it's at rest, Overstock needed a streaming and messaging platform. Such a platform would enable it to employ real-time decision-making to deliver personalized experiences and recommend products likely to be well received by customers at the perfect time (really fast, in other words).

Data messaging and streaming is a key part of an event-driven architecture: a software architecture or programming approach built around the capture, communication, processing, and persistence of events, such as mouse clicks, sensor outputs, and the like.

Processing streams of data means taking action on a series of events that originates from a system that continuously creates them. The ability to query this non-stop stream, find anomalies, recognize that something important has happened, and act on it quickly and meaningfully is what streaming technology enables.

This is in contrast to batch processing, where an application stores data after ingesting it, processes it, and then stores the processed result or forwards it to another application or tool. Processing might not start until after, say, 1,000 data points have been collected.
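To make that contrast concrete, here's a minimal, platform-agnostic sketch in Python. The batch threshold, the fraud rule, and every name here are hypothetical, chosen only to illustrate when each approach produces an answer:

```python
# Toy comparison of batch vs. stream processing (illustrative names only).

BATCH_SIZE = 1000  # batch jobs often wait for a threshold like this

def is_fraud(txn):
    # Hypothetical rule: flag any transaction over $5,000.
    return txn["amount"] > 5000

def batch_detect(transactions):
    """Store first, process later: nothing is flagged until a full batch arrives."""
    flagged, batch = [], []
    for txn in transactions:
        batch.append(txn)
        if len(batch) >= BATCH_SIZE:
            flagged.extend(t for t in batch if is_fraud(t))
            batch = []
    return flagged  # anything still sitting in `batch` hasn't been examined yet

def stream_detect(transactions):
    """Act on each event the moment it arrives."""
    for txn in transactions:
        if is_fraud(txn):
            yield txn  # flagged the instant it is seen

# 999 events: the batch job has flagged nothing, the stream caught the fraud.
txns = [{"id": i, "amount": 9999 if i == 0 else 50} for i in range(999)]
print(len(batch_detect(txns)))         # 0 -- the batch never filled
print(len(list(stream_detect(txns))))  # 1 -- caught immediately
```

The logic applied to each event is identical; only the moment of processing differs, and that timing is the whole point.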
That's too slow for applications that require reactive engagement at the point of interaction.

When a message queue isn't enough

Some enterprises have recognized that they need to derive value from their data in motion and have assembled their own event-driven architectures from a variety of technologies, including message-oriented middleware systems like Java Message Service (JMS) or message queue (MQ) platforms.

But these platforms were built on the premise that the data they processed was transient and should be discarded as soon as each message had been delivered. That throws away a highly valuable asset: data that's identifiable as arriving at a particular point in time. Time-series information is critical for applications that involve asynchronous analysis, like machine learning; data scientists can't build machine learning models without it. A modern streaming system needs not only to pass events along from one service to another, but also to store them in a way that preserves their value for later use.

The system also needs to scale to manage terabytes of data and millions of messages per second. Older MQ systems were designed to do neither.

Pulsar and Kafka: The old guard and the unified, next-gen challenger

As I touched upon above, there are a lot of choices available when it comes to messaging and streaming technology.

They include open-source projects like RabbitMQ, ActiveMQ, and NATS, along with proprietary solutions such as IBM MQ and Red Hat AMQ.
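The discard-versus-retain distinction described above is worth sketching before comparing specific platforms. Below is a toy model in pure Python, with no real broker or client API involved; every class and method name is illustrative only:

```python
from collections import deque

class MessageQueue:
    """Classic MQ semantics: a message is gone once it has been delivered."""
    def __init__(self):
        self._messages = deque()
    def publish(self, msg):
        self._messages.append(msg)
    def consume(self):
        return self._messages.popleft()  # delivered, then discarded forever

class EventLog:
    """Streaming semantics: an append-only log; consumers track an offset."""
    def __init__(self):
        self._events = []
    def publish(self, event):
        self._events.append(event)
    def read(self, offset=0):
        return self._events[offset:]     # full history stays available

q, log = MessageQueue(), EventLog()
for reading in ("t0:72F", "t1:75F", "t2:91F"):
    q.publish(reading)
    log.publish(reading)

q.consume()        # "t0:72F" has left the queue; its timestamp is lost
print(log.read())  # the whole time series survives for later analysis
```

A real streaming platform adds durability, partitioning, and scale on top of this idea, but the core difference is exactly this: the log keeps the time-ordered history that a queue throws away, which is what downstream machine learning and analytics need.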
Then there are the two well-known platforms for handling real-time data: Apache Kafka, a very popular technology that has become almost synonymous with streaming, and Apache Pulsar, a newer streaming and message-queuing platform.

Both technologies were designed to handle the high throughput and scalability that many data-driven applications require.

Kafka was developed at LinkedIn to facilitate data communication between the job-networking company's services, and it became an open-source project in 2011. Over the years it has become a standard for enterprises looking for ways to derive value from real-time data.

Pulsar was developed at Yahoo! to solve messaging and data problems faced by applications like Yahoo! Mail; it became a top-level open-source project in 2018. While still catching up to Kafka in popularity, it offers more features and functionality. And it carries a very important distinction: MQ solutions are solely messaging platforms, and Kafka handles only an organization's streaming needs, while Pulsar handles both, making it the only unified platform available.

Pulsar can handle real-time, high-rate use cases like Kafka can, but it's also a more complete, durable, and reliable solution than the older platform. To get both streaming and queuing (an asynchronous communications protocol that enables applications to talk to one another), a Kafka user would need to bolt on something like RabbitMQ; Pulsar can handle many of the use cases of a traditional queuing system without add-ons.

Pulsar carries other advantages over Kafka, including higher throughput, better scalability, and geo-replication, which is particularly important when a data center or cloud region fails.
Geo-replication enables an application to publish events to another data center without interruption, preventing the app from going down and keeping an outage from affecting end users. (Here's a more technical comparison of Kafka and Pulsar.)

Wrapping up

Overstock chose Pulsar as its streaming platform. With it, the company built what Sigurjonsson, its head of data engineering, describes as an "integrated layer of data and connected processes governed by a metadata layer supporting deployment and utilization of integrated reusable data across all environments."

In other words, Overstock now has a way to understand and act upon real-time data organization-wide, enabling the company to impress its customers with magically fast, relevant offers and personalized experiences.

As a result, teams can reliably transform data in flight in a way that is easy to use and requires less data engineering. That makes it that much easier to delight customers, and ultimately to drive more revenue.

To learn more about DataStax, visit us here.

About Chris Latimer:

Chris is a technology executive whose career spans more than twenty years in a variety of roles, including enterprise architecture, technical presales, and product management. He is currently vice president of product management at DataStax, where he is focused on building the company's product strategy around cloud messaging and event streaming. Before joining DataStax, Chris was a senior product manager at Google, where he focused on APIs and API management in Google Cloud. Chris is based near Boulder, Colorado; when not working, he is an avid skier and musician who enjoys the never-ending variety of outdoor activities Colorado has to offer with his family.