When transitioning workloads to virtual environments, one of the big drawbacks for data center administrators can be a loss of visibility.
When a problem occurs, it can be difficult to pin down details such as which users are affected and how badly, as well as the causal links between the user layer, the application layer and the underlying infrastructure. This is often because the hypervisor abstracts away data about the underlying hardware.
"Monitoring the dynamic nature of virtualization with tools designed for single-technology silos creates a significant challenge for administrators," says Dave Bartoletti, senior analyst at Forrester Research. "There is a growing need for solutions that provide cross-tier visibility to effectively troubleshoot, monitor and analyze data across silos and deliver real-time business insights and operational intelligence."
Splunk, provider of an engine that collects, indexes and analyzes massive volumes of machine-generated data, thinks big data is the answer. Splunk customer CloudShare, a San Mateo, Calif.-based provider of pre-production cloud environments for dev and test, demos and proofs of concept (POCs), sees a constant stream of data from its network/gateways/firewalls, backend, virtual machines, applications, web servers, databases and storage.
CloudShare's infrastructure as a service (IaaS) platform is designed to grant each customer—including a large number of Fortune 500 firms like HP, SAP, Microsoft and IBM—its own private multi-VM networked environment, including compute resources, networking, IP and preinstalled OS. During peak hours, its system performs about 500 VM resume/suspend operations an hour. Its VMware performance data alone comes in at about 2 million events per hour.
Getting a handle on that data, let alone correlating and analyzing it, is a tricky proposition. Elad Gotfrid, CloudShare's director of IT, says that in its early days the company got by with traditional monitoring tools, but it soon outgrew them.
Scaling Out With Splunk
"In the beginning, we used a traditional monitoring tool, which was good for a small scale," Gotfrid says. "Once you start to grow up, you see the scale doesn't allow you to use a traditional monitoring system anymore. You need higher visibility."
Gotfrid explains that CloudShare went with a new offering from Splunk, then in beta, called Splunk App for VMware, which is designed specifically for the VMware virtual layer. Originally, CloudShare brought in Splunk to monitor the performance of its virtual machines. But once the company saw the possibilities, use of Splunk spread to every area of the business. He notes that CloudShare uses Splunk to collect performance stats, logs and events from the virtualization layer and then correlate that information with network, storage, OS and application events. This allows IT to put infrastructure data in context and track business metrics such as usage and resource costs per trial and business user.
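To make the idea of cross-layer correlation concrete, here is a minimal Python sketch. It is not how Splunk or CloudShare implements correlation; the `Event` fields, the source labels and the idea of joining on a shared host name within a time window are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical normalized event; field names are illustrative only."""
    timestamp: int  # seconds since epoch
    source: str     # assumed layer label, e.g. "vmware", "storage", "network"
    host: str       # assumed shared key that links the layers
    message: str


def correlate(events, anchor_source, window_seconds):
    """Pair each event from anchor_source with events from other sources
    that occurred on the same host within window_seconds of it."""
    by_host = defaultdict(list)
    for event in events:
        by_host[event.host].append(event)

    matches = []
    for host_events in by_host.values():
        anchors = [e for e in host_events if e.source == anchor_source]
        others = [e for e in host_events if e.source != anchor_source]
        for anchor in anchors:
            related = [
                o for o in others
                if abs(o.timestamp - anchor.timestamp) <= window_seconds
            ]
            matches.append((anchor, related))
    return matches


# Made-up sample data, not taken from CloudShare.
sample_events = [
    Event(100, "vmware", "esx-01", "high CPU ready time"),
    Event(130, "storage", "esx-01", "latency spike"),
    Event(900, "network", "esx-01", "link flap"),
    Event(110, "storage", "esx-02", "latency spike"),
]
sample_matches = correlate(sample_events, "vmware", window_seconds=60)
```

In this sketch, the virtualization-layer event on `esx-01` is paired with the storage event that occurred 30 seconds later on the same host, while the later network event and the event on a different host are left out. A production system would also need to handle clock skew, differing host naming across layers and far larger volumes.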
Dashboards link operational data from both physical and virtual sources, providing vital information to network operations, customer support, marketing, sales and R&D. CloudShare even uses Splunk to fight fraud, drawing on network device and firewall information to create attack signatures that trigger automatic blocks or alerts to network operations.
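The article does not describe what CloudShare's attack signatures look like. As one possible illustration only, the Python sketch below treats a signature as a simple count of denied connections per source address, with a lower threshold that raises an alert and a higher one that triggers a block. The function name, the thresholds and the sample addresses are all hypothetical.

```python
from collections import Counter


def classify_sources(denied_source_ips, alert_threshold, block_threshold):
    """Split source IPs into (to_block, to_alert) by denied-connection count.

    Assumes alert_threshold <= block_threshold. Sources at or above
    block_threshold are blocked; sources at or above alert_threshold
    but below block_threshold only raise an alert.
    """
    counts = Counter(denied_source_ips)
    to_block = {ip for ip, n in counts.items() if n >= block_threshold}
    to_alert = {
        ip for ip, n in counts.items()
        if alert_threshold <= n < block_threshold
    }
    return to_block, to_alert


# Made-up firewall data using documentation-only address ranges.
denied = ["203.0.113.5"] * 5 + ["203.0.113.9"] * 2 + ["198.51.100.7"]
to_block, to_alert = classify_sources(denied, alert_threshold=2, block_threshold=5)
```

Here the address seen five times lands in `to_block`, the one seen twice lands in `to_alert`, and the single occurrence is ignored. Real signatures would likely combine more signals than a raw count, such as ports, timing and device type.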
"At CloudShare, we think of Splunk as our eyes and ears," Gotfrid says. "Splunk software enables us to understand and oversee every aspect of our operations. The key asset we achieve from Splunk software is the ability to correlate business data with performance metrics. Compiling data about our customers and understanding which resources are being utilized allows us to understand and plan our capacity based on clear trends we identify."
Thor Olavsrud covers IT Security, Big Data, Open Source, Microsoft Tools and Servers for CIO.com.