by Thor Olavsrud

Splunk Targets Hadoop (and Unstructured Data) With Hunk

Jun 26, 2013 | 3 mins
Analytics, Big Data

The specialist in collecting, monitoring and analyzing big data generated by machines is now extending its capabilities to the universe of unstructured text data with Hunk, a tool that works in Hadoop.

If you deal with machine data, chances are you’re familiar with Splunk, a big data tool geared specifically for machine data. Splunk collects, indexes and correlates real-time data generated by websites, applications, servers, networks, mobile devices, sensors and RFID assets. The idea is to generate insight and “operational intelligence” from customer clickstreams, transactions, network activity, call records and so forth.

But with its focus on machine data, Splunk has only been able to handle a subset of the massive volumes of unstructured data inside most organizations. Now Splunk is working to extend its capabilities to raw data in Hadoop with a new tool it calls Hunk.

“Our customers love how Splunk software enables them to easily visualize and analyze data, and they asked us if we could help them do the same on the sizeable low-cost data stores they’ve built up in Hadoop,” says Guido Schroeder, senior vice president of products at Splunk.

“To create it, we extended our technology with a new patent-pending virtual index technology. Hadoop is a tremendous technology full of potential, if you can get to the data and act on it,” Schroeder says. “We developed Hunk as a standalone software product to help organizations give broader user groups insight into their data assets without custom development, costly data modeling or lengthy batch processing iterations. By providing interactive data exploration, discovery and analytics, Hunk empowers users to derive actionable insights from this raw data in Hadoop.”

Hunk, now in private beta, boasts a number of features to aid in the exploration and analysis of data within Hadoop, including the following:

  • Splunk Virtual Index. This patent-pending technology enables the seamless use of the entire Splunk technology stack, including the Splunk Search Processing Language (SPL), for interactive exploration, analysis and visualization of data as if it were stored in a Splunk software index.
  • Point and go. Splunk says it designed Hunk for interactive data exploration across large, diverse data sets. There is no need to “understand” data upfront. You simply point Hunk at the Hadoop cluster and start exploring data immediately.
  • Interactive analysis. Users can use Hunk to drive deep analysis, detect patterns and find anomalies across terabytes and petabytes of data. Users can also correlate data to spot trends and enrich insights further by connecting data from external relational databases using Splunk DB Connect.
  • Report on and visualize data. Users can build advanced graphs and charts on the fly to visualize and contextualize data.
  • Create custom dashboards. Users can combine multiple charts, views and reports into role-specific dashboards that can be viewed and edited on computers, tablets or mobile devices.

Hunk joins Splunk’s other Hadoop-focused offerings, including Splunk Hadoop Connect, which provides bi-directional integration with Splunk Enterprise, and Splunk App for HadoopOps, which monitors the entire Hadoop deployment from Splunk Enterprise.

Splunk is working with leading Hadoop distribution vendors, including Cloudera, Hortonworks and MapR, to certify Hunk with their distributions. The company says it is aiming to make Hunk generally available by the end of the year.

Thor Olavsrud covers IT Security, Big Data, Open Source, Microsoft Tools and Servers. Follow Thor on Twitter @ThorOlavsrud.