Advances in analytic technologies and business intelligence are allowing CIOs to go big, go fast, go deep, go cheap and go mobile with business data.
Current trends center as much on tackling analytics challenges as they do on taking advantage of opportunities for new business insights. For example, technologies for managing and analyzing large, diverse data sets are arriving just as many organizations are drowning in data and struggling to make sense of it. Still, many of the cost and performance trends in advanced analytics mean companies can ask more complicated questions than ever before and deliver more useful information to help run their businesses.
In interviews, CIOs consistently identified five IT trends that are having an impact on how they deliver analytics: the rise of Big Data, technologies for faster processing, declining costs for IT commodities, proliferating mobile devices and social media.
1. Big Data
Big Data refers to very large data sets, particularly those not neatly organized to fit into a traditional data warehouse. Web crawler data, social media feeds and server logs, as well as data from supply chain, industrial, environmental and surveillance sensors all make corporate data more complex than it used to be.
Although not every company needs techniques and technologies for handling large, unstructured data sets, Verisk Analytics CIO Perry Rotella thinks all CIOs should be looking at Big Data analytics tools. Verisk, which helps financial firms assess risk and works with insurance companies to identify fraud in claims data, had revenues of more than $1 billion in 2010.
Technology leaders should adopt the attitude that more data is better and embrace overwhelming quantities of it, says Rotella, whose business involves “looking for patterns and correlations between things that you don’t know up front.”
Big Data is an “explosive” trend, according to Cynthia Nustad, CIO of HMS, a firm that helps contain healthcare costs for Medicare and Medicaid programs, as well as private businesses. Its clients include health and human services programs in more than 40 states and more than 130 Medicaid managed care plans. HMS helped its clients recover $1.8 billion in costs in 2010 and save billions more by preventing erroneous payments. “We’re getting and tracking so much material, both structured and unstructured data, and you don’t always know what you’re looking for in it,” Nustad says.
One of the most talked about Big Data technologies is Hadoop, an open-source distributed data processing platform originally created for tasks such as compiling web search indexes. It’s one of several so-called “NoSQL” technologies (others include CouchDB and MongoDB) that have emerged to organize web-scale data in novel ways.
Hadoop is capable of processing petabytes of data by assigning subsets of that data to hundreds or thousands of servers, each of which reports back its results to be collated by a master job scheduler. Hadoop can either be used to prepare data for analysis or as an analytic tool in its own right. Organizations that don’t have thousands of spare servers to play with can also purchase on-demand access to Hadoop instances from cloud vendors such as Amazon.
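The split-and-collate pattern described above can be sketched in a few lines of Python. This is a single-process illustration only, not Hadoop's API: the function names (`split_into_chunks`, `map_chunk`, `collate`) and the sample log lines are made up for the example, and a real cluster would run each chunk on a separate server.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, n_workers):
    """Assign each record to one of n_workers subsets (round-robin)."""
    return [records[i::n_workers] for i in range(n_workers)]


def map_chunk(chunk):
    """Stand-in for one worker server: count words in its own subset only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def collate(partials):
    """Stand-in for the master: merge the partial counts into one result."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical server-log lines standing in for a much larger data set.
log_lines = [
    "error disk full",
    "warning disk slow",
    "error network down",
    "info disk ok",
]

partials = [map_chunk(chunk) for chunk in split_into_chunks(log_lines, n_workers=2)]
totals = collate(partials)
print(totals.most_common(2))
```

Because each worker sees only its own subset, adding servers shrinks the work per server; the collation step is the only place the partial results meet.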
Nustad says HMS is exploring the use of NoSQL technologies, although not for its massive Medicare and Medicaid claims databases. These contain structured data and can be handled with traditional data warehousing techniques, and it makes little sense to depart from traditional relational database management when tackling problems for which relational technology is the tried and true solution, she says. However, Nustad can see Hadoop playing a role in fraud and waste analytics, perhaps analyzing records of patient visits that might be reported in a variety of formats.
Among the CIOs interviewed for this story, those who had practical experience with Hadoop, including Rotella and Shopzilla CIO Jody Mulkey, are at companies that provide data services as part of their business.
“We’re using Hadoop for what we used to use the data warehouse for,” Mulkey says, and, more importantly, to pursue “really interesting analytics that we could never do before.” For example, as a comparison shopping site, Shopzilla accumulates terabytes of data every day. “Before, we would have to sample data and partition data. It was so much work just to deal with the volume of data,” he says. With Hadoop, Shopzilla is able to analyze the raw data and skip the in-between steps.
Good Samaritan Hospital, a community hospital in Southwest Indiana, is at the other end of the spectrum. “We don’t have what I would classify as Big Data,” says CIO Chuck Christian. Nevertheless, regulatory requirements are causing him to store whole new categories of data such as electronic medical records in great quantities. Doubtless there is great potential to glean healthcare quality information from the data, he says, but that will probably happen through regional or national healthcare associations rather than his individual hospital. It’s unlikely he’ll invest in exotic new technologies himself.
John Ternent, CIO at Island One Resorts, says that whether his analytic challenges are driven by Big Data “depends on how capital your B and D are.” But he’s seriously considering using Hadoop instances in the cloud as an economical way of running complex mortgage portfolio analytics for the company, which manages eight timeshare resort properties across Florida. “That’s a potential solution to a very real problem we have now,” he says.
2. Business Analytics Get Faster
Big Data technologies are one element of a larger trend toward faster analytics, says University of Kentucky CIO Vince Kellen. “What we really want is advanced analytics on a hell of a lot of data,” Kellen says. How much data one has is less critical than how efficiently it can be analyzed, “because you want it fast.”
The capacity of today’s computers to process much more data in memory allows for faster results than when searching through data on disk, even if you’re crunching only gigabytes of it.
Although databases have, for decades, improved performance with caching of frequently accessed data, now it’s become more practical to load entire large datasets into the memory of a server or cluster of servers, with disks used only as a backup. Because retrieving data from spinning magnetic disks is partly a mechanical process, it is orders of magnitude slower than processing in memory.
Rotella says he can now “run analytics in seconds that would take us overnight five years ago.” His firm does predictive analytics on large data sets, which often involves running a query, looking for patterns, and making adjustments before running the next query. Query execution time makes a big difference in how quickly an analysis progresses. “Before, the run times would take longer than the model building, but now it takes longer to build the model than to run it,” he says.
Columnar database servers, which invert the traditional row-and-column organization of relational databases, address another category of performance requirements. Instead of reading entire records and pulling out selected columns, a query can access only the columns of interest, dramatically improving performance for applications that group or measure a few key columns.
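A rough way to picture the difference is to store the same records both ways. The sketch below uses plain Python structures, not any particular database product, and the table and field names are invented for illustration.

```python
# Row-oriented layout: all fields of one record are stored together.
rows = [
    {"order_id": 1, "region": "East", "amount": 120.0, "notes": "gift wrap"},
    {"order_id": 2, "region": "West", "amount": 80.0, "notes": ""},
    {"order_id": 3, "region": "East", "amount": 50.0, "notes": "rush"},
]

# Column-oriented layout: all values of one field are stored together.
columns = {name: [record[name] for record in rows] for name in rows[0]}


def total_by_region_rows(rows):
    """Walk whole records, then pull out the two fields the query needs."""
    totals = {}
    for record in rows:
        totals[record["region"]] = totals.get(record["region"], 0.0) + record["amount"]
    return totals


def total_by_region_columns(columns):
    """Read only the 'region' and 'amount' columns; the rest is never touched."""
    totals = {}
    for region, amount in zip(columns["region"], columns["amount"]):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


print(total_by_region_rows(rows))
print(total_by_region_columns(columns))
```

Both functions return the same totals. On disk, the columnar version would skip `order_id` and `notes` entirely, which is where the savings come from when a query groups or measures only a few columns.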
Ternent warns that the performance benefits of a columnar database come only with the right application and query design. “You have to ask it the right question the right way for it to make a difference,” he says. He adds that columnar databases only really make sense for applications that must handle more than 500 gigabytes of data. “You have to get a certain scale of data before columnar makes sense because it relies on a certain level of repetition” to achieve efficiencies.
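One common way columnar engines exploit repetition is run-length encoding, in which a run of identical adjacent values is stored once along with a count. The article does not say which technique Ternent has in mind, so treat the following as one plausible illustration with made-up data rather than a description of any specific product.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of identical adjacent values into (value, count)."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# A hypothetical 'region' column with long runs of repeated values.
region_column = ["East"] * 4 + ["West"] * 3 + ["East"] * 2

encoded = run_length_encode(region_column)
print(encoded)
print(len(region_column), "values stored as", len(encoded), "runs")
```

With only a handful of rows the savings are negligible, which is consistent with Ternent's point: the payoff appears once a column is large enough to contain long, repeated runs.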
To improve analytics performance, hardware matters, too. Allan Hackney, CIO at the insurance and financial services giant John Hancock, is adding GPU chips, the same graphical processors found in gaming systems, to his arsenal. “The math that goes into visualizations is very similar to the math that goes into statistical analysis,” he says, and graphics processors can perform calculations hundreds of times faster than conventional PC and server processors. “Our analytic people love this stuff.”
3. Technology Costs Less
Along with increases in computing capacity, analytics are benefiting from falling prices for memory and storage, and from open-source software that provides an alternative to commercial products and puts competitive pressure on pricing.
Ternent is an open-source evangelist. Prior to joining Island One, he was vice president of engineering for Pentaho, an open-source business intelligence company, and worked as a consultant focusing on BI and open source. “To me, open source levels the playing field,” he says, because a mid-sized company such as Island One can use R, an open-source application, instead of SAS for statistical analysis.
Once, open-source tools were available only for basic reporting, he says, but now they offer the most advanced predictive analytics. “There is now an open-source player across just about the entire continuum, which means there’s tooling available to whoever has the gumption to go and get it.”
HMS’ Nustad sees the changing economics of computing altering some basic architectural choices. For example, one of the traditional reasons for building data warehouses was to bring the data together on servers with the computing horsepower to process it. When computing power was scarcer than it is today, it was important to offload analytic workloads from operational systems to avoid degrading the performance of everyday workloads. Now, that’s not always the right choice, Nustad says.
“With hardware and storage so cheap today, you can afford to juice up those operational systems to handle a BI layer,” she says. By factoring out all the steps of moving, reformatting and loading data into the warehouse, analytics built directly on an operational application can often provide more immediate answers.
Hackney observes, however, that although the price-performance trends are helpful for managing costs, potential savings are often erased by increased demands for capacity. “It’s like running in place,” he says. While John Hancock’s per-unit cost for storage dropped by 2 to 3 percent this year, consumption was up 20 percent.
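The arithmetic behind that observation is easy to check. The sketch below assumes total storage spending is simply unit cost multiplied by consumption, which the article does not state; the function name is invented for the example.

```python
def spend_change(unit_cost_change, consumption_change):
    """Net change in total spend, assuming spend = unit cost x consumption."""
    return (1 + unit_cost_change) * (1 + consumption_change) - 1


# Unit cost down 2 to 3 percent, consumption up 20 percent (figures from the article).
low = spend_change(-0.03, 0.20)
high = spend_change(-0.02, 0.20)
print(f"Total spend rises between {low:.1%} and {high:.1%}")
```

Under that assumption, a 2 to 3 percent price drop offsets only a small part of a 20 percent jump in consumption, leaving total spending up by roughly 16 to 18 percent.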
4. Everyone’s Mobile
Like nearly every other application, BI is going mobile. For Nustad, mobile BI is a priority “because everybody wants [...].” Nustad herself wants access to reports on whether her organization is meeting its service level agreements “served up on my iPad when I’m very mobile and not at my desk.” She also wants to deliver mobile access to data for her firm’s customers, to help them monitor and manage healthcare expenses. It’s “a customer delight feature that was not demanded five years ago, but is demanded today,” she says.
For CIOs, addressing this trend has more to do with creating user interfaces for smartphones, tablets and touch screens than with sophisticated analytic capabilities. Maybe for that reason, Kellen dismisses it as fairly easy to address. “To me, that’s kind of trivial,” he says.
Rotella doesn’t think it’s that simple. “Mobile computing affects everyone,” he says. “The number of people doing work off of iPads and other mobile devices is exploding. That trend will accelerate and change how we interact with our computing resources in an enterprise.” For example, Verisk has developed products to give claims adjusters access to analytics in the field, so they can run replacement cost estimates. That’s a way to “leverage our analytics and put it at the fingertips of the people that need it,” he says.
What makes this challenging is how much more quickly technology changes, Rotella says. “Two years ago, we didn’t have iPads. Now everyone is running around with iPads.” With multiple device operating systems in play, “we’re trying to understand how to best leverage our development so we’re not writing these things three, four, five times over,” he says.
On the other hand, the requirement to create native applications for each mobile platform may be fading now that the browsers in phones and tablets are more capable, says Island One’s Ternent. “I’m not sure I’d invest in a customized mobile device application if I can just skin a web-based application for a mobile device.”
5. Social Media
With the explosion of Facebook, Twitter and other social media, more companies want to analyze the data these sites generate. New analytics applications have emerged to support statistical techniques such as natural language processing, sentiment analysis, and network analysis that aren’t part of the typical BI toolkit.
Because they’re new, many social media analytics tools are available as services. One prominent example is Radian6, a software-as-a-service product recently purchased by Salesforce.com. Radian6 presents a dashboard of brand mentions (tagged positive, negative, or neutral) based on Twitter feeds, public Facebook posts, posts and comments on blogs, and discussion board conversations. When purchased by the marketing and customer service departments that use them, such tools may not require heavy IT involvement. Still, University of Kentucky’s Kellen believes he needs to pay attention to them. “My job is to identify these technologies, see what the match is for the organization in terms of competitiveness, and start educating the right people,” he says.
The university has the same interest in monitoring sentiment about its brand as any other business, but Kellen says he may also identify opportunities to develop applications specific to school concerns such as student retention. For example, monitoring student posts on social media could help faculty and administrators learn earlier when students are having academic trouble, much as Dell does when its support organization detects people tweeting about broken laptops, Kellen says. IT developers should also be looking for ways to build alerts generated by social media analytics into applications for responding to those events, he says.
“We don’t have the know-how, nor the tools, to go out and mine massive quantities of social media postings,” says Hackney. “But once you have the data, you need to be able to have enough information about events happening in the company to be able to correlate them.” While John Hancock’s efforts in this area are “nascent,” according to Hackney, he envisions a role for IT in correlating the data provided by a social analytics service with corporate data. For example, if the social media data shows comments about the company in the Midwest are becoming more negative, he would want to see if the company has made price or policy changes in that region that might explain the trend.
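The kind of cross-check Hackney describes might look something like the sketch below. Everything in it is hypothetical: the sentiment scores, the event log, the two-week lookback window and the function name are invented for illustration and do not describe John Hancock's systems or any vendor's data format.

```python
from datetime import date, timedelta

# Hypothetical weekly sentiment scores, as a social analytics service might report them.
sentiment = [
    {"region": "Midwest", "week": date(2011, 5, 2), "score": 0.10},
    {"region": "Midwest", "week": date(2011, 5, 9), "score": -0.25},
    {"region": "Northeast", "week": date(2011, 5, 9), "score": 0.05},
]

# Hypothetical internal log of price and policy changes.
events = [
    {"region": "Midwest", "date": date(2011, 5, 5), "event": "premium increase"},
    {"region": "Northeast", "date": date(2011, 3, 1), "event": "policy form update"},
]


def events_before_drops(sentiment, events, threshold=0.0, window_days=14):
    """For each negative week, list same-region events in the preceding window."""
    matches = []
    for reading in sentiment:
        if reading["score"] >= threshold:
            continue  # sentiment was not negative that week
        window_start = reading["week"] - timedelta(days=window_days)
        for event in events:
            same_region = event["region"] == reading["region"]
            in_window = window_start <= event["date"] <= reading["week"]
            if same_region and in_window:
                matches.append((reading["region"], reading["week"], event["event"]))
    return matches


for region, week, event in events_before_drops(sentiment, events):
    print(f"{region}, week of {week}: preceded by '{event}'")
```

A match like this only flags a candidate explanation. Establishing that the event actually caused the shift in sentiment would take further analysis.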
Finding such correlations could make a big difference in getting company leaders to believe in the return on investment of social media, Hackney says. “In my industry, everybody’s an actuary, everyone’s looking for the numbers; they don’t take anything on belief.”
David F. Carr is a freelance writer based in Florida.