Big data is certainly all the rage. The Wall Street Journal recently ran a piece on data scientists commanding up to $300,000 per year with very little experience. Clearly the era of embracing big data is here.
However, since the tools and best practices in this area are so novel, it's important to revisit our assumptions about what big data can do for us – and, perhaps more importantly, what it can't do. Here are three commonly held yet mistaken assumptions about what big data can do for you and your business.
Big Data Can't Predict the Future
Big data – and all of its analysis tools, commentary, science experiments and visualizations – can't tell you what will happen in the future. Why? The data you collect comes entirely from the past. We've yet to reach the point at which we can collect data points and values from the future.
We can analyze what happened in the past and use the data to connect actions and decisions with their consequences. From those trends, we might reasonably guess that under similar circumstances a similar decision would produce a similar outcome. But that's an educated guess, not a prediction. We can't predict the future.
Many executives and organizations try to glean the future from a mass of data. This is a bad idea, because the future is always changing. You know the line financial advisers always use: "Past performance does not guarantee future results." That maxim applies to big data as well.
Instead of trying to predict the future, use big data to optimize and enhance what's happening now. Look at a current process or event and find concrete ways to improve its outcomes. Use the data to find the right questions to ask. Don't try to use big data as a crystal ball.
Big Data Can't Replace Your Values – or Your Company's
Big data is a poor substitute for values: the standards by which you live your life and by which your company endeavors to operate. Data may clarify your choices on substantive issues and make it easier to sort out the advantages and disadvantages of various courses of action, but the data itself can't tell you how a given decision stacks up against the standards you've set for yourself and for your company.
Data can paint all sorts of pictures, both in the numbers themselves and through visualization software. Your staff can create many projected scenarios for any given issue, but those results are simply that: projections. Your job as an executive, and as a CIO making these tools and staff available within your business, is to reconcile that data with your company's values.
For instance, imagine you're a car manufacturer. Your big data sources and tools tell you that certain vehicle models have a flaw that would cost a few cents to fix on vehicles yet to be manufactured, but significantly more to fix on vehicles customers have already bought and are driving. The data, and thus your data scientists on staff, might recommend fixing the issue on cars still on the assembly line but not bothering with the cars already on the road, simply because the data showed that the cost of a recall would exceed the expected cost of damages.
(Note that this scenario may sound familiar to you if you have been following the General Motors ignition switch saga. However, this is only a hypothetical example, and further, there is no evidence big data played into the GM recall.)
Say your company has a value statement that quality is job 1 and safety is of paramount importance. Though the data suggests a recall isn't worth it, you make the call as an executive to start the recall. You're informed, but you're not controlled by big data.
Above all, it's vital to remember that sometimes the right answer appears to be the wrong one when viewed through a different lens. Make sure you use the right lens.
Big Data Can't Solve Non-Quantifiable Problems
Remember the old saying: When all you have is a hammer, everything looks like a nail. Once you begin having some success using big data to predict and solve business problems, you'll inevitably be tempted to "ask the data" every time you face an issue whose resolution is unclear.
As mentioned before, data can present you with more and better choices and, perhaps, make clear what may happen with each of them. Sometimes, though, data is no good at all, and that's when it's applied to individuals.
Why? It's nearly impossible to quantify an individual's behavior. People have their own sets of circumstances, their own little universes, their own reasons and contexts. Statistics can't be meaningfully applied to a single individual. Instead, you have to look at a group of individuals, a cohort of subjects with similar characteristics. Only then can you observe the behavioral trends that apply to the group as a whole.
This actually isn't a big data problem. It's a statistical problem. The easiest example that comes to mind is credit scoring, which purports to break consumers into groups and analyze the repayment and borrowing history of the individuals in each group in the aggregate.
If someone has, say, a 720 credit score, what that score actually means is that their repayment history places them in a statistical group of similar borrowers, and that X percent of the people in that group (in other words, borrowers with a score in that range) went on to become seriously delinquent or to default. The exact percentage depends on which credit score, and which variant of that score, you look at.
A credit score makes no statement about the individual. He or she may default next month, or never default, or become seriously delinquent and then recover on a timeline the statistics know nothing about and therefore can't predict.
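To make the cohort idea concrete, here is a minimal Python sketch. It is not how any real credit bureau computes scores; the `Borrower` type, the `default_rate_by_band` helper, the 20-point band width and the ten-borrower history are all hypothetical, chosen only for illustration. The function groups borrowers into score bands and reports each band's observed default rate, so the output is one number per group, never one per person.

```python
from dataclasses import dataclass


@dataclass
class Borrower:
    # Hypothetical record: a score and whether this borrower later defaulted.
    score: int
    defaulted: bool


def default_rate_by_band(borrowers, band_width=20):
    """Group borrowers into score bands (hypothetical 20-point width by default)
    and return each band's observed default rate, keyed by the band's lower bound."""
    counts = {}  # band -> (number of borrowers, number of defaults)
    for b in borrowers:
        band = (b.score // band_width) * band_width
        total, defaults = counts.get(band, (0, 0))
        counts[band] = (total + 1, defaults + int(b.defaulted))
    return {band: defaults / total for band, (total, defaults) in sorted(counts.items())}


# Hypothetical toy history: ten borrowers in the 720-739 band, one of whom defaulted.
history = [Borrower(725, False) for _ in range(9)] + [Borrower(731, True)]
rates = default_rate_by_band(history)
print(rates)  # {720: 0.1}
```

The 0.1 for the 720 band describes the group as a whole. Nothing in it tells you which of the ten borrowers defaulted, and a new borrower who lands in that band inherits the same 0.1 regardless of his or her actual circumstances.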
Credit scores can't predict a single individual's behavior. A borrower with an 805 credit score might be ready to strategically default and never borrow another penny, whereas a borrower with a 590 credit score might have a disputed doctor's bill and no other debt. This is why some financial institutions don't use risk-based pricing for loans. Instead, when a borrower requests funds, these institutions underwrite him or her thoroughly, as lenders did before the first credit score debuted. An analysis of an individual's situation gives a far better indication of his or her ability and willingness to repay than a score based on a huge amount of data.
People are tricky. Humans are unpredictable. Don't make the mistake of thinking data can predict their behavior. Big data and human beings are a precarious mix.