In data we trust – or do we?

AI is the engine that drives insights. Data fuels the engine. So what happens if the fuel is diluted?

Last year, researchers at the University of Warwick (UK) found that some rideshare drivers organized simultaneous sign-offs to cause a shortage of drivers. This triggered a surge in prices, which meant a bigger payoff for drivers once they signed back on. The drivers knew that they were participating in a system managed by algorithms, so they made it work in their favor.

This is an example of how businesses have developed in ways that leave them open to a new kind of vulnerability: inaccurate, manipulated and biased data that leads to corrupted business insights and wrong decisions.

In the case of the rideshare drivers, the data was fiddled with on purpose, and the negative impact fell on the company and its customers. But as AI-powered decision-making spreads through business and society, even accidentally skewed data can have negative consequences in virtually every area of life, including elections.

Thus, ensuring data veracity must be a top priority, both from a purely commercial perspective and as part of an organization’s Responsible AI efforts (which I have discussed in a previous post).

There is no doubt that a data-driven approach offers companies great benefits. At Accenture we found that 82 percent of organizations are already using data to drive critical and automated decision-making. But it inevitably comes with risk. Even the most advanced AI is only as good as the data that goes into it.

This means businesses need to invest not only in determining what they can get out of data-driven insights and technologies, but also in what goes into them.

The data veracity challenge

Accumulating and preparing data for use is one of the biggest challenges for organizations that deploy AI systems. Over half of companies (54 percent) currently face this issue, reporting that they are only somewhat confident in the quality of the results they receive from their systems.

Take United Airlines as an example. The airline recently learned that the inaccurate data it was using was contributing to $1 billion a year in missed revenue. United was relying on seating demand forecasts based on outdated assumptions about consumers’ flying habits, which resulted in inaccurate pricing models. This and other data-driven inaccuracies are becoming increasingly important factors in operational performance.

But the situation is not irreparable. By addressing these risks today, United and other companies can work to ensure that the information they use can be trusted in the future.

To ensure data veracity, businesses need to address three areas:

  1. Provenance: verifying the history of data from its origin throughout its life cycle
  2. Context: considering the circumstances around its use
  3. Integrity: securing and maintaining data
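To make provenance and integrity a little more concrete, here is a minimal sketch, assuming a simple setup in which each processing step logs a SHA-256 fingerprint of the data it produced. The class `ProvenanceLog` and the function `fingerprint` are hypothetical names for illustration, not a specific product or standard. The idea is that the log records the data's history (provenance), and recomputing the fingerprint later reveals any change that was never recorded (integrity).

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(records: list[dict]) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class ProvenanceLog:
    """Hypothetical lineage log: one entry per step in the data's life cycle."""
    source: str
    entries: list[dict] = field(default_factory=list)

    def record(self, step: str, records: list[dict]) -> None:
        # Note which step touched the data and fingerprint the result.
        self.entries.append({"step": step, "sha256": fingerprint(records)})

    def verify(self, records: list[dict]) -> bool:
        # Integrity check: the data in hand must match the last recorded fingerprint.
        return bool(self.entries) and self.entries[-1]["sha256"] == fingerprint(records)


if __name__ == "__main__":
    log = ProvenanceLog(source="pos_terminal_export")  # hypothetical source name
    data = [{"sku": "A1", "price": 9.99}]
    log.record("ingest", data)
    print(log.verify(data))   # True: nothing changed since ingest
    data[0]["price"] = 0.99   # an unrecorded edit
    print(log.verify(data))   # False: the change was never logged
```

A hash only shows that data changed, not whether the change was legitimate; judging that still requires the context discussed below.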

Creating a “data veracity steward”

How can companies do this? By establishing a new “data intelligence” practice that has the sole task of grading the truth within data. This practice does not have to start from scratch, as it can draw on existing data science and cybersecurity capabilities.

But this will only get businesses part of the way. To both maximize accuracy and minimize incentives for data manipulation, the data intelligence practice needs to be responsible for understanding the behavior around the data, and the context in which it is being analyzed.

  • Behavior – whether it’s an individual consumer creating a data trail by shopping online, or a sensor network reporting temperature readings for an industrial system, all data originates from some associated behavior. Cutting-edge anomaly detection systems like MIT’s AI2 now identify abnormal patterns of behavior, then categorize them based on feedback from human experts. AI2 detects 85 percent of cyber-attacks, and presents the most pressing incidents to experts for review.
  • Context – the data practice needs the ability to consider data within its available context and flag anything that seems out of the ordinary. Take Thomson Reuters, which has developed an algorithm that uses streams of real-time data from Twitter to help journalists classify, source, fact-check and debunk rumors faster than before.
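As an illustration of the first, automated step of behavior-based anomaly detection, the sketch below flags unusual temperature readings from a sensor. It is not how AI2 works; it swaps in a simple, well-known technique, the modified z-score (median and median absolute deviation, with the conventional 3.5 cutoff), and omits the human-feedback loop entirely. The function name `flag_anomalies` is hypothetical.

```python
from statistics import median


def flag_anomalies(readings: list[float], threshold: float = 3.5) -> list[int]:
    """Return indices of readings whose modified z-score exceeds `threshold`.

    The median and median absolute deviation (MAD) are used instead of the
    mean and standard deviation because a single manipulated or faulty
    reading cannot drag them around the way it can a mean.
    """
    med = median(readings)
    mad = median(abs(x - med) for x in readings)
    if mad == 0:
        # Most readings are identical; treat any reading that differs as unusual.
        return [i for i, x in enumerate(readings) if x != med]
    # 0.6745 rescales MAD so the score is comparable to a standard z-score.
    return [i for i, x in enumerate(readings)
            if 0.6745 * abs(x - med) / mad > threshold]


if __name__ == "__main__":
    temps = [71.8, 72.1, 72.0, 71.9, 72.2, 95.4, 72.0]  # made-up sensor data
    print(flag_anomalies(temps))  # [5]
```

In a fuller system, the flagged readings would go to a human expert, whose verdicts could then tune the threshold or train a classifier, which is the part of AI2's approach this sketch leaves out.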

Reward the truth

Companies can then begin to address issues that might be incentivizing deceit in the first place. It’s an uncomfortable and unfortunate realization, but if a business depends on data collection, it may be incentivizing data manipulation. Amazon responded to fake reviews that inflated third-party product and seller ratings by giving more weight to verified reviews from customers who had definitively purchased the item from Amazon. On top of that, Amazon established an invitation-only incentivized review program.
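The effect of weighting verified reviews can be shown with a small weighted average. Amazon has not published how it weights reviews, so the factor of 3 below is a made-up illustration, and `Review` and `weighted_rating` are hypothetical names. The point is only that reviews which are costlier to fake count for more, which blunts the payoff from flooding a product with unverified praise.

```python
from dataclasses import dataclass


@dataclass
class Review:
    stars: int       # rating from 1 to 5
    verified: bool   # True if the retailer confirmed the purchase


def weighted_rating(reviews: list[Review], verified_weight: float = 3.0,
                    unverified_weight: float = 1.0) -> float:
    """Average star rating in which verified-purchase reviews count more.

    The default weights are illustrative assumptions, not any retailer's values.
    """
    total = weight_sum = 0.0
    for r in reviews:
        w = verified_weight if r.verified else unverified_weight
        total += w * r.stars
        weight_sum += w
    if weight_sum == 0:
        raise ValueError("no reviews to average")
    return total / weight_sum


if __name__ == "__main__":
    # Two verified 3-star reviews versus four unverified 5-star reviews.
    reviews = [Review(3, True), Review(3, True)] + [Review(5, False)] * 4
    print(round(weighted_rating(reviews, verified_weight=1.0), 2))  # plain mean: 4.33
    print(round(weighted_rating(reviews), 2))                       # weighted: 3.8
```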

Of course, the presence of bad data isn’t always the result of malicious intent. It may be a sign that a process isn’t working the way it was intended. To deploy AI responsibly, companies need to uncover the processes that unknowingly incentivize deceit and improve the truth in data across a system. Incentivizing truth will allow companies to reduce noise in data, so that real threats stand out. Ultimately, it will help ensure the data is trustworthy enough to drive critical decisions in the future.

By investing in data accuracy, companies will generate more value from their AI systems and build a strong foundation for the success of other digital transformation initiatives. A new data intelligence practice can pave the way.

This article is published as part of the IDG Contributor Network.