Data is the lifeblood of the modern business, and the digital transformation of enterprises, factories, automobiles and just about every possible consumer experience is creating a staggering amount of it. IDC predicts that global data will grow from 40 zettabytes (ZB) in 2019 to 175 ZB by 2025, which is many, many times more bytes of information than there are stars in the observable universe.
But even though CEOs and CIOs are well aware of how valuable all this information could be, machine learning (ML) has yet to turn it into contextual business or technology insight, or into measurable returns on decision making. That’s because, while the collective amount of global data is gargantuan, most of it is trapped in proprietary corporate silos. This inaccessible data limits the ability of machine learning and artificial intelligence (AI) to reach their full potential, because both require large, diverse data sets to produce powerful insights and next-generation capabilities.
Realizing the Full Potential of AI and ML
Three use cases alone were expected to account for 25% of total AI spending for 2019, according to IDC: automated customer service agents, automated threat intelligence and prevention systems, and sales process recommendation and automation. But use case expansion is on the horizon. MMC Ventures reports that one in ten enterprises now uses 10 or more AI applications; chatbots, process optimization and fraud analysis lead a recent survey’s top use cases. Prevalent applications include consumer/market segmentation (15%), computer-assisted diagnostics (14%), call center virtual assistants (12%), sentiment analysis/opinion mining (12%), face detection/recognition (11%), and HR applications (10%).
In the medical field, AI could regularly assist human doctors to accelerate diagnoses and improve their accuracy. In pharmaceuticals, AI and ML could predict promising lines of inquiry for drug discovery, leading to new cures and treatments. And in cities, AI could power fully autonomous vehicles, with traffic patterns that safely and seamlessly change on the fly in response to emergencies.
These possibilities can only be realized if organizations can exchange data with one another. We’re already seeing some efforts within specific industries to share information. For example, about a dozen large construction contractors have formed a council to share data with the goal of collectively improving the ability of AI to assess safety risk and predict the likelihood of accidents on jobsites. The Predictive Analytics Strategic Council was named “Innovator of the Year” in December 2019 for its achievements. The group is building on the results of a study showing that its AI system learned from data to predict roughly one in five safety incidents with 81% accuracy.
But to scale data sharing, we need to establish data ecosystems, in which enterprises, partners and consumers can easily and securely buy and sell data on a variety of data exchanges.
Requirements for a Successful Data Exchange
To succeed, these exchanges will need to fulfill a number of requirements. Specifically, the exchanges need to be:
- Trusted: Organizations and individuals need to know that they’re dealing with trusted, verified actors, not scam artists selling fake or stolen data. Distributed ledger technologies show promise here: they can record a consumer’s consent for their data to be used, and they can anchor Decentralized Identifiers (DIDs) that enable self-sovereign identity, allowing a system to verify credentials with zero-knowledge proofs.
- Compliant: IT leaders must ensure that the data traded over exchanges complies with regulations such as the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). Given the massive size of the data sets involved, AI and ML will likely be needed to automate compliance checks themselves.
- Secure and sharable: Data needs to be encrypted both in transit and at rest, with a key sharing mechanism that enables the easy, secure transfer of data between two parties. Increasingly, secure data sharing will also mean the ability to run data analysis and machine learning algorithms on confidential, highly sensitive data without losing control of either the data or the algorithms, which will further grow the market and applications for data ecosystems. Important advances in this area include multi-party computation, confidential computing that leverages secure enclaves, and proxy re-encryption.
- Usable: Raw data will certainly be valuable. But just as oil is worth more after it’s been refined into gasoline, data is more valuable when it’s already in a readily usable format. The less ETL (extraction, transformation and loading) an organization has to do before analysis, the faster it can start generating insights.
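To make the "secure and sharable" requirement concrete, here is a minimal sketch of the basic pattern: a data producer encrypts a record with a symmetric key before listing it, and shares that key with the buyer so only the buyer can decrypt. This is a toy construction for illustration only, not production cryptography; a real exchange would use a vetted, authenticated encryption library. All names and values here are illustrative assumptions.

```python
import hashlib
import secrets

def keystream(key: bytes, length: int) -> bytes:
    """Derive a pseudo-random keystream from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream; decryption is the same operation."""
    ks = keystream(key, len(plaintext))
    return bytes(a ^ b for a, b in zip(plaintext, ks))

decrypt = encrypt  # XOR stream ciphers are their own inverse

# Producer encrypts a record before placing it on the exchange.
key = secrets.token_bytes(32)  # shared with the buyer out-of-band
record = b'{"sensor": "line-4", "temp_c": 71.3}'
ciphertext = encrypt(key, record)

# Buyer, holding the shared key, recovers the record.
assert decrypt(key, ciphertext) == record
```

In practice the "shared out-of-band" step is where the key sharing mechanism mentioned above comes in; techniques such as proxy re-encryption let a producer delegate decryption rights without ever exposing the raw key.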
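On the "usable" point, the ETL work a seller does up front is exactly what a buyer no longer has to do. A minimal sketch of the three steps, using a made-up sensor feed (the field names, units, and cleaning rules are illustrative assumptions, not a standard schema):

```python
import csv
import io
import json

# Extract: raw CSV as it might arrive from an operational system (illustrative).
raw_csv = """device_id,reading_f,ts
a1,212.0,2019-06-01T10:00:00
a2,,2019-06-01T10:05:00
a3,98.6,2019-06-01T10:10:00
"""

# Transform: drop incomplete rows, convert Fahrenheit to Celsius,
# and rename fields to a documented schema.
rows = []
for row in csv.DictReader(io.StringIO(raw_csv)):
    if not row["reading_f"]:
        continue  # incomplete record: skip rather than guess
    rows.append({
        "device": row["device_id"],
        "temp_c": round((float(row["reading_f"]) - 32) * 5 / 9, 1),
        "timestamp": row["ts"],
    })

# Load: emit analysis-ready JSON lines for the exchange listing.
cleaned = "\n".join(json.dumps(r) for r in rows)
print(cleaned)
```

A buyer receiving the cleaned JSON lines can start analysis immediately, which is the sense in which refined data, like refined oil, commands a higher price.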
Once these exchanges are pervasive, and vertical markets have developed their own robust data ecosystems, we’ll see AI and ML power capabilities that were previously thought of as existing only in science fiction. But just as important, data ecosystems will support the emergence of entirely new industries, business models, and revenue streams, ushering in the next generation of the information age.
For more information on creating data ecosystems, visit dxc.technology.