Unstructured digital data is finally seeing its day in the sun. Relational databases and other forms of formatted and categorized data are far from obsolete, but their long reign at the pinnacle of data value is slipping.
The critical measure in this regard is “value.” Unstructured data includes everything from documents to images to video and audio streams to social media posts. Collectively, these types of data account for 80-90% or more of the overall digital data universe, by most estimates.
Clearly, the problem associated with unstructured data has never been its rarity. Rather, it’s been the lack of tools and technologies able to extract business value from this diverse and disordered digital resource. If anything, the daunting volumes of unstructured data have discouraged companies from even attempting to mine it for nuggets of useful information.
Companies intimidated by data volume had better hold on tight. During 2018, storage suppliers added more than 700 exabytes of storage capacity to the worldwide installed base of all storage media types, according to IDC. From 2018 to 2023, the worldwide installed base of storage capacity will more than double, reaching 11.7 zettabytes in 2023, IDC predicts.
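To put those figures on a common scale, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above and assumes decimal (SI) units, where 1 zettabyte equals 1,000 exabytes. The variable names are illustrative, and the results are bounds implied by the phrases "more than 700 exabytes" and "more than double," not figures published by IDC.

```python
# Back-of-the-envelope arithmetic on the figures quoted above.
# Assumption: decimal (SI) units, so 1 zettabyte = 1000 exabytes.
EB_PER_ZB = 1000

added_2018_eb = 700   # "more than 700 exabytes" added during 2018 (a lower bound)
base_2023_zb = 11.7   # projected worldwide installed base in 2023
years = 2023 - 2018   # length of the forecast window

# Capacity added in 2018, expressed in zettabytes.
added_2018_zb = added_2018_eb / EB_PER_ZB

# "More than double" implies the 2018 base was below half the 2023 figure.
max_base_2018_zb = base_2023_zb / 2

# Doubling over five years corresponds to this compound annual growth rate;
# "more than double" means the implied rate is at least this high.
min_cagr = 2 ** (1 / years) - 1

print(f"Added in 2018: more than {added_2018_zb:.1f} ZB")
print(f"Implied 2018 installed base: less than {max_base_2018_zb:.2f} ZB")
print(f"Implied annual growth: at least {min_cagr:.1%}")
```

In other words, the forecast implies the installed base compounding at roughly 15% a year or more for five years running.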
That’s a lot of data. Fortunately, a variety of artificial intelligence (AI) technologies are arriving just in time to help companies exploit this underutilized digital resource. For example:
- Natural language processing can be used to extract the meaning of business documents, emails, journal articles, and social media posts
- Pattern recognition algorithms can be used to identify people, animals, or other objects in catalogs of digital images
- Speech-to-text conversion can be used to turn audio speech into searchable text
“Training” these and other AI-powered systems is typically very data-intensive: the more data available for training, and the faster the access to it, the more accurate and reliable the systems’ outputs once in production. In short, there is no cold data in the world of AI. Unfortunately, the volumes and speeds required to feed these systems can easily outpace the capabilities of legacy storage systems and media.
To realize the full benefit of the unstructured data/AI marriage, organizations need to move away from data silos and even from the data lake storage model. What’s needed is a highly scalable, massively parallel data hub approach that incorporates a cloud-like architecture as well as all-flash storage. Only with this type of hub, designed not simply to store data but to share it, will organizations finally be able to turn their unstructured data from buried ore into business gold.
For more information about how Pure Storage and its partners can help your organization build a data infrastructure that powers the AI-based processing of unstructured data, visit purestorage.com/evolution.