by Mike Lynch

Traditional databases have no hope of understanding the unstructured data in enterprises

Nov 26, 2011 | 4 mins
IT Strategy, Mobile Apps, Telecommunications Industry

In these columns, I have written about everything from David Bowie’s caveman lyrics to Leo DiCaprio entering people’s dreams in Inception, but I have always avoided talking shop. Well, this month, I’m breaking the rules. I want to talk to you about one of the biggest shifts in the history of our industry – the information wars. Unstructured, or human-friendly, information has gone prime time and the pushy upstart is out to upset the old order. Unstructured data is at the centre of everything we do, after all – emails, text, video, phone calls, tweets and so on.
Those who have carried around the database hammers for the last 20 years get very upset when you point out that not everything in the world is a nail. Indeed, the whole industry as we know it has been dependent on the rows and columns of the good old database. Thanks to a computer’s power to use the location of a piece of information and implement an action based on this knowledge, many tasks can be automated. But knowing that A=B doesn’t tell you what A or B is. Nor is it about storing objects: a filing cabinet is a nice thing to have to put your DVDs in, but you’ll still need a DVD player to watch them. For example, you could store whole call centre recordings in a database if you wanted to, but you’d have no idea whether the customer was happy or not. And this is what businesses want to know after all: What are my customers saying? What opportunities can be seized? What risks should I avoid? To understand that data, you’d have to take a new approach.
Realising the importance of unstructured information in the world, players in the structured world have been sitting up and listening. But I am surprised to hear them talk about putting unstructured data back into a database, which seems to miss the point.
What modern enterprise would honestly consider taking all the emails on its email server and storing them in a database, meaning twice the storage and the need to constantly sync and update with new traffic? The manage-in-place paradigm has been purposefully left out here, but then these players have legacy businesses to defend. Such attempts at a defence will soon wear thin.
No one denies the importance of unstructured information in the enterprise – after all, it accounts for 85 per cent of everything we interact with, including emails, texts, videos and webpages. Being able to manage that information where it is and process it so you know what it means is essential. So for the first time it’s the ‘I’ in ‘IT’ that’s changing, not the ‘T’. But can you imagine any other industry that ignores 85 per cent of the problem, especially when it’s also the fastest-growing bit?
The next-age information platforms make sense of this information and use it to solve problems. And it’s nothing to do with SQL, Hadoop, metadata or in-memory databases. Object-orientated databases offer no more than storage for blobs of information; they don’t understand that information. Hadoop takes a document, breaks out all the words and puts them in a database, losing all the context. At best, this gives you the algorithmic approach the unstructured data community was using in the early eighties, and it fails to process rich media. Metadata is also a non-starter: who creates and maintains it? And then you have in-memory databases, which give you faster processing but don’t understand meaning in any more depth, and so don’t give you new answers.
The next generation of information management is about getting the machine to fit to humans and not the other way around. And so the real ‘big data’ agenda sees the structured and unstructured worlds combine to give one comprehensive information processing platform.
The solution lies in having a layer that goes right across an enterprise, one that is capable of processing both structured and unstructured data. This is accessed through a query language that includes SQL but goes beyond it, adding the other functions that meaning-based technology affords, such as conceptual clustering of ideas, implicit search, language detection, speaker ID, and so on. Nor is there any need to move the information around and put it into a database. It stays in place.
So next time someone tries to sell you a hammer, remember that they’ve only got 15 per cent of your problem sussed, and using it to bang in screws doesn’t work. Even they, secretly, realise that.
