The commercial Internet has now been around for twenty-some years, and the overall experience hasn’t changed much since the days of “You’ve Got Mail.”
The Internet started out as a research tool shared among government, universities and corporations. With the advent of hyperlinks, the Internet has been transformed into a commercial vehicle for the sale of goods and services.
As a research tool, the Internet of today is pathetic and has taken on a bias toward consumerism. Take this example: “show me all printers that use HP 950 ink cartridges.” The expectation is a list of all printers that use HP 950 ink cartridges. Instead you receive over 500,000 hits on Google and over a million on Bing, mostly links to sellers of printer ink cartridges. Yes, you do get a list of printers; however, this list is neither extensive nor exclusive to the printers that use the 950 ink cartridges.
Will artificial intelligence (AI) make the Internet smarter? Probably someday, but don’t look for it in the foreseeable future. Why is this? Because very little knowledge is captured in a form that AI machines can directly ingest today. This is where machine learning comes in, using the latest breakthroughs in neural network design. Machine learning methods require feeding in large amounts of varied data on the features of a subject matter.
As an example, say you want to sell sweaters on the Internet and you want to use an AI machine to help increase your sales. The first thing you need to do is teach your AI machine about sweaters. You meticulously feed in all kinds of published features of sweaters from various sources, including fashion magazines, top retail product catalogs, bloggers, etc. All of these sources feed into your AI machine, which learns about the different sizes (bust, waist, hip) and the styles that make up the category, including pullovers, v-necks, cardigans, turtlenecks, vests and full-length skirts. Also, don’t forget the patterns, colors and materials such as wool, cashmere, cotton and synthetics. All of these are known as “empirical based” features, and they are well documented across the Internet.
How about those features that could be considered inferred, or “soft features,” that are temporal or spatial in nature? Sweaters are garments tied to seasons or climates that are mostly cold. Would you sell sweaters in the middle of June and July? Absolutely. Think of all the people who live in the Southern Hemisphere!
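A soft feature like this can be written down as an explicit rule rather than learned from data. The sketch below is only an illustration: the function name `is_sweater_season` and the use of meteorological winter months as a stand-in for "cold" are assumptions, and a real model would also need latitude, altitude and local climate.

```python
# Meteorological winter months per hemisphere. This is a deliberate
# simplification: it ignores latitude, altitude and local climate.
WINTER_MONTHS = {
    "north": {12, 1, 2},
    "south": {6, 7, 8},
}


def is_sweater_season(month: int, hemisphere: str) -> bool:
    """Return True if the month falls in that hemisphere's winter."""
    if month not in range(1, 13):
        raise ValueError("month must be between 1 and 12")
    if hemisphere not in WINTER_MONTHS:
        raise ValueError("hemisphere must be 'north' or 'south'")
    return month in WINTER_MONTHS[hemisphere]


# July is summer in the north but winter in the south.
for hemisphere in ("north", "south"):
    print(hemisphere, is_sweater_season(7, hemisphere))
```

The point is that the temporal feature (month) only becomes meaningful once it is combined with the spatial one (hemisphere).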
Given that you will spend a tremendous amount of time and effort developing sufficiently large and diverse data sets to train your AI machine, what level of correctness do you need in order to have a productive AI machine? The data that is published on the Internet would probably be necessary and sufficient to answer 90 percent of the inquiries on sweaters.
After all, by Google’s own admission, its open source Natural Language Understanding (NLU) system, SyntaxNet, has just over a 90 percent accuracy rate. This is a great accomplishment for natural language understanding. However, what about your proprietary business processes? How much documentation do you have on the processes that are running in your corporate systems? How complete is this documentation? Has it been kept up to date, and does it reflect all the interdependencies of your enterprise?
Could your business model handle just 90 percent accuracy? In the health insurance industry, depending on the size of your company, 90 percent accuracy could translate into tens or hundreds of millions, or even billions, of dollars lost to improper claims adjudication.
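To make the arithmetic concrete, here is a back-of-the-envelope sketch. The claim volume and the average cost of a wrongly adjudicated claim are hypothetical placeholders chosen for illustration, not figures from any insurer, and the function name `expected_loss` is likewise made up.

```python
def expected_loss(claims_per_year: int, accuracy: float,
                  avg_cost_per_error: float) -> float:
    """Estimate yearly dollars lost to wrongly adjudicated claims."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    error_rate = 1.0 - accuracy
    return claims_per_year * error_rate * avg_cost_per_error


# Hypothetical inputs: 10 million claims a year, $50 lost per error.
CLAIMS_PER_YEAR = 10_000_000
AVG_COST_PER_ERROR = 50.0

for accuracy in (0.90, 0.99, 0.999):
    loss = expected_loss(CLAIMS_PER_YEAR, accuracy, AVG_COST_PER_ERROR)
    print(f"{accuracy:.1%} accurate: about ${loss:,.0f} lost per year")
```

Under these assumed inputs, each additional "nine" of accuracy cuts the estimated loss by a factor of ten, which is why the gap between 90 percent and near 100 percent matters so much.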
Do you have large enough data sets that describe all of the various kinds of features that make up your business process in order to have an accuracy rate that would approach 100 percent? Unsupervised machine learning approaches might not provide you with the level of accuracy that will make them practical.
Different approaches for knowledge acquisition
So, what is your alternative? In my earlier post “Big data and machine learning – is the glass half empty?”, I explained that there are two different systems that make up cognitive computing. The first is statistical reasoning, which is where machine learning falls. The second is logical reasoning, in which you develop a knowledge representation by creating an ontology. Back in the early 1990s Tom Gruber, who now heads up Apple Siri, defined an ontology as “an explicit specification of a conceptualization.”
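As a rough illustration of what an explicit knowledge representation buys you, the sketch below stores a handful of facts as subject-predicate-object triples and answers the printer question from the start of this post by exact lookup rather than keyword matching. Everything in it is hypothetical: the printer names, the `isA` and `usesCartridge` predicates, and the `printers_using` helper are placeholders, not real compatibility data or a standard vocabulary.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical facts; the printer names are placeholders,
# not real compatibility data.
FACTS = [
    Triple("PrinterA", "isA", "Printer"),
    Triple("PrinterB", "isA", "Printer"),
    Triple("PrinterC", "isA", "Printer"),
    Triple("PrinterA", "usesCartridge", "HP 950"),
    Triple("PrinterB", "usesCartridge", "HP 950"),
    Triple("PrinterC", "usesCartridge", "OtherCartridge"),
]


def printers_using(cartridge: str, facts: List[Triple]) -> List[str]:
    """Return every known printer that uses the given cartridge."""
    printers = {t.subject for t in facts
                if t.predicate == "isA" and t.obj == "Printer"}
    return sorted(
        t.subject for t in facts
        if t.predicate == "usesCartridge"
        and t.obj == cartridge
        and t.subject in printers
    )


print(printers_using("HP 950", FACTS))
```

Because the relationship between printer and cartridge is stated explicitly, the answer contains only the printers that match, with no sales links mixed in.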
To get the level of accuracy to support your business processes, you will need to look at what the banking industry is doing. In “Don’t do what I say, do what I mean,” I articulated how the banking industry is using ontologies to develop knowledge models to describe business concepts and features to mitigate global risks.
Ontologies are not a replacement for machine learning methods; rather, they complement machine learning as another source of annotation for your data. Supervised machine learning depends directly on having your data annotated, and this annotation can come in many forms. Today, Google, Microsoft Bing, Yahoo and Yandex have partnered to build an Internet ontology called schema.org, based upon open Semantic Web specifications.
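For a sense of what such annotation looks like, here is a minimal sketch that builds a schema.org `Product` description for a sweater as JSON-LD, one of the formats schema.org markup can be published in. The product values and the helper name `sweater_jsonld` are invented for illustration; only the schema.org type and property names (`Product`, `Offer`, `name`, `category`, `material`, `color`, `offers`, `price`, `priceCurrency`) come from the vocabulary itself.

```python
import json


def sweater_jsonld(name: str, material: str, color: str,
                   price: float, currency: str) -> dict:
    """Build a schema.org Product description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "category": "Sweaters",
        "material": material,
        "color": color,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
        },
    }


# Hypothetical product values, for illustration only.
doc = sweater_jsonld("Example cardigan", "wool", "navy", 79.0, "USD")
print(json.dumps(doc, indent=2))
```

Markup like this tells a machine which string is the material and which is the color, instead of leaving it to guess from the surrounding page text.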
IBM Watson uses DBpedia and YAGO to enhance its knowledge base. The BBC and The New York Times, as well as major libraries and museums around the world, are using knowledge models that are based upon the Linked Open Data community project. Formalizing the acquisition of knowledge is a non-trivial task regardless of which machine learning approach you use.
Will AI make the Internet smart? Someday, but not soon, given that only 6 percent of Internet domains have any proper tagging. What about your business? Developing a comprehensive road map for an AI solution requires understanding what you know and how much of your knowledge can be formalized, either through large-volume data sets or by developing explicit knowledge models.
The Internet may not become smart any time soon. However, what about your competition? Can you afford to let your competition get the upper hand in developing AI solutions in your business domain? If you wait until they do, it may be too late.