by Mitch De Felice

Big data and machine learning – is the glass half empty?

Opinion
Apr 01, 2016
Analytics | Big Data | Business Intelligence

Artificial intelligence (AI) is receiving much attention in light of big data and machine learning accomplishments. However, machine learning only represents half of the AI story.

Artificial intelligence is making its first real resurgence since the 1990s. Today the focus is on machine learning and statistical algorithms, and that shift has served AI well. Machine learning and statistics provide effective algorithmic solutions to certain kinds of problems, such as board games, spam detection, and voice and image recognition.

How is AI different today from 20 years ago? Twenty years ago, AI was focused on what is known as logic-based AI, or Knowledge Representation (KR). As with any emerging technology, it became overhyped and overpromised, and the tools and frameworks needed to make KR successful never really materialized until recently.

As a technology decision maker, you might find the vocabulary of artificial intelligence a bit overwhelming. Figure 1, read from the bottom up, illustrates knowledge acquisition capabilities from a data-usage perspective. It by no means represents every approach to achieving an AI solution, but it does illustrate how big data fits into the AI picture.

Figure 1. Knowledge acquisition: capabilities of AI in context, by data type

Machine learning is represented by the right side of the diagram, labeled "Statistical Reasoning." There are two types of machine learning: supervised and unsupervised. When big data vendors speak of machine learning, they usually mean supervised machine learning, which has existed since the 1950s.

Because supervised machine learning requires all the data to be annotated (that is, metadata tagged), it is an ideal solution for problems that use structured data and its columns. Unsupervised learning, by contrast, doesn't require the data to be annotated; it instead uses features of the data (that is, patterns). Unsupervised learning excels in areas such as image, voice, and handwriting recognition, where data features can be identified.
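To make the distinction concrete, here is a minimal sketch, assuming scikit-learn is available; the feature values and labels are invented for illustration only.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

# Supervised: every row of structured data carries an annotation (label).
rows = [[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.9]]   # feature columns
labels = ["setosa", "setosa", "virginica", "virginica"]    # the annotations
classifier = DecisionTreeClassifier().fit(rows, labels)
print(classifier.predict([[6.5, 3.0]]))                    # predicts one of the learned labels

# Unsupervised: no annotations; the algorithm groups rows by patterns in the features.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(rows)
print(clusters)                                            # cluster ids, not business labels
```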

Both supervised and unsupervised machine learning can use a parallel distributed processing framework known as neural networks. Neural networks are responsible for a number of recent breakthroughs in board games and in audio and image recognition.
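The "parallel distributed processing" idea amounts to pushing an input through layers of many simple units at once. A toy forward pass, sketched here with NumPy and random placeholder weights rather than a trained model, shows the shape of the computation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))      # one input example with 4 features
W1 = rng.normal(size=(4, 8))     # input -> hidden weights (a distributed representation)
W2 = rng.normal(size=(8, 3))     # hidden -> output weights

hidden = np.maximum(0, x @ W1)   # ReLU activation applied across all hidden units in parallel
output = hidden @ W2             # scores for 3 hypothetical classes
print(output)
```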

Hype vs. Promise

In "Pulling the plug on the AI hype," I made the case that the current incarnation of AI built on machine learning doesn't handle everyday commonsense reasoning challenges, like reading a news headline. In fact, if you look at IBM Watson, it really is a factoid AI solution. The Watson of today doesn't handle open-ended questions. Take the example, "What happens to a glass of water if you let go of it?" The appropriate response would be, "Where is the glass? Is it sitting on a table, or are you holding it in your hand?" Cognitive computing needs to be able to reason, and to do that it needs to understand concepts like space, time, inertia, and mass, to name a few.

This is why big data will have a difficult time delivering on the promise of rich discoveries, especially when it comes to unstructured data. Structured data columns don't always correspond directly to business concepts. Big data doesn't use elaborate conceptual relationships; instead, it uses a simple metadata table to provide term lookups. The best big data can offer is the ability to classify, falling short of providing any contextualization capability, which is where the real value to the business lies.
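A rough sketch of that "simple metadata table" style of lookup, with invented terms and tags: each term maps to a single tag, so the system can classify a term but carries no relationships between the concepts.

```python
# Flat metadata lookup: classification without context (illustrative data only).
metadata = {
    "aspirin": "drug",
    "ibuprofen": "drug",
    "headache": "symptom",
}

def classify(term):
    return metadata.get(term, "unknown")

print(classify("aspirin"))   # "drug" -- but nothing here says what aspirin treats
```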

Ontologies are represented by the left side of the diagram, labeled "Logic Reasoning." Ontologies offer the ability to model elaborate conceptual relationships. This approach is known as declarative modeling, and it allows us to model real-world problems. As an example, a use case defines an actor (subject) performing some business function (predicate) against some entity or system (object). If subject, predicate, and object sound familiar, they should: this is the foundation of good sentence structure. Developing an ontology by adopting a declarative modeling approach will go a long way toward providing contextual insight into your big data dilemma.
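A minimal sketch of declarative, subject-predicate-object modeling in plain Python follows; a real ontology would use a language such as RDF or OWL, and the facts below are invented for illustration. The point of the contrast with the flat lookup table above is that relationships, not just tags, can be queried.

```python
# Declarative model: facts stored as (subject, predicate, object) triples.
triples = [
    ("aspirin", "is_a",   "drug"),
    ("aspirin", "treats", "headache"),
    ("headache", "is_a",  "symptom"),
]

def related(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# A contextual question rather than a term lookup: what does aspirin treat?
print(related("aspirin", "treats"))   # ['headache']
```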

The reality is that technology is shifting faster than leadership can understand the threats it poses. Large, bureaucratic companies that simply adopt the big data paradigm will struggle to keep up with the explosion of unstructured data and data streams that has drastically changed the role and value of information.

The greatest competitive advantage will go to those technology leaders who understand that both sides of the cognitive computing equation are needed. Implementing solutions with this basic understanding will allow decision makers to adjust to new threats not within months or years but within weeks, providing the capability to leapfrog their competition.