BrandPosts are written and edited by members of our sponsor community. BrandPosts create an opportunity for an individual sponsor to provide insight and commentary from their point-of-view directly to our audience. The editorial team does not participate in the writing or editing of BrandPosts.
By Lucas Wilson, Ph.D.
Everyone agrees that implementing artificial intelligence (AI) will be critical for business competitiveness going forward. But what exactly does that look like? While examples of AI for image processing and classification abound, not every business will be using images or video as the core of their AI efforts. For some, AI that helps to improve the customer experience through voice may be the key to unlocking value.
Natural language processing, or NLP, is an area of research in the AI community that seeks to find efficient ways of using computers to translate languages, convert voice to text and back again, and create human-like conversational agents to help customers deal with issues, questions and concerns. This field has been transformed by the shift from statistical machine learning methods to the use of neural networks and deep learning, making it possible to build automated systems that can interact with customers or employees more naturally than ever before.
So, how can implementing NLP transform your business? I’ll briefly look at advances in language translation, voice-to-text conversion and conversational agents, which can be linked together to build fully automated support systems that listen to customers and respond as though a support agent were sitting on the other end of the telephone.
Neural machine translation
Neural machine translation, or NMT, is the newest approach to automated translation of human languages. While prior methods relied on established statistical relationships between common words, NMT learns entirely from example sentences written in both the source and the target language. This approach, which draws on some of the most recent advances in neural network architecture research, has outperformed prior statistical models across a wide range of language pairs. Training these networks requires significant computational horsepower. A laptop might be able to train a high-quality translation model in one to two months. That’s right: MONTHS! That’s an eternity to wait for something that may or may not be ready to deploy in production, and production is the only place a machine learning model can generate value for your business.
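To make the point concrete, here is a minimal sketch of the one thing NMT training truly depends on: parallel sentence pairs converted into the integer token sequences a sequence-to-sequence network consumes. The tiny English-Spanish corpus and the vocabulary scheme below are purely illustrative, not a real training set or a specific framework’s API.

```python
# Toy sketch: NMT trains purely on parallel sentence pairs.
# We tokenize a tiny (invented) English-Spanish corpus into integer IDs,
# which is the form a sequence-to-sequence network actually consumes.

parallel_corpus = [
    ("thank you", "gracias"),
    ("good morning", "buenos dias"),
    ("thank you very much", "muchas gracias"),
]

def build_vocab(sentences):
    """Map each distinct token to an integer ID; IDs 0-2 are reserved."""
    vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
    for s in sentences:
        for tok in s.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

src_vocab = build_vocab(src for src, _ in parallel_corpus)
tgt_vocab = build_vocab(tgt for _, tgt in parallel_corpus)

def encode(sentence, vocab):
    """Wrap a sentence in begin/end markers and convert it to token IDs."""
    return [vocab["<bos>"]] + [vocab[t] for t in sentence.split()] + [vocab["<eos>"]]

# Each training example is a (source IDs, target IDs) pair; the network
# learns the mapping between them with no hand-built phrase tables.
pairs = [(encode(s, src_vocab), encode(t, tgt_vocab)) for s, t in parallel_corpus]
```

In a real system this preprocessing feeds a large encoder-decoder network, and it is that training loop, run over millions of such pairs, that consumes the months of laptop time described above.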
Thankfully, the right hardware, coupled with software and algorithmic improvements, can reduce this time to value by orders of magnitude. Dell EMC’s AI Engineering team, part of the Dell EMC HPC and AI Innovation Lab, has worked with partners from Uber, Amazon and Intel to make these very improvements. We’ve reduced the time to train these types of models from months on a workstation to hours in the data center. This means a company’s software teams could potentially be operationalizing a machine learning model tomorrow, instead of next quarter.
Figure 1: Recent improvements in parallel training of translation models, from teams such as ours at Dell EMC, have vastly improved time to value.
And improving time-to-solution means your data science team has more opportunity to explore new ways of making these models more accurate. Or, perhaps, faster training might even give them time to train models for many language pairs. If you happen to be a global business, imagine the value in having translation models that accept language from any of your customers, then translate it into a single language for your customer service team.
Then imagine having the corresponding models necessary to translate information back into any language your customers happen to speak. These models could allow you to create your very own universal translator, ready to help you improve the customer support experience while streamlining your support structure. That’s not going to be possible if it takes your data scientists weeks or months to build a single translation model. It’s only by taking advantage of data center-scale compute, coupled with highly optimized software, that a business could make such a significant transformation.
Voice-to-text and text-to-voice
If you’ve ever used Amazon Alexa, Google Assistant or Microsoft Cortana, then you have some familiarity with models that can convert your voice into text, and the corresponding models that convert text into the assistant’s voice. Together, they create one of the newest expressions of the state of the art in artificial intelligence. Much like the future we imagined while watching episodes of Star Trek: The Next Generation, these voice response systems make us feel like we can make our smart devices, automobiles and even homes respond to our every command. And, while they still have some room for improvement, no one can deny that the ability to have a computer system respond to your voice, using a gadget that costs less than $100, is transformational.
While Google’s and Amazon’s voice models are impressive, they are also proprietary. Not all businesses are willing or able to take advantage of these “AI as a service” voice models. Perhaps you have regulatory requirements that demand that your data remain private, or you simply don’t want to tell the service providers exactly what your customers are asking you, or what your responses to your customers would be. That’s when it might be worthwhile to generate your own voice models.
Fortunately, we know the current state-of-the-art techniques for producing these sorts of models. They’re called WaveNets: generative neural networks that produce an audio waveform one sample at a time, feeding each output back into the network as the input for the next step. Sample by sample, the waveform builds up iteratively until a complete rendering of the target voice has been produced.
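The feedback loop at the heart of that description can be sketched in a few lines. The predictor below is a deliberate stand-in: a fixed formula that continues a sine wave, where a real WaveNet would be a trained network predicting the next speech sample from its receptive field.

```python
import math

def generate(predict, seed, n_samples, receptive_field=4):
    """Autoregressive generation in the WaveNet style: each new audio
    sample is predicted from the trailing window of previous samples,
    then appended so it becomes input for the next step."""
    samples = list(seed)
    for _ in range(n_samples):
        context = samples[-receptive_field:]
        samples.append(predict(context))
    return samples

# Stand-in for a trained network: a fixed linear recurrence that happens
# to continue a sine wave using only the last two samples. A real WaveNet
# learns its predictor from hours of recorded speech.
def toy_predict(context):
    return 2 * math.cos(0.1) * context[-1] - context[-2]

seed = [math.sin(0.0), math.sin(0.1)]
wave = generate(toy_predict, seed, n_samples=50)  # wave[k] tracks sin(0.1 * k)
```

The essential point is structural: nothing outside the loop supplies the waveform. The model’s own outputs become its inputs, which is exactly what makes generation slow and why these networks benefit so much from optimized hardware.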
Figure 2: WaveNets allow for conversion of voice data into corresponding text and vice versa.
The near-magical ability of these techniques to produce human voice, using only a handful of audio fragments from a voice actor, is extraordinary. In fact, it isn’t a stretch for models created using this technique to generate waveforms that can emulate any voice speaking in any language, as long as there is an effective pronunciation model.
Conversational agents
Most of us have some familiarity with conversational agents, or chatbots. Ever started a customer support inquiry on a website? Ever called your bank and been greeted by an automated system that asks you questions about why you are calling? How about using Google Assistant or Amazon Alexa to interface with your calendar? Each time, you’ve been interacting with a chatbot. They are pervasive in modern society.
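Underneath most of these interactions is the same loop: map what the user said to an intent, then produce a response for that intent. Here is a minimal keyword-matching sketch of that loop; production chatbots use trained intent classifiers, and the intents and canned replies below are invented for illustration.

```python
# Minimal sketch of the request -> intent -> response loop behind a
# support chatbot. Keyword overlap stands in for a trained classifier.

INTENTS = {
    "billing": {"bill", "invoice", "charge", "payment"},
    "password": {"password", "login", "locked", "reset"},
}

RESPONSES = {
    "billing": "I can help with billing. Which invoice are you asking about?",
    "password": "Let's reset your password. What is your account email?",
    None: "Let me connect you with a support agent.",  # fallback to a human
}

def classify(utterance):
    """Pick the intent whose keywords best overlap the user's words."""
    words = set(utterance.lower().split())
    best, best_overlap = None, 0
    for intent, keywords in INTENTS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best, best_overlap = intent, overlap
    return best

def reply(utterance):
    return RESPONSES[classify(utterance)]
```

Note the fallback: when no intent matches, a well-designed agent hands off to a person rather than guessing, which is how deployed chatbots keep customer trust.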
Here’s a case in point: within a year of Facebook opening its Messenger platform to conversational agents, more than 100,000 chatbots had been deployed on the platform.1 And we can expect only more of the same in the months ahead. Gartner predicts that, by 2020, 50 percent of analytical queries will be generated via search, natural language processing or voice, or will be automatically generated.2
Putting it all together: a fully-automated global support call center
Now, let’s put all of these advances together to transform customer support around the world. For a global enterprise, or even a small company with aspirations to offer its products globally, it wouldn’t be practical or cost-effective to create chatbots in dozens of languages. The development and maintenance of all those distinct chatbots would require heavy financial investments and the resources of dedicated teams around the world.
With today’s new and emerging technologies, there is a better way forward. That way is to create and maintain a single chatbot in a single language and add AI-driven translation capabilities at the edge. In this new world, when a customer calls the support desk with a query, he or she can ask a question in any supported language. On the back end, the automated support system translates the customer’s words and formulates a response. The system then translates the response back into the customer’s language and uses text-to-voice capabilities to reply to the customer.
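The whole pipeline described above can be sketched as a chain of four stages. Every component here is a labeled stand-in with hard-coded demo behavior; in production each would be a trained model (a speech recognizer, NMT models in both directions, the single hub-language chatbot, and a WaveNet-style synthesizer).

```python
# End-to-end sketch of the automated support pipeline:
# speech -> text -> translate to hub language -> chatbot -> translate back -> speech.
# All four components are toy stand-ins for trained models.

def speech_to_text(audio):          # stand-in for a speech recognizer
    return audio["transcript"]

def translate(text, src, tgt):      # stand-in for an NMT model
    DEMO = {("es", "en", "hola"): "hello", ("en", "es", "hello!"): "¡hola!"}
    return DEMO.get((src, tgt, text), text)

def chatbot(text):                  # stand-in for the single hub-language agent
    return "hello!" if text == "hello" else "how can I help?"

def text_to_speech(text):           # stand-in for a WaveNet-style synthesizer
    return {"transcript": text}

def handle_call(audio, caller_lang, hub_lang="en"):
    """One chatbot, one language: translation happens only at the edges."""
    query = speech_to_text(audio)
    query = translate(query, src=caller_lang, tgt=hub_lang)
    answer = chatbot(query)
    answer = translate(answer, src=hub_lang, tgt=caller_lang)
    return text_to_speech(answer)

out = handle_call({"transcript": "hola"}, caller_lang="es")
```

The design choice worth noticing is that the chatbot itself never changes as you add languages; only the pair of edge translation models does, which is what keeps the maintenance cost of global support flat.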
Figure 3: A potential universal support chatbot, receiving requests translated from customers in many languages
The benefits here are obvious. With this level of automation in place, a global support center could handle far more requests than would be possible with today’s systems, and a company could automate more support functions without building and maintaining chatbots in multiple languages. And instead of having armies of customer support agents all over the world, a company could maintain a single global support center with armies of servers around the world.
Thinking beyond images
One takeaway here is that the potential use cases for AI extend far beyond today’s focus on image-based applications, such as the identification of tumors in radiology scans, or the sorting of images on a social media site. While image-based use cases are important to many organizations, NLP applications could be far more valuable to businesses that need to interact with customers in many languages.
Chatbots are now pervasive, but they are tied to a single language, so a global company that wants to use chatbots must wrestle with the challenges of developing and maintaining many different chatbots in many different languages. A fully automated global support center with a single chatbot and translation at the edge overcomes these challenges.
One other important takeaway is that NLP systems allow customers to communicate with companies using spoken language, rather than having a text-based conversation at a keyboard. With today’s technologies, businesses can communicate with customers in conversational spoken language without maintaining legions of local support specialists and chatbots written in many different languages.