Recent improvements in natural language processing (NLP) are bolstering mainstream technologies with speech and text capabilities, whether that’s reading emails aloud in a natural-sounding voice or Excel enabling you to type in questions about your spreadsheet data and get answers in the form of auto-generated charts and PivotTables.
As NLP becomes more accurate and more widely available, it has the potential to move from powering customer support chatbots on preset topics to handling qualitative, semi-structured and unstructured data. Finally delivering on the promise of knowledge mining could unlock information about company processes, assets and liabilities to create better workflows and a more real-time view of the organization.
“NLP breaks down words into their simplest form and identifies patterns, rules and relationships between them,” explains Walt Kristick, senior vice president of applied and advanced technology at apexanalytix. “It uses computer algorithms to parse and interpret written and spoken natural language to allow systems to learn and understand human languages.”
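Breaking words down into a simpler form usually starts with tokenization (splitting text into words) and normalization (reducing variants such as “delays” and “delaying” to a common form). The sketch below is a minimal illustration under that assumption. Its suffix list is a deliberately tiny, hypothetical rule set. Real stemmers such as Porter's, and the lemmatizers in NLTK or Stanford CoreNLP, use far richer rules and dictionaries.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny suffix list; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "s")

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def normalize(text: str) -> list[str]:
    return [crude_stem(t) for t in tokenize(text)]

tokens = normalize("Invoices delayed; invoice delays keep delaying payments.")
print(tokens)
print(Counter(tokens).most_common(1))
```

Once variants collapse to a common form, simple patterns become visible, such as “delay” being the most frequent concept in the sentence.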
NLP uses range from translation and language generation (for summaries, annotation, or even explaining other machine learning models), to classification and clustering, sentiment analysis and other information extraction. The simplest forms of NLP are already widely used, Kristick points out: spell-checking, suggested email and messaging responses, and virtual assistants such as Siri all use NLP, as do chatbots.
“There is a growing demand for the ability to analyze and extract meaning from text and non-related data sources, especially in the healthcare and life sciences markets,” Kristick notes.
Here is a look at the present state of NLP and where it might fit into your organization.
NLP services predominate
While there are many algorithms for building your own NLP models, along with frameworks such as Python's NLTK, Stanford CoreNLP and Apache OpenNLP, the most effective models are extremely large. At the time of writing, Microsoft’s 17-billion-parameter Turing Natural Language Generation model was the largest ever published. OpenAI's GPT-2 has 1.5 billion parameters, and even the large version of Google's BERT has hundreds of millions.
“Just taking these models off the shelf doesn’t work for some of the sophisticated things companies need to do,” warns Lili Cheng, corporate vice president of conversational AI at Microsoft. “For many companies, it would be very challenging to host these big models and manage them and do all of that work. Some people want to do that but we believe a lot more customers will just want to customize and to add their own information,” Cheng says, noting that for many organizations hiring NLP experts is challenging.
Even organizations with in-house AI expertise often turn to NLP services from providers such as Microsoft, Amazon, Google and IBM, so that business users as well as expert developers can capitalize on the technology.
Telefonica, a Microsoft customer with its own AI group, is using Microsoft’s Power Platform to enable business users with no developer expertise to create their own tools with services such as QnA Maker. “You point it to a PDF file or web page FAQ and build the knowledge base out of all these sources, to let people ask questions and answers, either to search or to have a conversational experience,” Cheng says.
Here, chatbots are a key NLP use. Chatbots can take orders, supply answers from FAQs, route inquiries, book meetings and hand off conversations to humans when necessary. NLP is also a powerful tool for gaining customer insight from the growing volume of text and voice data companies already have, says Paul Quinn, senior director of product management at Confirmit.
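The FAQ-answering and hand-off behaviour described above can be sketched in a few lines. This is a minimal illustration, not how QnA Maker or any other service works internally: those use trained language models, whereas this sketch scores questions by plain word overlap (Jaccard similarity). The FAQ entries and the 0.3 hand-off threshold are hypothetical.

```python
import re

# Hypothetical FAQ knowledge base, standing in for one built from a PDF or web page FAQ.
FAQ = {
    "How do I reset my password?": "Use the 'Forgot password' link on the sign-in page.",
    "What are your opening hours?": "We are open 9am to 5pm, Monday to Friday.",
    "How do I cancel my order?": "Go to Orders, select the order and choose Cancel.",
}

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by total distinct words."""
    return len(a & b) / len(a | b) if a | b else 0.0

def answer(question: str, threshold: float = 0.3) -> str:
    """Return the best-matching FAQ answer, or hand off to a human if nothing is close."""
    q = words(question)
    best_q, best_score = max(
        ((faq_q, jaccard(q, words(faq_q))) for faq_q in FAQ),
        key=lambda pair: pair[1],
    )
    if best_score < threshold:
        return "HANDOFF: routing you to a human agent."
    return FAQ[best_q]

print(answer("How can I reset my password?"))
print(answer("Do you ship to Canada?"))
```

The threshold is the design lever: set it too low and the bot answers questions it does not understand; set it too high and it hands off too often.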
“Businesses often now hold over 100 terabytes of unstructured data — everything from call center notes and customer emails to comments in surveys,” he says. “Any business aiming to improve their customer experience or deliver more detailed insight about their brand can use NLP to sift through huge quantities of data and find the nuggets of useful data hiding within.”
But it’s not just retail and other customer-facing industries that can benefit from NLP, says Dakshi Agrawal, IBM Fellow and chief architect for AI. Any company dealing with clients can leverage NLP to gain insights from their interactions, Agrawal says, adding that “many companies are using the technologies for internal employee and general HR engagements as much as for external client and partner engagements.”
For example, topic clustering, which uses NLP techniques like sentence embedding rather than just keyword extraction, is more accurate at grouping issues that customers may report using different terms. Highlighting those clusters in a dashboard can help reveal trending issues or repeated problems.
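To show why embeddings group differently worded complaints together, the sketch below clusters comments by cosine similarity of their vectors. The three-dimensional vectors are made up for illustration; in practice a sentence-embedding model would produce them, with hundreds of dimensions. The greedy single-pass clustering and the 0.9 threshold are assumptions too. k-means or density-based methods are common alternatives.

```python
import math

# Hypothetical "embeddings"; a real sentence-embedding model would supply these.
COMMENTS = {
    "App crashes when I log in": [0.9, 0.1, 0.0],
    "Sign-in makes the app close": [0.85, 0.15, 0.05],
    "Delivery took two weeks": [0.05, 0.9, 0.1],
    "My parcel arrived very late": [0.1, 0.88, 0.05],
    "Login screen freezes then quits": [0.8, 0.2, 0.1],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def centroid(vectors):
    return [sum(xs) / len(vectors) for xs in zip(*vectors)]

def cluster(items, threshold=0.9):
    """Greedy single pass: join the most similar existing cluster if its
    centroid is close enough, otherwise start a new cluster."""
    clusters = []  # each cluster is a list of (text, vector) pairs
    for text, vec in items.items():
        scored = [(cosine(vec, centroid([v for _, v in c])), c) for c in clusters]
        best = max(scored, key=lambda s: s[0], default=None)
        if best and best[0] >= threshold:
            best[1].append((text, vec))
        else:
            clusters.append([(text, vec)])
    return [[text for text, _ in c] for c in clusters]

for group in cluster(COMMENTS):
    print(group)
```

“log in”, “Sign-in” and “Login” share no keyword, yet the three comments land in the same cluster because their vectors point in a similar direction.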
Signoi aims to tackle open-ended comments in surveys by surfacing frequently used words, highlighting positive and negative terms and aggregating them by demographic group. The independent UK transport user watchdog Transport Focus used Signoi to see the biggest concerns for commuters and leisure passengers on various train services. Business travellers were angry about overcrowding on one line; those taking the train for leisure wanted better car parking and more space for luggage and bikes.
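The article does not describe how Signoi works internally. As a minimal sketch of the general idea, the code below counts frequent words and computes a net positive-minus-negative score per demographic group. The survey responses, sentiment lexicon and stopword list are all hypothetical. Production tools use much larger lexicons or trained classifiers.

```python
import re
from collections import Counter, defaultdict

# Hypothetical responses tagged with a demographic group.
RESPONSES = [
    ("business", "Overcrowding is terrible, trains are always crowded"),
    ("business", "Crowded carriages and delays, terrible service"),
    ("leisure", "Need better parking and more space for bikes"),
    ("leisure", "Friendly staff but parking is poor"),
]
# Hypothetical, hand-made lexicons.
POSITIVE = {"better", "friendly", "good"}
NEGATIVE = {"terrible", "crowded", "overcrowding", "delays", "poor"}
STOPWORDS = {"is", "are", "and", "for", "but", "more", "the", "always", "need"}

def summarize(responses):
    """Return {group: (two most frequent words, net sentiment score)}."""
    words_by_group = defaultdict(Counter)
    sentiment_by_group = defaultdict(int)
    for group, text in responses:
        tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
        words_by_group[group].update(tokens)
        sentiment_by_group[group] += sum(t in POSITIVE for t in tokens)
        sentiment_by_group[group] -= sum(t in NEGATIVE for t in tokens)
    return {g: (words_by_group[g].most_common(2), sentiment_by_group[g]) for g in words_by_group}

for group, (top_words, score) in summarize(RESPONSES).items():
    print(group, top_words, score)
```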
NLP can also be used to query data in plain language and to generate language that explains the results. Microsoft’s Power BI business analytics service and Salesforce.com’s Tableau each offer features that enable users to type in questions about their data and get charts or automated analysis in response.
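Power BI and Tableau do not document their language generation in detail here, so the following is only an illustration of the simplest data-to-text technique: filling sentence templates from computed figures. The sales table is hypothetical.

```python
# Hypothetical quarterly sales by region: (previous quarter, current quarter).
SALES = {
    "North": (120_000, 132_000),
    "South": (95_000, 90_250),
    "West": (80_000, 100_000),
}

def explain(sales: dict[str, tuple[int, int]]) -> str:
    """Turn a small table into a one-paragraph, template-based explanation."""
    prev_total = sum(p for p, _ in sales.values())
    curr_total = sum(c for _, c in sales.values())
    change = (curr_total - prev_total) / prev_total * 100
    growth = {region: (c - p) / p * 100 for region, (p, c) in sales.items()}
    best = max(growth, key=growth.get)
    worst = min(growth, key=growth.get)
    direction = "rose" if change >= 0 else "fell"
    return (
        f"Total sales {direction} {abs(change):.1f}% to {curr_total:,}. "
        f"{best} performed best ({growth[best]:+.1f}%), "
        f"while {worst} performed worst ({growth[worst]:+.1f}%)."
    )

print(explain(SALES))
```

Templates are predictable and easy to audit, which is why they remain a common baseline. Their limitation is that every phrasing must be written by hand.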
Knowing what the business knows
NLP has a lot of potential to help extract what an organization doesn’t know it already knows.
Specialized AI-powered tools such as ABBYY Text Analytics for Contracts, Exigent Contract Management Solution or Seal Contract Discovery and Analytics extract terms and deadlines from contracts that can help organizations understand what they’ve committed to. Docugami, the new startup from XML co-inventor Jean Paoli, aims to do that for less structured documents.
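The contract tools named above rely on machine learning. The sketch below is only a brittle rule-based stand-in to show what “extracting terms and deadlines” produces. It pulls dates, a notice period and a parking entitlement out of a hypothetical lease clause with regular expressions. The clause text and the patterns are assumptions, and real contracts vary far too much for patterns like these to hold up.

```python
import re
from datetime import datetime

# Hypothetical lease text.
CONTRACT = """
This Lease commences on 1 March 2021 and expires on 28 February 2026.
Either party may terminate with 90 days written notice.
Tenant is entitled to 4 parking spaces.
"""

MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
DATE = rf"(\d{{1,2}} (?:{MONTHS}) \d{{4}})"

# Hypothetical patterns, one per term we want to pull out.
PATTERNS = {
    "start_date": re.compile(r"commences on " + DATE),
    "end_date": re.compile(r"expires on " + DATE),
    "notice_days": re.compile(r"(\d+) days'? written notice"),
    "parking_spaces": re.compile(r"(\d+) parking spaces?"),
}

def extract_terms(text: str) -> dict:
    """Return the terms found in the text, with dates parsed and counts as ints."""
    terms = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if not match:
            continue
        value = match.group(1)
        if name.endswith("_date"):
            terms[name] = datetime.strptime(value, "%d %B %Y").date()
        else:
            terms[name] = int(value)
    return terms

print(extract_terms(CONTRACT))
```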
“Only 15 percent of the data in an enterprise is stored in databases. We all communicate using text and emails and documents. The truth is not in those lovely structured databases. The truth is in the documents,” Paoli told us.
“Take a business that’s very document-intensive, like commercial real estate. A frontline business user spends their time creating 15 lease agreements a week and every Monday their manager asks, ‘What did you do? What are the effective close dates? Did you negotiate parking? Do they want us to maintain the property or not?’ Once you sign a document those are the terms the company has to deliver on, but that information is buried in the documents,” Paoli says.
Unlocking this “dark data” could replace the Monday morning status meeting and improve business agility, something Paoli points out is more important than ever now, whether it’s landlords being asked by Starbucks to renegotiate their lease agreements or restaurants needing to understand what their insurance policies say.
“It’s even more important at this point to use NLP to analyze your business documents because businesses are rethinking their business model. They may have to renegotiate everything, and they need to understand what are their obligations and their risks,” Paoli says. Professional services firm Accenture is doing just that, applying its own NLP to analyze more than a million contracts to understand its commitments and liabilities.
For organizations without their own in-house NLP expertise, Docugami’s SaaS offering works with 30 example documents, which it can select itself from a folder of business documents, plus 30 minutes of feedback from the business users who create those documents, to train a model, according to Paoli.
Docugami then puts the information into a database to help create a dashboard you can see in a browser, or integrate with Excel or Tableau. “We can say look, this is expiring or all these documents have this particular clause, except that one,” Paoli says.
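Docugami’s actual schema is not described. Once fields have been extracted, though, the two questions Paoli mentions become ordinary database queries. The sketch below uses an in-memory SQLite table whose name, columns and rows are all hypothetical.

```python
import sqlite3
from datetime import date

# Hypothetical fields already extracted from lease documents:
# (document name, expiry date as ISO text, 1 if a parking clause was found).
LEASES = [
    ("lease-001.docx", "2021-06-30", 1),
    ("lease-002.docx", "2023-01-31", 1),
    ("lease-003.docx", "2021-05-15", 0),
    ("lease-004.docx", "2024-09-30", 1),
]

def build_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE leases (doc TEXT PRIMARY KEY, expires TEXT, has_parking_clause INTEGER)"
    )
    conn.executemany("INSERT INTO leases VALUES (?, ?, ?)", rows)
    return conn

def expiring_before(conn, cutoff: date):
    """'This is expiring': ISO-8601 strings sort chronologically, so text comparison works."""
    rows = conn.execute(
        "SELECT doc FROM leases WHERE expires < ? ORDER BY expires", (cutoff.isoformat(),)
    )
    return [doc for (doc,) in rows]

def missing_clause(conn):
    """'All these documents have this clause, except that one.'"""
    rows = conn.execute("SELECT doc FROM leases WHERE has_parking_clause = 0 ORDER BY doc")
    return [doc for (doc,) in rows]

conn = build_db(LEASES)
print(expiring_before(conn, date(2022, 1, 1)))
print(missing_clause(conn))
```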
Extracting useful information from meetings and conversations is a laborious manual process. Some company calls are already transcribed for regulatory compliance, but they’re rarely analyzed. How much could businesses learn about the progress of projects or upcoming deadlines from what’s said in meetings?
With enterprise employees typically spending 30 percent or more of their time in meetings, much of the information from those meetings isn’t captured as usefully as other business data, Otter CEO Sam Liang points out.
“How do people stay on the same page, especially now when you’re having Zoom meetings back to back?” Liang says.
Transcription tools such as Otter could help with that. Live captioning in PowerPoint presentations and Teams meetings, or the searchable live meeting transcription in the Microsoft Stream broadcast platform, can also help by providing a transcript that lets people pick up the conversation later without relying on someone to take notes manually.
In the future, Microsoft’s Cheng suggests, platforms will use transcription and document analysis alongside image recognition to extract the “collective intelligence of a meeting” so it’s easy to access as the group continues working after the meeting. “There’s the opportunity to document more of what’s happening, and then make that easier to share with your team,” she says.
Bridgewater Associates, for example, has recorded all internal meetings for the past 15 years, and any employee can watch the recordings. Because the recordings were hard to search, they were seldom viewed; the company is now using Otter to extract the contents of old meetings.
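Making old recordings searchable comes down to indexing their transcripts. Otter’s internals are not described here, so the following is a minimal sketch of a generic inverted index. It maps each word to the meetings and timestamps where it was spoken, so a search can jump straight to the right point in a recording. The transcript segments are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical transcript segments: (meeting id, start time in seconds, text).
SEGMENTS = [
    ("2019-03-04", 125, "We should revisit the hiring plan next quarter."),
    ("2019-03-04", 410, "Budget for the data platform is approved."),
    ("2020-01-13", 62, "The hiring plan slipped again because of the budget."),
]

def build_index(segments):
    """Map each lowercase word to the (meeting, timestamp) pairs where it occurs."""
    index = defaultdict(set)
    for meeting, start, text in segments:
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add((meeting, start))
    return index

def search(index, query):
    """Return (meeting, timestamp) pairs whose segment contains every query word."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

index = build_index(SEGMENTS)
print(search(index, "hiring plan"))
```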
Similarly, the Azure Cognitive Services speech-to-text API that powers the Azure Media Services live meeting transcription will soon transcribe audio files uploaded to OneDrive. Developers can already build transcription apps using those APIs, but having the functionality built directly into platforms makes it much more widely accessible.
Analytics and accuracy
Full transcriptions aren’t always the most useful result of applying NLP, although they do provide a timeline that puts what Cheng calls the “interesting nuggets” you find by searching into context.
Otter extracts tags as an automatic summary to indicate what’s covered in the text. Automatically written document summaries are coming to tools such as IBM Watson Natural Language Understanding, and Otter is working on something similar, but you still have to remember to consult the transcript. In 2018 Microsoft showed a prototype system for Teams that created and assigned action items from meeting transcripts, as well as distributing meeting notes to participants.
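How Otter chooses its tags isn’t stated. One classic way to pick words that characterize a document is TF-IDF, which favours terms that are frequent in one transcript but rare across the others. The sketch below applies it to three short, hypothetical transcripts with a small hypothetical stopword list.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list and transcripts.
STOPWORDS = {"the", "a", "we", "to", "of", "and", "is", "on", "for", "our", "in", "will", "this"}
TRANSCRIPTS = [
    "We reviewed the budget and the budget forecast for the launch.",
    "Customer churn is rising; churn in retail accounts needs a plan.",
    "The launch date slipped; we will update the launch plan.",
]

def tokens(text):
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def keyword_tags(docs, k=3):
    """Return the top-k TF-IDF terms for each document as its tags."""
    tokenized = [tokens(d) for d in docs]
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    tags = []
    for toks in tokenized:
        tf = Counter(toks)
        # Term frequency times inverse document frequency.
        scores = {t: (c / len(toks)) * math.log(n / doc_freq[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically so output is deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        tags.append(ranked[:k])
    return tags

for transcript, tags in zip(TRANSCRIPTS, keyword_tags(TRANSCRIPTS)):
    print(tags, "<-", transcript)
```

“launch” drops out of the third transcript’s tags because it also appears in the first. That is the intended effect of IDF, but it shows why tags are a rough summary rather than a replacement for reading the transcript.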
In the longer term, NLP could provide meetings analytics: whether the same topics keep coming up, whether the same deadline keeps getting pushed back, whether some employees talk more than others — or talk over people.
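Two of those measures, who talks most and who talks over whom, fall out directly from a transcript that labels speakers with start and end times. The sketch below assumes such speaker-labelled segments exist. The data is hypothetical, and the rule that counts an interruption is one simple choice among many.

```python
from collections import defaultdict

# Hypothetical speaker-labelled segments: (speaker, start seconds, end seconds).
SEGMENTS = [
    ("Ana", 0, 40),
    ("Ben", 38, 50),     # starts before Ana finishes
    ("Ana", 50, 90),
    ("Chen", 95, 105),
    ("Ben", 104, 130),   # starts before Chen finishes
]

def meeting_stats(segments):
    """Return (percentage of talk time per speaker, interruption count per speaker)."""
    talk_time = defaultdict(float)
    interruptions = defaultdict(int)
    ordered = sorted(segments, key=lambda s: s[1])
    for i, (speaker, start, end) in enumerate(ordered):
        talk_time[speaker] += end - start
        # Count an interruption when someone starts while an earlier segment
        # by a different speaker is still running.
        if any(prev_speaker != speaker and prev_end > start
               for prev_speaker, _, prev_end in ordered[:i]):
            interruptions[speaker] += 1
    total = sum(talk_time.values())
    share = {s: round(t / total * 100, 1) for s, t in talk_time.items()}
    return share, dict(interruptions)

print(meeting_stats(SEGMENTS))
```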
The value of all this depends on the accuracy of transcription, and accuracy is a complex thing to measure for NLP in general. There are formal benchmarks on which many NLP systems achieve human parity, but they’re mostly based on dialog and may not give you an accurate comparison with what you want to do. There’s no single useful measure, Cheng points out.
“We’re seeing people blend together capabilities into multimodal systems. You might find that your dialog system is really great but then it doesn’t really do a good job with search or blended systems where you have speech and language and vision and documents that you want to bring together,” she says.
Transcription accuracy varies with the quality of the recording, the amount of background noise, the accents of the speakers and what people are talking about. For a native English speaker in a quiet environment, Otter’s Liang says it’s more than 95 percent accurate. In practice, you get transcriptions that are useful but some way from perfect.
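The standard way to score speech recognition is word error rate (WER): the number of substituted, deleted and inserted words needed to turn the transcript into a correct reference, divided by the reference length. A figure such as “95 percent accurate” is often read as roughly 5 percent WER. However, vendors define accuracy in their own ways, and the article does not say how Otter measures it. A minimal implementation using word-level edit distance:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing
                curr[j - 1] + 1,         # insertion: extra word in transcript
                prev[j - 1] + (r != h),  # substitution, or free if words match
            )
        prev = curr
    return prev[-1] / len(ref)

wer = word_error_rate(
    "we need to renegotiate the parking terms before the deadline",
    "we need to renegotiate the marking terms before deadline",
)
print(f"WER {wer:.0%}, accuracy {1 - wer:.0%}")
```

WER treats every word equally. Here it counts “parking” misheard as “marking” the same as a dropped “the”, which is why a single headline accuracy number can hide the errors that matter most to a business.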
Whatever NLP tools you use, you should be prepared to invest time in customizing vocabulary for the concepts and connections that matter to your business, such as technical terms for your industry or your own product names, as well as employee names so they’re recognized correctly.
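Many speech services let you supply custom vocabularies or phrase lists before transcription, and that is usually the better route. As a simple illustration of the idea after the fact, the sketch below fuzzy-matches transcript words against a business vocabulary with Python’s standard `difflib`. The vocabulary list and the 0.8 similarity cutoff are assumptions. A cutoff that is too low will “correct” ordinary words into product names.

```python
import difflib

# Hypothetical business vocabulary: product names, jargon and employee names.
VOCABULARY = ["Docugami", "apexanalytix", "Kristick", "PivotTable"]

def apply_vocabulary(transcript: str, vocabulary=VOCABULARY, cutoff=0.8) -> str:
    """Replace words that closely resemble a vocabulary term with that term."""
    lookup = {term.lower(): term for term in vocabulary}
    fixed = []
    for word in transcript.split():
        match = difflib.get_close_matches(word.lower(), lookup.keys(), n=1, cutoff=cutoff)
        fixed.append(lookup[match[0]] if match else word)
    return " ".join(fixed)

print(apply_vocabulary("ask kristik about the docugammy demo"))
```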
Organizations need to decide what level of error is acceptable to them before using NLP for more than shortcuts or discovery. Beyond raw accuracy, though, Cheng suggests focusing on the end-to-end experience.
“How did you put these together to make something that people actually use, that helps your company or helps your customers do something more effectively?” she asks.
“You don’t want to overpromise; AI is not magic but there are so many things natural language tools can improve. The biggest problems your company has today are probably organizing your information, getting more out of the documents that you have and letting people who have the expertise guide that,” Cheng says. “The experience we’re in right now with a lot of people working remotely, we can make better with AI.”