Natural language processing definition
Natural language processing (NLP) is the branch of artificial intelligence (AI) that deals with training a computer to understand, process, and generate language. Search engines, machine translation services, and voice assistants are all powered by the technology.
While the term originally referred to a system’s ability to read, it has since become shorthand for computational linguistics as a whole. Subcategories include natural language generation (NLG), a computer’s ability to create communication of its own, and natural language understanding (NLU), the ability to understand slang, mispronunciations, misspellings, and other variants in language.
How natural language processing works
NLP works through machine learning (ML). Machine learning systems store words and the ways they come together just like any other form of data. Phrases, sentences, and sometimes entire books are fed into ML engines where they’re processed using grammatical rules, people’s real-life linguistic habits, or both. The computer then uses this data to find patterns and extrapolate what comes next. Take translation software, for example: In French, “I’m going to the park” is “Je vais au parc,” so machine learning predicts that “I’m going to the store” will also begin with “Je vais au.” All the computer needs after that is the word for “store.”
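To make the "find patterns and extrapolate what comes next" idea concrete, here is a minimal sketch of a bigram model in Python. It is a deliberately simplified illustration, not how production translation software works: it only counts which word follows which. The corpus and the function names `train_bigrams` and `predict_next` are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count which word follows each word in the training sentences."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus; a real system would learn from far more text.
corpus = [
    "je vais au parc",
    "je vais au magasin",
    "je vais au cinéma",
    "nous allons au parc",
]
model = train_bigrams(corpus)
print(predict_next(model, "je"))    # vais
print(predict_next(model, "vais"))  # au
```

Because "je" is followed by "vais" and "vais" by "au" in every training sentence that contains them, the model reproduces the "Je vais au" pattern from the example above. It has no notion of meaning, which is why the word for "store" still has to come from somewhere else.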
Machine translation is a powerful NLP application, but search is the most used. Every time you look something up in Google or Bing, you’re feeding data into the system. When you click on a search result, the system interprets the click as confirmation that the result was relevant and uses that signal to improve future searches.
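The click-as-confirmation loop can be sketched in a few lines of Python. This is an assumption-heavy toy, not a description of how Google or Bing actually rank results: the `ClickFeedbackRanker` class, the query, and the URLs are all hypothetical, and the only signal used is a raw click count.

```python
from collections import Counter


class ClickFeedbackRanker:
    """Re-rank results for a query by how often each result was clicked."""

    def __init__(self):
        self.clicks = Counter()  # (query, url) -> number of recorded clicks

    def record_click(self, query, url):
        self.clicks[(query, url)] += 1

    def rank(self, query, urls):
        # sorted() is stable, so results with equal clicks keep their original order.
        return sorted(urls, key=lambda url: -self.clicks[(query, url)])


ranker = ClickFeedbackRanker()
results = ["a.example", "b.example", "c.example"]  # hypothetical result list
ranker.record_click("nlp tools", "c.example")
ranker.record_click("nlp tools", "c.example")
ranker.record_click("nlp tools", "b.example")
print(ranker.rank("nlp tools", results))  # ['c.example', 'b.example', 'a.example']
```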
Chatbots work the same way: They integrate with Slack, Microsoft Messenger, and other chat programs where they read the language you use, then turn on when you type in a trigger phrase. Voice assistants such as Siri and Alexa also kick into gear when they hear phrases like “Hey, Alexa.” That’s why critics say these programs are always listening: If they weren’t, they’d never know when you need them. Unless you turn an app on manually, NLP programs must operate in the background, waiting for that phrase.
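A trigger phrase can be illustrated with a short Python sketch that reads every message but only acts on the ones that start with the wake phrase. Real voice assistants detect wake words in audio, usually on the device itself, so treat this text-only version as a simplification; the function names and the sample messages are assumptions for the example.

```python
def normalize(text):
    """Lowercase the text and drop punctuation so 'Hey, Alexa!' matches 'hey alexa'."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def commands_after_wake(messages, wake_phrase="hey alexa"):
    """Scan every message, but return only the text that follows the wake phrase."""
    wake = normalize(wake_phrase)
    commands = []
    for message in messages:
        text = normalize(message)
        # Require a word boundary so "hey alexandra" does not trigger the assistant.
        if text == wake or text.startswith(wake + " "):
            commands.append(text[len(wake):].strip())
    return commands


stream = [
    "What should we have for dinner?",
    "Hey, Alexa, what's the weather today?",
    "I think it might rain.",
]
print(commands_after_wake(stream))  # ['whats the weather today']
```

Note that the loop inspects all three messages even though it acts on only one, which is the "always listening" behavior critics point to.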
Natural language processing examples
Data comes in many forms, but the largest untapped pool of data consists of text. Patents, product specifications, academic publications, market research, news, and social media feeds all have text as a primary component, and the volume of text is constantly growing. Apply the technology to voice and the pool gets even larger. Here are three examples of how organizations are putting the technology to work:
- Accenture uses it to analyze contracts: The company’s Accenture Legal Intelligent Contract Exploration (ALICE) tool helps the global services firm’s legal organization of 2,800 professionals perform text searches across its million-plus contracts, including searches for contract clauses. ALICE uses “word embedding” to go through contract documents paragraph by paragraph, looking for keywords to determine whether the paragraph relates to a particular contract clause type.
- Verizon processes customer requests: Verizon’s Business Service Assurance group uses NLP and deep learning to automate the processing of customer request comments. The group receives more than 100,000 inbound requests per month. Its AI-Enabled Digital Worker for Service Assurance reads repair tickets and automatically responds to the most common requests, such as reporting on current ticket status or repair progress updates. More complex issues are routed to human engineers.
- Public Service Electric and Gas (PSE&G) helps customers with a virtual assistant: The New Jersey public utility uses virtual assistant technology and other digital services to enable its customers to manage their electricity or gas accounts via voice commands. The assistant was built using the Alexa Skills Kit provided by Amazon.
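The paragraph-by-paragraph contract review described in the first example can be approximated with a small Python sketch. ALICE is described as using word embeddings; the sketch below swaps in plain keyword matching, which is much simpler and misses synonyms that embeddings would catch. The clause types, keyword sets, and sample contract are hypothetical.

```python
import re

# Hypothetical clause types and keywords, chosen only for illustration.
CLAUSE_KEYWORDS = {
    "termination": {"terminate", "termination", "notice"},
    "confidentiality": {"confidential", "disclose", "nondisclosure"},
    "payment": {"invoice", "payment", "fee", "fees"},
}


def tokenize(text):
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def tag_paragraphs(contract_text, min_hits=1):
    """Label each blank-line-separated paragraph with the clause types it mentions."""
    results = []
    for paragraph in contract_text.split("\n\n"):
        words = tokenize(paragraph)
        tags = sorted(
            clause
            for clause, keywords in CLAUSE_KEYWORDS.items()
            if len(words & keywords) >= min_hits
        )
        results.append((paragraph.strip(), tags))
    return results


contract = (
    "Either party may terminate this agreement with 30 days written notice.\n\n"
    "The supplier shall invoice the client monthly, and payment is due within 45 days.\n\n"
    "This agreement is governed by the laws of New Jersey."
)
for paragraph, tags in tag_paragraphs(contract):
    print(tags, "<-", paragraph)
```

Raising `min_hits` makes the tagger stricter, at the cost of skipping paragraphs that mention a clause type only once.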
Natural language processing software
Whether you’re building a chatbot, voice assistant, predictive text application, or other application with NLP at its core, you’ll need tools to help you do it. According to Technology Evaluation Centers, the most popular software includes:
- Natural Language Toolkit (NLTK). NLTK is an open-source framework for building Python programs to work with human language data. It was developed in the Department of Computer and Information Science at the University of Pennsylvania and provides interfaces to more than 50 corpora and lexical resources, a suite of text processing libraries, wrappers for natural language processing libraries, and a discussion forum. NLTK is offered under the Apache 2.0 license.
- spaCy. spaCy is an open-source library for advanced natural language processing explicitly designed for production use rather than research. spaCy was made with high-level data science in mind and allows deep data mining. It’s released under the MIT license.
- Gensim. Gensim is an open-source Python library. The platform-independent library supports scalable statistical semantics, analysis of plain-text documents for semantic structure, and the ability to retrieve semantically similar documents. It’s intended to handle large amounts of text without human supervision.
- Amazon Comprehend. This Amazon service doesn’t require machine learning experience. It’s intended to help organizations find insights from email, customer reviews, social media, support tickets, and other text. It uses sentiment analysis, part-of-speech extraction, and tokenization to parse the intention behind the words.
- IBM Watson Tone Analyzer. This cloud-based solution is intended for social listening, chatbot integration, and customer service monitoring. It can analyze emotion and tone in customer posts and monitor customer service calls and chat conversations.
- Google Cloud Translation. This API uses NLP to examine a source text to determine its language and then uses neural machine translation to dynamically translate the text into another language. The API allows users to integrate the functionality into their own programs.
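Several of the tools above mention tokenization and sentiment analysis. As a rough idea of what those steps involve, here is a standard-library-only Python sketch. It does not use the API of any product listed above, and the word lists are tiny hypothetical lexicons; real tools rely on trained models rather than hand-picked words.

```python
import re

# Hypothetical mini-lexicons, far smaller than anything a real tool would use.
POSITIVE = {"great", "love", "helpful", "fast"}
NEGATIVE = {"slow", "broken", "terrible", "late"}


def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(tokenize("The support team was great and fast!"))
print(sentiment("The support team was great and fast!"))           # positive
print(sentiment("My order arrived late and the box was broken."))  # negative
print(sentiment("The package arrived on Tuesday."))                # neutral
```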
Natural language processing courses
There are many resources available for learning to create and maintain NLP applications and a number of them are free. They include:
- Introduction to Natural Language Processing in Python from DataCamp. This free course, offered as 15 videos and 51 exercises, covers the basics of NLP using Python, including how to identify and separate words, how to extract topics in a text, and how to build your own fake news classifier.
- Introduction to Natural Language Processing (NLP) from Udemy. This introductory course provides hands-on experience working with and analyzing text using Python and the Natural Language Toolkit. It consists of three hours of on-demand video, three articles, and 16 downloadable resources. The course costs $19.99, which includes a certificate of completion.
- Hands On Natural Language Processing (NLP) using Python from Udemy. This course is for individuals with basic programming experience in any language, an understanding of object-oriented programming concepts, knowledge of basic to intermediate mathematics, and knowledge of matrix operations. It is completely project-based and involves building a text classifier for predicting sentiment of tweets in real time, and an article summarizer that can fetch articles and find the summary. The course consists of 10.5 hours of on-demand video and eight articles. The course costs $19.99, which includes a certificate of completion.
- Natural Language Processing (NLP) from edX. This six-week course, offered by Microsoft through edX, provides an overview of natural language processing and the use of classic machine learning methods. It covers statistical machine translation and deep semantic similarity models (DSSM) and their applications. It also covers deep reinforcement learning techniques applied in NLP and vision-language multimodal intelligence. It’s an advanced-level course and those who complete it can pursue a Verified Certificate for $99.
- Natural Language Processing from Coursera. Part of Coursera’s Advanced Machine Learning Specialization, this course covers natural language processing tasks including sentiment analysis, summarization, dialogue state tracking, and more. Coursera says it is an advanced level course and estimates it will take five weeks of study at four to five hours per week to complete.
- Natural Language Processing in TensorFlow from Coursera. This course is part of Coursera’s TensorFlow in Practice Specialization, and it covers using TensorFlow to build natural language processing systems that can process text and input sentences into a neural network. Coursera says it is an intermediate-level course and estimates it will take four weeks of study at four to five hours per week to complete.
Natural language processing salaries
Here are some of the most popular job titles related to NLP and the salary range for each position, according to data from PayScale.
- Computational linguist: $60K-$110K
- Data scientist: $76K-$133K
- Data science director: $122K-$216K
- Lead data scientist: $107K-$165K
- Machine learning engineer: $78K-$156K
- Senior data scientist: $105K-$167K
- Software engineer: $78K-$144K