How machine learning can be used to break down language barriers

A look at how machine learning is currently being applied to the language barrier problem, and how it might develop in the future.

Machine learning has transformed major aspects of the modern world with great success. Self-driving cars, intelligent virtual assistants on smartphones, and cybersecurity automation are all examples of how far the technology has come.

But of all the applications of machine learning, few have the potential to shape our economy as radically as language translation. Translation is an ideal problem for machine learning to tackle. Language operates on a set of predictable rules, but with a degree of variation that makes it difficult for humans to interpret. Machine learning, on the other hand, can leverage repetition, pattern recognition, and vast databases to translate faster than humans can.

There are other compelling reasons to expect that language will be one of the most important applications of machine learning. To begin with, there are over 6,500 spoken languages in the world, and many of the more obscure ones are spoken by poorer demographics who are frequently isolated from the global economy. Removing language barriers through technology connects more communities to global marketplaces. More people speak Mandarin Chinese than any other language in the world, making China’s growing middle class a prime market for U.S. companies if they can overcome the language barrier.

Let’s take a look at how machine learning is currently being applied to the language barrier problem, and how it might develop in the future.

Neural machine translation

Recently, language translation took an enormous leap forward with the emergence of a new machine translation technology called Neural Machine Translation (NMT). The emphasis should be on the “neural” component because the technology is built on artificial neural networks, which are loosely modeled on the way the human brain works. The architects behind NMT will tell you that they frequently struggle to explain how it arrives at certain translations, even as it delivers them quickly and accurately.

 “NMT can do what other machine translation methods have not done before – it achieves translation of entire sentences without losing meaning,” says Denis A. Gachot, CEO of SYSTRAN, a language translation technologies company. “This technology is of a caliber that deserves the attention of everyone in the field. It can translate at near-human levels of accuracy and can translate massive volumes of information exponentially faster than we can operate.”

The comparison to human translators is not a stretch anymore. Unlike the days of garbled Google Translate results, which continue to feed late-night comedy sketches, NMT is producing results that rival those of humans. In fact, SYSTRAN’s Pure Neural Machine Translation product was preferred over human translators 41% of the time in one test.

Martin Volk, a professor at the Institute of Computational Linguistics at the University of Zurich, had this to say about neural machine translation in a 2017 Slator article:

 “I think that as computing power inevitably increases, and neural learning mechanisms improve, machine translation quality will gradually approach the quality of a professional human translator over the coming two decades. There will be a point where in commercial translation there will no longer be a need for a professional human translator.”

Gisting to fluency

One telling metric to watch is gisting vs. fluency. Are the translations being produced communicating the gist of an idea, or fluently communicating details?

Previous iterations of language translation technology only achieved the level of gisting. These translations required extensive human support to be usable. NMT successfully pushes beyond gisting and communicates fluently. Now, with little to no human support, usable translations can be processed at the same level of quality as those produced by humans. Sometimes, the NMT translations are even superior. 

Quality and accuracy are the main priorities of any translation effort. Any basic translation software can quickly spit out its best rendition of a body of text. Parsing information correctly and delivering a fluent translation requires a whole different set of competencies. Volk also said, “Speed is not the key. We want to drill down on how information from sentences preceding and following the one being translated can be used to improve the translation.”
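To make Volk's point about surrounding sentences concrete, here is a minimal Python sketch of one way to pair each sentence with its neighbors before handing it to a translation system. It is an illustration only, not a description of how SYSTRAN or any other NMT product works. The names `ContextWindow` and `build_context_windows` and the `size` parameter are hypothetical, and the translation step itself is left out.

```python
from typing import List, NamedTuple


class ContextWindow(NamedTuple):
    """Hypothetical container: one sentence plus its neighbors."""
    previous: List[str]   # sentences before the one being translated
    current: str          # the sentence to translate
    following: List[str]  # sentences after it


def build_context_windows(sentences: List[str], size: int = 1) -> List[ContextWindow]:
    """Pair each sentence with up to `size` neighbors on each side."""
    windows = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - size)  # clamp at the start of the document
        windows.append(ContextWindow(
            previous=sentences[start:i],
            current=sentence,
            following=sentences[i + 1:i + 1 + size],
        ))
    return windows


document = [
    "The bank approved the loan.",
    "It was a relief.",
    "We walked along the river afterward.",
]
windows = build_context_windows(document, size=1)
for w in windows:
    print(w.previous, "|", w.current, "|", w.following)
```

In this sketch, a system translating “It was a relief.” would also see the sentence about the loan, which is the kind of neighboring information that could help it resolve what “it” refers to.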

This opens up enormous possibilities for global commerce. Massive volumes of information traverse the globe every second, and much of that data needs to be translated into two or more languages. That is why successfully automating translation is so critical. Tasks like e-discovery, compliance, or any other business processes that rely on document accuracy can be accelerated dramatically with NMT.

Education, e-commerce, travel, diplomacy, and even international security work can be radically changed by the ability to communicate in your native language with people from around the globe.

Post language economy

Everywhere you look, language barriers are a speed check on global commerce. Whether that commerce involves government agencies approving business applications, customs checkpoints, massive document sharing, or e-commerce, fast and effective translation is essential.

If we look at language strictly as a means of sharing ideas and coordinating, it is somewhat inefficient. It is linear and has a lot of rules that make it difficult to use. Meaning can be obfuscated easily, and not everyone is equally proficient at using it. But the biggest drawback to language is simply that not everyone speaks the same one.

NMT has the potential to reduce and eventually eradicate that problem.

“You can think of NMT as part of your international go-to-market strategy,” writes Gachot. “In theory, the Internet erased geographical barriers and allowed players of all sizes from all places to compete in what we often call a ‘global economy.’ But we’re not all global competitors because not all of us can communicate in the 26 languages that have 50 million or more speakers. NMT removes language barriers, enabling new and existing players to be global communicators, and thus real global competitors. We’re living in the post-internet economy, and we’re stepping into the post-language economy.”

Machine learning has made substantial progress but has not yet cracked the code on language. It still has shortcomings, notably when it faces slang, idioms, obscure dialects of prominent languages, and creative or colorful writing. It shines, however, in the world of business, where jargon is defined and intentional. That in itself is a significant leap forward.

This article is published as part of the IDG Contributor Network.
