Google has built a small neural network for its real-time visual translation app so that it works effectively on smartphones, which lack the intense computing power that data centres normally bring to image recognition and translation. The app lets users point their camera at an object that contains words, so they can translate things like menus and signs. The search giant has also added 20 languages to the app.

“We want to be able to recognise a letter with a small amount of rotation, but not too much. If we overdo the rotation, the neural network will use too much of its information density on unimportant things. So we put effort into making tools that would give us a fast iteration time and good visualisations,” Otavio Good, software engineer for Google Translate, wrote in a blog post. “Inside of a few minutes, we can change the algorithms for generating training data, generate it, retrain, and visualise.

“To achieve real-time, we also heavily optimized and hand-tuned the math operations. That meant using the mobile processor’s SIMD instructions and tuning things like matrix multiplies to fit processing into all levels of cache memory.”

When reading letters in images, the app filters out background objects such as people, trees and cars. By looking for “blobs of pixels” that share a similar colour and sit close to one another, the app recognises a continuous line of text to read.

The app has been trained using a convolutional neural network to learn what letters look like in different languages and to tell letters from non-letters. A letter generator was also built to add noise around the letters or characters being translated, such as smudges and rotation, so that the app does not always need clear, well-presented text in order to work.
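The training-data approach described above, a generator that applies bounded rotation and artificial "dirt" to clean letter images, can be sketched roughly as follows. This is a minimal NumPy illustration, not Google's actual pipeline: the function names, the 10-degree rotation bound, and the smudge probability are all assumptions chosen for the example.

```python
import numpy as np

def rotate_small(img, angle_deg):
    """Nearest-neighbour rotation of a 2D image about its centre.

    Pure NumPy so the sketch has no imaging dependencies; a real
    pipeline would use a proper image library with interpolation.
    """
    h, w = img.shape
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, find the source pixel.
    src_x = cos_t * (xs - cx) + sin_t * (ys - cy) + cx
    src_y = -sin_t * (xs - cx) + cos_t * (ys - cy) + cy
    sx = np.clip(np.rint(src_x).astype(int), 0, w - 1)
    sy = np.clip(np.rint(src_y).astype(int), 0, h - 1)
    return img[sy, sx]

def augment(letter_img, rng, max_angle=10.0, smudge_prob=0.05):
    """One noisy training sample: small random rotation plus speckles.

    The rotation is deliberately bounded (the blog post notes that
    overdoing it wastes the network's capacity on unimportant cases).
    """
    angle = rng.uniform(-max_angle, max_angle)
    out = rotate_small(letter_img, angle).astype(float)
    smudge = rng.random(out.shape) < smudge_prob  # fake dirt/reflections
    out[smudge] = 1.0
    return out
```

Generating samples this way, rather than photographing dirty signs, is what makes the fast "change the algorithm, regenerate, retrain" loop from the quote possible.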
Once the letters are recognised, the app uses dictionary lookups for the different languages, and it can still identify a word even if it accidentally reads one letter as a number. For example, if it reads ‘S’ as ‘5’ by mistake, it can still recognise the word ‘super’ from the remaining letters. The translation is then rendered on top of the original words.

“We can do this because we’ve already found and read the letters in the image, so we know exactly where they are. We can look at the colours surrounding the letters and use that to erase the original letters. And then we can draw the translation on top using the original foreground colour.”
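The ‘5’-for-‘S’ example above amounts to a lookup that tolerates known letter/digit confusions. A minimal sketch of that idea, with a hypothetical confusion table and `lookup` helper that are not from the article:

```python
# Hypothetical table of digits a classifier commonly emits for letters.
CONFUSIONS = {"0": "O", "1": "I", "5": "S", "8": "B"}

def candidates(word):
    """All spellings reachable by swapping confused digits back to letters."""
    results = [""]
    for ch in word:
        options = [ch] + ([CONFUSIONS[ch]] if ch in CONFUSIONS else [])
        results = [prefix + o for prefix in results for o in options]
    return results

def lookup(word, dictionary):
    """Return the first candidate present in the dictionary, else None."""
    for cand in candidates(word):
        if cand.lower() in dictionary:
            return cand.lower()
    return None

# A misread '5uper' still resolves to 'super', as in the article's example.
```

Because the surrounding letters constrain the word, a single misclassified character rarely prevents a successful dictionary match.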