Game On: How Unity Is Extending The Power Of Synthetic Data Beyond The Gaming Industry

Unity is pushing the boundaries of what synthetic data can do. In this article, Danny Lange, SVP of AI at Unity, explains how the company is empowering customers across many different industries to accelerate innovation. He also shares two pieces of advice on designing AI/ML simulation experiences.

In the gaming world, finding the balance between art and science is essential to getting to the desired outcome: fun. At Unity, we’ve been creating and operating real-time 3D (RT3D) content in gaming for more than 15 years. Our proprietary platform connects game players and game creators, letting developers build games that fans will enjoy. Now, we’re helping creators in other industries make better data-driven decisions using artificial intelligence (AI), machine learning (ML), and synthetic data (data that is derived from simulations based on real-world data).

Understanding synthetic data’s role in an already data-rich world

Getting the right data into the right place is what most companies struggle with as they get started with ML and AI. It can be difficult for a smaller company to generate or gather the amount of data that’s needed to effectively make accurate predictions. This is where synthetic data comes in—and why Unity has gained so much expertise in this area since our inception. 

My team’s work started four years ago. Our goal was to explore how else we could use the Unity engine and our expertise in synthetic data. Since I’d worked at Uber, we started with the idea that synthetic data could shorten software development time for self-driving cars. The prevailing wisdom had been that you needed to log thousands of testing hours to create a reliable self-driving car. But approximately 98% of the time when a human is driving, nothing interesting is happening. The same goes for autonomous vehicle test drives, which produce hours of uneventful footage that offers no real value.

Plus, it’s highly risky to put software that’s a work in progress on the road. But when you run your software in a simulation (in Unity’s case, a simulation running in an environment built on our game engine), you can test-drive millions of miles every 24 hours across thousands of servers and create scenarios that would rarely occur in the real world. So whether you are building a self-driving car that has experienced thousands of hours of possible events, a vacuum that can avoid bumping into furniture, or a robot that can perform surgery, the Unity engine is a great proxy for the real world.

Since that first self-driving car project, we’ve explored many ways to let creators in other fields use ML modeling and predictive analytics as easily as game creators do. We’ve expanded the use of Unity to new settings such as retail spaces, public spaces, and transportation hubs, as well as robotics. As robots move away from doing repetitive tasks in manufacturing facilities and transition into labs and households, they need very different skills. For example, smart vacuum cleaners now have cameras and other sensors that help them understand the layout of a room. Developers working on robots might use Unity AI/ML to build a simulated version of the robot and run tens, hundreds, or thousands of simulated scenarios before ever running the physical robot in a physical space, saving an incredible amount of time.

The gaming technology we’re bringing to other markets

A game player is constantly generating simulated data based on situations and movements. That type of spatial simulation is incredibly valuable for other scenarios, such as predicting what might happen under complex conditions. For example, Unity’s 3D rendering capabilities can be combined with its simulation, scaled out on the cloud, to holistically study large and uncertain systems. This kind of simulation allows big-picture observations and what-if studies, ultimately leading to a conceptual understanding of real-world situations that experts can use to inform challenging policy decisions.

The possibilities are boundless. We see retailers use data simulations to choose the best layout for clothing displays and to consider factors like whether shoppers are buying for themselves or for a family member. Designing an airport terminal, or deciding where in the terminal to place a particular store, is easier and better informed with a tool like the Unity engine. Shop owners can run simulations using characters to find the optimal location for the store, taking into account factors like where and how many people could stand in line during peak times. In addition, it’s becoming easier to overlay location or other real-time datasets on simulations for even more specific testing.

All of this requires a powerful cloud back end to achieve the required large scale and volume. Our customers need to run a large number of instances on demand, and usage is very spiky. So we built a cloud-based version of the Unity engine so it’s easy to run on many devices, and we offer it as a managed service using Google Cloud. Customers get the data they need without having to manage the back end.

All of our ML and data analytics at Unity runs on Google Cloud, using Compute Engine infrastructure and BigQuery for analytics. 

Related: See why Gartner named Google Cloud a leader in the 2020 Magic Quadrant for both Cloud Database Management Systems and for Cloud Infrastructure and Platform Services.

Too much of a good thing is…overwhelming

Through all the work I’ve been lucky enough to do in this constantly changing industry, there are a few common challenges I’ve encountered with those getting started with AI and ML. Here are two quick tips to remember as you’re getting started on your AI/ML journey.

1. Consider how much data you truly need 

More data is better, generally, but there’s a point of diminishing returns. Think about the data you’ll generate with simulations. In 24 hours, it’s possible to generate a thousand years’ worth of video at 30 frames per second. But how are you going to check what you generated? Figuring out whether you generated the right data, or just got the same thing over and over again, is very time intensive and doesn’t add much value.
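To put that figure in perspective, here is a rough back-of-the-envelope sketch in Python. The 30 frames per second and the 24-hour window come from the example above; the 365.25-day year and the one-second-per-frame review pace are assumptions added purely for illustration.

```python
# Back-of-the-envelope arithmetic for "a thousand years of video in 24 hours".
# Assumption: a year is 365.25 days. The review pace is also an assumption.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
FPS = 30                 # frames per second, from the article
YEARS_OF_VIDEO = 1_000   # simulated footage, from the article
WALL_CLOCK_HOURS = 24    # generation window, from the article


def frames_generated(years=YEARS_OF_VIDEO, fps=FPS):
    """Total number of frames in the simulated footage."""
    return int(years * SECONDS_PER_YEAR * fps)


def speedup_over_real_time(years=YEARS_OF_VIDEO, hours=WALL_CLOCK_HOURS):
    """How many seconds of footage are produced per wall-clock second."""
    return years * SECONDS_PER_YEAR / (hours * 3600)


def review_years(frames, seconds_per_frame=1.0):
    """Years a single reviewer would need at the given (assumed) pace."""
    return frames * seconds_per_frame / SECONDS_PER_YEAR


if __name__ == "__main__":
    frames = frames_generated()
    print(f"frames generated: {frames:,}")
    print(f"speed-up over real time: {speedup_over_real_time():,.0f}x")
    print(f"years to review at 1 s per frame: {review_years(frames):,.0f}")
```

Under those assumptions, one day of generation produces roughly 947 billion frames, about 365,000 times faster than real time, and glancing at each frame for a single second would take around 30,000 years. That is why checking the output, not producing it, becomes the bottleneck.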

You’ll find the right amount of data for your situation by evaluating and testing often. Work in an iterative loop: generate data, train the model, verify the model against real-world data, and see how it performs. Constantly measure the results, then define what “good enough” means for your situation. Then, create predictive models. That’s also the point where you’re likely establishing a strong data culture, where your internal users trust the data and depend on it to make better decisions. 
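The loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Unity’s tooling: simulate, train, accuracy, and iterate are hypothetical names, the "simulator" is a random number generator, and the "model" is a single threshold. In practice the holdout set would be real-world measurements rather than more simulated samples.

```python
import random


def simulate(n, rng):
    """Hypothetical simulator: returns n (feature, label) pairs, classes balanced."""
    data = []
    for i in range(n):
        label = i % 2 == 0
        center = 0.7 if label else 0.3
        data.append((rng.gauss(center, 0.1), label))
    return data


def train(data):
    """Toy 'model': a threshold halfway between the two class means."""
    pos = [x for x, y in data if y]
    neg = [x for x, y in data if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, data):
    """Fraction of samples the threshold classifies correctly."""
    return sum((x > threshold) == y for x, y in data) / len(data)


def iterate(real_holdout, target=0.95, start=20, max_rounds=8, seed=0):
    """Generate, train, verify; stop once the model is 'good enough'."""
    rng = random.Random(seed)
    n = start
    history = []
    for _ in range(max_rounds):
        model = train(simulate(n, rng))        # generate data, train the model
        score = accuracy(model, real_holdout)  # verify against held-out data
        history.append((n, score))             # constantly measure the results
        if score >= target:                    # "good enough" for this situation
            break
        n *= 2                                 # otherwise generate more data
    return model, history


if __name__ == "__main__":
    # Stand-in for real-world data; a real project would load measurements here.
    holdout = simulate(500, random.Random(1))
    model, history = iterate(holdout, target=0.9)
    for n, score in history:
        print(f"{n} synthetic samples -> accuracy {score:.3f}")
```

The point of the design is the stopping rule: the amount of synthetic data grows only while the measured score is below the target you defined, so you stop generating as soon as more data no longer buys you anything.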

2. Simulated data is more fair than real-world data 

The real world is complicated, and it’s not always fair. For a lot of data scientists or ML teams, it’s easy to collect and use real-world data to train systems. But when the data represents a world where 80% of software engineers are male, that imbalance can easily make its way into ML modeling. For example, a model built to identify engineers in photos will learn from real-world data that 80% of software engineers are male, and will then favor men over women. You have to take responsibility for generating data that represents the world as you would like to see it. With simulated data, you can generate equal numbers of people of each gender. Make the system equally good at recognizing children and adults. Generate different types of hair, a whole range of skin colors, and varying physical abilities.
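One way to picture this is as a specification step that runs before any rendering: enumerate every combination of the attributes you care about and request the same number of samples for each. The Python sketch below assumes a hypothetical attribute space and hypothetical helper names (ATTRIBUTES, balanced_specs, share); it produces only the specifications that a simulation would then turn into characters or images.

```python
import itertools
import random
from collections import Counter

# Hypothetical attribute space; the names and values are illustrative only.
ATTRIBUTES = {
    "gender": ["female", "male"],
    "age_group": ["child", "adult"],
    "skin_tone": ["tone_1", "tone_2", "tone_3", "tone_4", "tone_5", "tone_6"],
}


def balanced_specs(per_combination, seed=0):
    """Return character specs with equal counts for every attribute combination."""
    keys = list(ATTRIBUTES)
    specs = [
        dict(zip(keys, combo))
        for combo in itertools.product(*(ATTRIBUTES[k] for k in keys))
        for _ in range(per_combination)
    ]
    # Shuffle so that generation order does not correlate with any attribute.
    random.Random(seed).shuffle(specs)
    return specs


def share(specs, key):
    """Fraction of specs taking each value of the given attribute."""
    counts = Counter(spec[key] for spec in specs)
    return {value: count / len(specs) for value, count in counts.items()}


if __name__ == "__main__":
    specs = balanced_specs(per_combination=5)
    print(len(specs), "specs")
    print(share(specs, "gender"))
    print(share(specs, "age_group"))
```

Because every combination is requested equally often, each gender and each age group ends up with exactly half of the samples, unlike an 80/20 split inherited from collected data.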

As an engineer, you have the power to generate data that reflects a more balanced, diverse, and fair world. I’ve seen people wash their hands of this notion by simply saying that the data returned these results, as in the example above with male vs. female engineers. But the system creates data based on human inputs. If you use unfair data, the unfairness will be amplified and can end up harming your brand.

It’s also important to note that personal data isn’t necessary to do innovative work with ML, AI, and data analytics. For example, in our engine, gender, age, or other personal details aren’t relevant to gaming. All that matters is how you play the game. This principle can, and should, apply to other applications. 

Building a future with simulated data

We’re already seeing some really impressive results in our work with customers, and there’s so much more potential as these new technologies come together. When you use reinforcement learning along with spatial simulations, you might get a robot with vision capabilities that learns on the fly and can do what used to be a human’s busy work. We’re seeing smarter indoor environments, such as cashierless grocery stores, and there’s lots more to come as we explore new industries and applications. Exploring what’s possible and connecting creators with the data they need to make the right decisions continues to drive Unity, no matter the industry.

Continue to explore what’s possible: Assess where you are in the AI journey and get a framework for creating and evolving AI capabilities within your organization. Download Google Cloud’s AI adoption framework.  

About the author

Danny Lange is Senior Vice President of Artificial Intelligence at Unity, where he leads the company’s initiatives in the field of applied Artificial Intelligence. Prior to his role at Unity, Danny was the head of Machine Learning at Uber, where he led the development of the company’s Machine Learning platform. Previously, he was General Manager for Machine Learning at Amazon, where he managed Amazon’s internal Machine Learning platform and launched the first AI product for Amazon Web Services (AWS), known as Amazon Machine Learning. Danny has also led Machine Learning efforts at Microsoft and started his career building autonomous agents as a Computer Scientist at IBM Research.

Copyright © 2020 IDG Communications, Inc.