by Thor Olavsrud

DreamWorks keeps production on track with AIOps

Sep 21, 2020

A combination of business continuity planning, predictive analytics, and a multi-tenant cloud architecture has allowed the animation studio to keep producing movies without missing a beat.


DreamWorks Animation is in many ways a manufacturer of digital data. The films it produces comprise multiple terabytes of data, created by teams of artists working together with sophisticated digital animation tools in a complex data pipeline. When the COVID-19 pandemic hit and that animation factory floor had to close, DreamWorks production was able to keep chugging along due to business continuity planning, analytics, and a multi-tenant cloud architecture.

“You watch our movies as data, either through a streamer or a digital projector in a theater,” says Skottie Miller, technology fellow and vice president of platform and services architecture at DreamWorks. “Because of the multi-tenant, work-from-anywhere-on-campus environment, when the pandemic hit, what really changed for us? We still operate like a multi-tenant cloud. We still have our data in Glendale, in Las Vegas. And the people just went to offices, their houses, that were further away.”

DreamWorks released its latest film, Trolls World Tour, in early April as much of the U.S. was going into lockdown. The movie required 1,200TB of storage, and the creative teams managed and accessed 500 million digital files as they worked on it. DreamWorks starts production on a movie with similar requirements roughly every four to six months, and each movie takes 2.5 to 3 years to complete.

DreamWorks IT performs all the normal IT functions, such as supporting payroll and HR, managing systems and so forth, but its most important responsibility is supporting the digital production facility.

Artists at DreamWorks use sophisticated tools, many of which the company wrote itself or, in some cases, purchased from vendors and heavily modified. The artists use those tools to create data. There’s also a secondary tier of metadata collected about compute jobs, the complexity of scenes and sequences, the number of hours invested in an asset, and so on. All that data goes into a big data pipeline that helps DreamWorks perform predictive analytics using AIOps powered by NetApp Active IQ. AIOps is an emerging trend in which AI and machine learning are used to automate the monitoring and mitigation of operational issues.

“It’s a very complex, very dynamic environment,” says Jeff Wike, CTO of DreamWorks. “When the infrastructure or tools don’t work right or don’t work well, it has a direct impact on the ability of our business to perform. There’s a direct correlation between the technology and our ability to make films.”

From feeling to fact

DreamWorks is no stranger to business continuity planning. With its headquarters in Glendale, Calif., earthquakes and wildfires are a fact of life.

“You think about those things,” Wike says. “You don’t necessarily think about pandemics. We do now. But we’ve always been looking at how do we distribute our data? How do we distribute our compute processing? How do we make it so that if something happens, people can continue to work?”

Data management played a key role in getting DreamWorks ready. When DreamWorks was founded 25 years ago, each artist had a data set on their individual workstation. They would produce their jobs and then the data would move to the next artist in the workflow. Data was highly siloed until the company moved to high-performance, shared storage clusters.

To increase agility and to support artist collaboration, DreamWorks adopted a multi-tenant cloud environment and virtual desktops to make each artist’s workstation and workflow accessible anywhere within the studio. As the environment grew more complex, the importance of monitoring grew. A few years ago, IT undertook a major rearchitecture of the studio that included instrumenting all its code. It was no longer acceptable to say things like, “I think the network is slow today.” Monitoring had to show precisely what was happening in the environment at all times.

“We called it moving from feeling to fact,” Wike says.

Analytics and automation were just as important as the monitoring itself. Manufacturers use predictive maintenance to fix equipment before unplanned downtime brings production to a halt; in the same way, DreamWorks needs to notice immediately when, for instance, a particular file service or file endpoint is experiencing high latency, so engineers can work on the application or change how it accesses the data before the problem affects the end-user experience.

“The goal for us is to optimize operations so that we can free up the engineers to do the hard stuff,” Miller says. “I want my engineers inventing the future, not monitoring a network or storage system.”

The importance of AIOps

That’s where AIOps comes in. DreamWorks uses NetApp to run synthetic transactions that replicate artists’ workflows and establish a performance baseline; machine learning algorithms then look for anomalies against that baseline and raise alerts. For instance, Wike says, if the crowd department decides it needs 150,000 people animated in a crowd scene and wants to render them all at once, that could cause a big hit to performance. IT’s job, then, is to accommodate those needs and make changes in the production environment to keep performance steady.

“We don’t want the artists noticing that something’s performance has changed,” Miller says. “We want our synthetic transaction and monitoring framework to tell us before the artists notice that something is trending in a bad direction.”

“It used to be there would be an issue and maybe an engineer noticed because they were looking for it, or maybe the system sent an alert and an engineer would go investigate it,” Miller adds. “Now an issue surfaces almost always with a recommendation and, in many cases, a solution before the engineer is in the loop. It lets us run with 24×7 support with fewer sets of eyeballs staring at the systems.”
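The baseline-and-anomaly idea behind those synthetic transactions can be illustrated with a deliberately simple statistical sketch. This is not how NetApp Active IQ works (its models are not described here); it assumes latency samples in milliseconds from one hypothetical synthetic transaction, builds a mean/standard-deviation baseline, flags single samples that spike far above it, and separately flags a slow upward drift of the kind Miller describes as "trending in a bad direction." All function names, thresholds, and sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def build_baseline(latencies_ms):
    """Summarize normal latency of one synthetic transaction as (mean, sample stdev)."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two baseline samples")
    return mean(latencies_ms), stdev(latencies_ms)


def z_score(value, baseline):
    """How many standard deviations `value` sits above the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat baseline: any increase counts as infinitely unusual.
        return math.inf if value > mu else 0.0
    return (value - mu) / sigma


def find_spikes(new_latencies_ms, baseline, z_threshold=3.0):
    """Return (index, latency) for individual samples far above the baseline."""
    return [
        (i, value)
        for i, value in enumerate(new_latencies_ms)
        if z_score(value, baseline) >= z_threshold
    ]


def is_drifting(new_latencies_ms, baseline, window=5, z_threshold=2.0):
    """True if the mean of the last `window` samples has crept above the baseline.

    This catches a gradual upward trend that no single sample would flag.
    """
    recent = new_latencies_ms[-window:]
    if len(recent) < window:
        return False
    return z_score(mean(recent), baseline) >= z_threshold


if __name__ == "__main__":
    # Hypothetical latencies (ms) for a synthetic "open a shot's files" transaction.
    baseline = build_baseline([20.1, 19.8, 20.5, 21.0, 19.6, 20.2, 20.4, 19.9])
    print(find_spikes([20.3, 20.0, 35.2, 20.6, 20.1], baseline))  # one obvious spike
    print(is_drifting([20.6, 21.0, 21.3, 21.4, 21.5], baseline))  # slow creep, no spike
```

The two checks are kept separate on purpose: a single spike (say, a large crowd render hitting storage) calls for an immediate alert, while a slow drift is the earlier warning that lets engineers act before artists notice anything.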

Building for continuity and collaboration, combined with monitoring and analytics, allowed the studio to transition almost seamlessly to a work-from-home environment when it became necessary. Aside from a few workflows with heavy dependencies on studio resources on campus, Miller says, almost everyone has been able to work from home as if they were in the office.

“Analytics has really allowed us to tune our environment almost overnight from everybody working next to each other to everybody being distributed without really losing a beat,” Wike says. “We were up and running within a couple of days. Our films are all on track.”