Powering the World’s Leading Discovery Platform Engine with AI and HPC

Dell Technologies

While you may not be familiar with Taboola, chances are you’ve interacted with us many times in your online travels. Taboola is the world’s leading discovery platform. Every day, we provide 30 billion content recommendations across four billion web pages, processing up to 150,000 requests per second. As we like to say, in Taboola’s world the content finds you — not the other way around.  

So how do we achieve these amazing numbers? In a few words, we use a unique combination of artificial intelligence (AI) and high performance computing (HPC).

Let’s take a look under the hood.

Front-end systems

On the front end, we use HPC systems for AI inferencing. These systems process and deliver the real-time content recommendations to generate the desired clicks, views and conversions. Each request coming into a front-end data center runs our AI-driven inferencing algorithms in a unique, ultra-fast process that delivers a relevant recommendation within 50 milliseconds.
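To make the 50-millisecond target concrete, here is a minimal sketch of one common way to bound a scoring loop with a deadline. It is an illustration only, not Taboola's actual inferencing process: the function name `recommend_within_budget`, the `score_fn` callback and the `fallback` item are all hypothetical.

```python
import time

LATENCY_BUDGET_S = 0.050  # the 50 ms target cited in the article


def recommend_within_budget(score_fn, candidates, fallback,
                            budget_s=LATENCY_BUDGET_S, clock=time.monotonic):
    """Score candidates until the time budget is spent, then return the best one seen.

    score_fn, candidates and fallback are hypothetical stand-ins for a real
    model call, a candidate pool and a default recommendation.
    """
    deadline = clock() + budget_s
    best_item, best_score = fallback, float("-inf")
    for item in candidates:
        # Stop scoring as soon as the budget is exhausted.
        if clock() >= deadline:
            break
        score = score_fn(item)
        if score > best_score:
            best_item, best_score = item, score
    return best_item
```

The caller always gets an answer: the best candidate scored before the deadline, or the fallback if none was scored in time.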

This front-end environment is based on Dell EMC PowerEdge modular servers with Intel® Xeon® Scalable processors. These systems run our custom-built inferencing algorithms, which are based on the open-source TensorFlow machine learning framework. We also use a Kubernetes and Docker container environment that streamlines application development and deployment, and improves the efficiency of the IT team that manages our global network of HPC systems.

With this solution, we now get up to six times the AI inferencing performance we had when we started. This helps reduce our costs today, and we believe there are many more performance improvements to be gained over time.

Back-end systems

On the back end, we use HPC systems that host cutting-edge deep learning models, which are continually trained using intelligent neural networks to infer user preferences. We run this AI engine with Dell EMC PowerEdge R740xd servers, which provide the performance to access our massive data to train our models and push them back to our front-end data centers for inferencing.

To further boost our response times, we have created high-performance computing clusters across our data center to take advantage of additional computing power. Rather than just adding servers or racks, we look at everything as a single HPC cluster. This delivers significant performance improvements and greater cost efficiencies.

Hands-off management

So how do we manage a global network with more than 10,000 nodes? This work requires an efficient, highly scalable management system that needs a minimal amount of manual intervention. We found our answer to this need in the form of the Integrated Dell Remote Access Controller (iDRAC).

iDRAC enables our 12 site reliability engineers (SREs) to remotely deploy, update and monitor servers across our nine global data centers. With iDRAC and its hands-off system administration capabilities, our IT staff achieved a 99 percent reduction in administrator-attended deployment time. We know that without a tool like this, it would be impossible for a dozen engineers to manage thousands of nodes spread around the world.

iDRAC also streams key monitoring metrics for advanced analytics, which helps us make sense of the many events being monitored and proactively alerts the team to potential problems. These metrics include critical information about how the hardware is configured and performing, with data ranging from CPU errors and memory or power issues to server operating temperatures. And it’s all automatically monitored, managed, updated and remediated in the background, in an agent-free manner.
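As a rough illustration of proactive alerting on streamed hardware metrics, the sketch below filters a batch of readings against fixed thresholds. It does not use the iDRAC API. The `Reading` record, the metric names and the threshold values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One hypothetical telemetry sample from a server."""
    node: str
    metric: str   # e.g. "inlet_temp_c" or "cpu_errors" (illustrative names)
    value: float


# Hypothetical alert thresholds; real limits depend on the hardware and site.
THRESHOLDS = {"inlet_temp_c": 35.0, "cpu_errors": 0.0, "memory_errors": 0.0}


def find_alerts(readings, thresholds=THRESHOLDS):
    """Return (node, metric, value) for every reading above its threshold.

    Metrics without a configured threshold are ignored.
    """
    return [
        (r.node, r.metric, r.value)
        for r in readings
        if r.metric in thresholds and r.value > thresholds[r.metric]
    ]
```

A fixed threshold is the simplest possible rule. A production pipeline would more likely compare each node against its own history or against its peers.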

Ongoing optimizations

For Taboola, hardware and software deployments are never a “one and done” undertaking. Optimizing our IT environment is a continual process.

To that end, we now emphasize rack awareness in our logistics — how much density and bandwidth we have in each rack in our data centers. Rack awareness allows us to understand where the various compute units are and what different nodes within a data center cluster are running. Rather than just adding servers or racks, we look at everything as a single HPC machine, and reshuffle servers to achieve significant performance improvements and greater cost efficiencies.
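One simple way to picture rack awareness is as a per-rack rollup of density and bandwidth. The sketch below is only an assumption about how such an inventory might look: the tuple layout, the function names and the 42-unit rack capacity are illustrative and do not come from Taboola's tooling.

```python
from collections import defaultdict

RACK_CAPACITY_UNITS = 42  # assumption: a common full-height rack size


def summarize_racks(nodes):
    """Roll up node count, used rack units and NIC bandwidth per rack.

    nodes is an iterable of (node_name, rack_id, rack_units, nic_gbps) tuples,
    a hypothetical inventory format chosen for this example.
    """
    summary = defaultdict(lambda: {"nodes": 0, "used_units": 0, "gbps": 0})
    for _name, rack_id, rack_units, nic_gbps in nodes:
        entry = summary[rack_id]
        entry["nodes"] += 1
        entry["used_units"] += rack_units
        entry["gbps"] += nic_gbps
    return dict(summary)


def rack_with_most_free_space(summary, capacity=RACK_CAPACITY_UNITS):
    """Pick the rack with the most unused rack units (summary must be non-empty)."""
    return max(summary, key=lambda rack_id: capacity - summary[rack_id]["used_units"])
```

With a rollup like this, a decision to reshuffle servers can start from where space and bandwidth are actually free, not from wherever the newest rack happens to be.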

Ultimately, at Taboola we believe there is much more to be gained by further improving the utilization of our AI and HPC platform, which will lead to continued processing and software improvements.

Ariel Pisetzky is vice president of information technology and cybersecurity at Taboola.

To learn more

For a closer look at the solutions and services from Taboola, and to learn how easy it is to create a campaign to reach your customers on websites they trust, visit us at Taboola.com and our engineering blog.

To learn more about Taboola’s AI journey:

Read about how you can accelerate your AI Journey with Intel.

Copyright © 2020 IDG Communications, Inc.