Ebates looks to cloud data lake to resolve ETL dilemma

Several years ago, an on-premises data lake was the answer to Ebates' BI infrastructure woes. Today, spikes in demand from ad hoc queries are interfering with core ETL workloads.

One of the ways companies often run into trouble with data lakes is trying to use them as a data warehouse. It's a "terrible idea, unless it works," says Merv Adrian, research vice president at Gartner.

It's an issue that Mark Stange-Tregear, vice president of analytics at Ebates, knows all too well. When Stange-Tregear joined Ebates a little over four years ago, the company had little business intelligence (BI) infrastructure beyond a single SQL server and a handful of data engineers working from a replica of the main production database. The team was struggling with the extract, transform, and load (ETL) process.

"ETL cycles were running 28 hours. Team members couldn't get the reports or information they needed on a regular basis. We were hitting concurrency limits. It was clearly becoming unstable," Stange-Tregear says.

A data lake built on a Hadoop cluster looked like the right solution, both on cost and in terms of Ebates' vision for the future. The company would be able to land all its data in one place and make it available without having to reblend it across multiple silos.
