With its on-premises analytics infrastructure hitting capacity, Cleveland, Ohio-based KeyBank has turned to the cloud, a move the large regional bank believes will provide clear performance benefits and likely cost savings but one that will require rethinking how the company trains and manages its users.
The bank processes about 4 billion records each night. Data is loaded into a Hadoop data lake and is then pushed down to more than 40 downstream systems, including 10 to 12 data marts hosted on Teradata. “It’s a conventional on-prem architecture that would be current today,” says Mike Onders, chief data officer, divisional CIO, and head of enterprise architecture at KeyBank. “We have over a petabyte of data in the Hadoop data lake environment and over 30 terabytes in the Teradata environment.”
The system, which serves 400 SAS and Teradata users and 4,000 Tableau users, works well, but a little more than a year ago KeyBank’s Teradata appliances started reaching capacity.
“The engineered hardware itself still does what it was supposed to do: high-performance analytics,” Onders says. “But in an on-prem architecture, you govern capacity. You’re holding capacity steady and so performance will vary based on the loads on the box.” For KeyBank this meant performance and queuing issues when running month-end and quarter-end jobs.
Moreover, Onders’ team projected that KeyBank would need to refresh its Teradata environment in 2021, something the bank wanted to avoid. That’s when Onders and his team decided to explore whether moving the bank’s analytics to the cloud would be a better choice.
To the cloud
In late 2018, Onders’ team launched a proof of concept (PoC) with cloud data platform Snowflake, followed by a PoC with Google Cloud Platform in early 2019. While Onders concedes Snowflake had a slight edge in performance, the promise of a single-vendor architecture for managing ETL, visualization, data storage, data access, and machine learning made Google the right choice for KeyBank.
The bank now has five data marts in various testing stages within the Google ecosystem and Onders’ team is seeing three to four times faster query performance over the bank’s on-premises queries. But Onders and Doug Kanouff, senior vice president and director of enterprise architecture and enterprise data and information services at KeyBank, both note that full production load will be the real test.
“We’ve canvassed a number of our marts and a number of our users to get indicative queries that they’re executing,” Kanouff says. “We are running those live. So, we’re able to use real-world data, real-world volumes to those comparison queries. So far it looks pretty good. But once we get the full production load, we get batch running, we get end users querying, that mix is going to look different and we’re going to need to react and drill into what those volumes are to make sure that the environment is performing as we need it.”
Training to fine-tune costs
The biggest challenge in making the shift may be a matter of business process and culture. Google Cloud is promising because it offers virtually limitless capacity. But it also means shifting from a fixed-cost model with variable performance to an elastic-capacity model with variable cost.
“That’s a shift that we need to very carefully manage and oversee because I don’t think our senior leaders in our finance teams want a true variable cost model,” Onders says. “They want to be able to predict how much we’re going to spend next month and the month after that.”
With Google BigQuery, you pay per query and the cost varies based on how much data the query needs to access. In an on-premises Teradata or Hadoop environment, if a user runs a bad test query on three years of transaction data that could have used just 30 days of transaction data, it won’t change the cost. It will consume a lot of horsepower and other users might experience performance issues while the query runs, but that’s it. With BigQuery, it won’t affect other users’ performance, but it will cost money.
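To make the three-years-versus-30-days comparison concrete, here is a minimal sketch of how a bytes-scanned bill scales with the date window. The price per TiB and the daily data volume are placeholders, not KeyBank or Google figures, and the sketch assumes the table is laid out (for example, partitioned by date) so that a narrower date filter actually reduces the bytes read.

```python
# Illustrative arithmetic only. PRICE_PER_TIB and BYTES_PER_DAY are
# hypothetical placeholders, not published rates or KeyBank volumes.
BYTES_PER_TIB = 1024 ** 4


def scan_cost(bytes_scanned: int, price_per_tib: float) -> float:
    """Cost of one query when billing is proportional to bytes scanned."""
    return bytes_scanned / BYTES_PER_TIB * price_per_tib


def window_bytes(days: int, bytes_per_day: int) -> int:
    """Bytes read when a query touches `days` worth of daily data."""
    return days * bytes_per_day


PRICE_PER_TIB = 5.0              # hypothetical rate per TiB scanned
BYTES_PER_DAY = 50 * 1024 ** 3   # hypothetical: 50 GiB of transactions per day

three_years = scan_cost(window_bytes(3 * 365, BYTES_PER_DAY), PRICE_PER_TIB)
thirty_days = scan_cost(window_bytes(30, BYTES_PER_DAY), PRICE_PER_TIB)
ratio = three_years / thirty_days

print(f"3-year scan costs about {ratio:.1f}x the 30-day scan")
```

Whatever the actual rate, the ratio depends only on the window: under these assumptions the careless query costs roughly 36 times as much as the one limited to 30 days.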
“In a Google environment we have to do much more surveillance and monitoring and training to ensure people are not doing bad stuff that costs money when they could be doing it differently,” Onders says.
Many users are also going to have to be trained to work differently. SAS users, especially, find the data they need, copy it, and load it to their analytic workspaces. But Google, like most cloud providers, charges for data egress. In an on-premises environment, copying those data sets doesn’t add costs, though it does create data consistency and governance issues. For the move to Google Cloud to be successful, those users must be trained to bring their analytics to the data.
“As we go into Google, we’re going to invest a lot more in what I would call ‘data academy,’” Onders says. “Not just a data encyclopedia, but training people, certifying them, giving them lab questions to answer, giving them a sticker on their laptop to say that you know how to access our client analytics mart or our transaction mart or our risk mart. You’ve been certified and we’ve taught you better access paths, because I don’t want to propagate the same pattern into Google Cloud that they’re used to from a mainframe SAS architecture.”
Kanouff adds, “The biggest cost consideration is the query execution, and we need to really shift our thinking and focus on inspecting who is doing what, who is querying what, and how do we optimize those queries.”
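The kind of inspection Kanouff describes, seeing who is querying what and how much data each query reads, could be sketched roughly as follows. The record layout, user names, table names, and threshold are all hypothetical; this is an illustration of the idea, not KeyBank’s tooling or any Google API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRecord:
    """One entry in a hypothetical query log."""
    user: str
    table: str
    bytes_scanned: int


def bytes_by_user(records):
    """Total bytes scanned per user, to show who drives query cost."""
    totals = defaultdict(int)
    for record in records:
        totals[record.user] += record.bytes_scanned
    return dict(totals)


def heavy_queries(records, threshold_bytes):
    """Queries that read more than the threshold, as candidates for review."""
    return [r for r in records if r.bytes_scanned > threshold_bytes]


GIB = 1024 ** 3

# Hypothetical sample log; names and sizes are made up for illustration.
sample_log = [
    QueryRecord("analyst_a", "transaction_mart", 900 * GIB),
    QueryRecord("analyst_a", "risk_mart", 20 * GIB),
    QueryRecord("analyst_b", "client_analytics_mart", 15 * GIB),
]

totals = bytes_by_user(sample_log)
flagged = heavy_queries(sample_log, threshold_bytes=500 * GIB)
```

A report like this points reviewers to the few queries worth optimizing, and to the users who might benefit most from the training Onders describes.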
The cost of data egress is also something Onders is thinking a lot about as he contemplates moving his data lake to Google as well. His vision is a single platform architecture in the cloud, but the cost of data egress is proving a sticking point.
“There’s not a lot of data egress from Teradata because it’s more of a target landing zone for marts and analytics, and we would move our analytics tools into Google Cloud so there’s not a lot of data egress,” Onders says. “But when we move the data lake to the cloud, we do send data to 40-some downstream systems. That’s going to be a bigger issue for us that we’re still having conversations around. It’s a model you’ve got to get your head wrapped around and figure out how much it’s going to cost us.”
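A back-of-the-envelope estimate of the downstream egress bill Onders describes might look like the sketch below. The per-GiB rate and the per-system daily volumes are hypothetical placeholders chosen only to show the arithmetic; the bank’s real feeds and negotiated pricing are not given in the article.

```python
BYTES_PER_GIB = 1024 ** 3


def monthly_egress_cost(daily_bytes_by_system, price_per_gib, days=30):
    """Estimated monthly egress cost for data sent out to downstream systems."""
    total_bytes = sum(daily_bytes_by_system.values()) * days
    return total_bytes / BYTES_PER_GIB * price_per_gib


# Hypothetical: 40 downstream systems, each receiving 10 GiB per day.
feeds = {f"system_{i:02d}": 10 * BYTES_PER_GIB for i in range(1, 41)}
PRICE_PER_GIB = 0.10  # hypothetical egress rate per GiB

estimate = monthly_egress_cost(feeds, PRICE_PER_GIB)
print(f"Estimated monthly egress cost: {estimate:,.2f}")
```

Because the cost grows with both the number of downstream feeds and their daily volume, moving the consuming systems into the same cloud, or trimming what is sent to them, shrinks the bill directly.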