By David Andrzejek, head of Financial Services, DataStax
Capital One might be the sixth-largest bank in the United States, but it’s working hard to harness its data and the cloud to execute much more like a fintech. The company is on a mission to revolutionize the banking industry through technology and data and serves as a model for harnessing the power of data for growth.
Today, Capital One is a tech-forward financial services enterprise, employing open-source cloud technologies like the highly scalable NoSQL database Apache Cassandra® to improve customer experience, drive innovation, and accelerate speed-to-market for its applications. We recently spoke with Capital One senior director David Harmony about moving to the cloud, building a customer data platform, and the importance of real-time data.
Tell us about Capital One and your role
Capital One is one of the nation’s largest banks, offering traditional banking products as well as online banking services. We offer auto loans and credit cards. Beyond that, we have commercial lending, as well as near-term services, like Capital One Shopping.
A few years ago, the leadership realized that the banking industry was going to be dominated by great tech companies that manage risk exceptionally well. Risk management was always one of the core foundations of the company. So they set out to improve our application development and DevOps practices. Around 2015, Capital One went to AWS re:Invent and set forth our aspirational goal to modernize our entire technology infrastructure.
Basically, we wanted to get out of our data centers and run in a public cloud. One of the core components I worked on was the customer platform. It was such a big move for us. There was so much change associated with moving to the cloud.
I joined Capital One 10 years ago, at the cusp of its digital transformation. Throughout the years, I was super lucky to work with great teams on challenging projects. When I initially joined, I worked on creating the API models that would support the applications we run today. We worked with the application teams to build out the APIs for our existing mobile application available on the app store. I was really proud of how much work we did. I learned a lot from the ecosystem.
After that project, we moved our digital services into AWS as part of the migration. Then I went over to work on our customer platform, one of our primary systems, where we migrated the customer system off the mainframe and into the cloud. This customer data platform initiative had a lot of engagement with DataStax [a managed database service built on Cassandra].
What challenges did you face with the customer data platform and how did moving to the cloud help?
Whether you log into the website, the mobile device, or interact with an agent, the customer system is queried to determine your relationship with the bank and how you want to interact with us. We persist that information to give you the right service.
The Capital One customer data platform used to run on a centralized relational database management system (RDBMS) model that could only release, at most, four new features a month. This caused delays in resolving issues that application teams were having with the platform, as well as the company’s efforts to introduce more seamless features to the market.
Capital One also had difficulty in scaling up its outdated infrastructure. On-premises capacity planning was a massive project. The cost and lead times of scaling the capacity of the mainframe hindered application upgrades and slowed the company’s ability to bring new features to market, making technology a barrier for business features. During holiday seasons, the company had to scramble to ensure there was sufficient capacity to meet spikes in demand.
Capital One adopted a microservice architectural style, which consequently pulled a bunch of data out of central locations and separated it into different parts of the customer application ecosystem. The components we previously ran on our mainframe are now running on DataStax. We adopted this architecture to help us mitigate the risk of failures, generate clear lines of separation to scale independently, and, most importantly, enable teams to build and deploy our applications independently.
Now, we can easily do a hundred releases a month for some of our components. This allows us to get more features to market at a faster rate with less effort. We still have third-party vendors that rely on mainframes, but all our internal applications are off the mainframe and completely running inside of AWS and on top of Cassandra. The cloud has given us the capability to release features much faster and scale out easily, changing the way we operate.
Why did Capital One choose Cassandra for the customer data platform?
There are a few things that come to mind. The access patterns we need for the customer platform are pretty straightforward and fit perfectly with the key-value model of Cassandra. We also make good use of Cassandra’s wide-column implementation to add new attributes to our customer data and append them to the existing structure.
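As a rough illustration of that access pattern, here is a minimal in-memory sketch in Python. It is not Capital One's data model or the Cassandra API: the `CustomerStore` class, the customer ID, and the attribute names are all hypothetical. It only shows the shape of the idea, where every read and write goes through a single key and new attributes are added alongside existing ones.

```python
from collections import defaultdict


class CustomerStore:
    """Toy in-memory stand-in for a wide-column table keyed by customer ID.

    Hypothetical illustration only; it does not talk to Cassandra.
    """

    def __init__(self):
        # partition key (customer_id) -> {column name: value}
        self._rows = defaultdict(dict)

    def upsert(self, customer_id, **columns):
        # Writes merge into the existing row; new columns are simply added.
        self._rows[customer_id].update(columns)

    def get(self, customer_id):
        # Single-key lookup: return a copy of the row, or {} if absent.
        return dict(self._rows.get(customer_id, {}))


store = CustomerStore()
store.upsert("cust-001", name="Ada Example", preferred_channel="mobile")
# Later, a new attribute is appended without rewriting the existing ones.
store.upsert("cust-001", paperless_statements=True)
print(store.get("cust-001"))
```

In a real Cassandra table the columns are declared in a schema, so a new attribute would typically arrive through an `ALTER TABLE ... ADD` statement or a collection column rather than appearing on write as it does in this toy model.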
One of the bigger advantages of Cassandra is resiliency. Since Cassandra leans towards AP in the CAP theorem, it can tolerate network partitions and node failures and remain available round-the-clock. Cassandra’s masterless, peer-to-peer architecture has no single point of failure, which helps applications stay up even during severe system failures.
The company itself has invested a lot of time and effort into our resiliency and this commitment made Cassandra a great choice. It’s always available. It’s always there for us. And it has performed rock solid.
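To make the availability trade-off concrete, the sketch below models Cassandra's tunable consistency levels in plain Python. The replication factor of 3 is an assumption for illustration and is not stated in the interview. The helper functions are hypothetical and are not part of any Cassandra driver. The quorum formula (half the replicas, rounded down, plus one) is the standard one.

```python
def required_acks(replication_factor, consistency):
    """Number of replica acknowledgements needed for a request to succeed."""
    if consistency == "ONE":
        return 1
    if consistency == "QUORUM":
        return replication_factor // 2 + 1
    if consistency == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {consistency}")


def request_succeeds(replicas_up, replication_factor, consistency):
    # A request succeeds when enough replicas are reachable to acknowledge it.
    return replicas_up >= required_acks(replication_factor, consistency)


RF = 3  # assumed replication factor, not taken from the interview
for up in (3, 2, 1):
    outcome = {cl: request_succeeds(up, RF, cl) for cl in ("ONE", "QUORUM", "ALL")}
    print(f"{up} of {RF} replicas up: {outcome}")
```

With these assumptions, a request at consistency level ONE still succeeds with a single replica reachable, and QUORUM survives the loss of one replica out of three. Because any node can coordinate a request, there is no primary whose loss blocks writes.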
How do you measure the data platform’s ROI and what are the results you’re seeing with Cassandra?
When we talk about ROI, there are three primary things to consider: opportunity costs, operational costs, and customer experience.
The investment in Cassandra may be large, but there are also opportunity costs to staying where you are. On the mainframe, it was really difficult. We had constraints on what we could implement from the business feature perspective, because the mainframe investment hurdle was so high. Now we’re able to scale our platform easily enough to bring new features to the market round-the-clock with enough capacity.
Secondly, from an operational cost standpoint: as a bank we acquire portfolios of big companies like Walmart and bring them into our ecosystem. Typically, these portfolio migrations took multiple weeks or even months. With Cassandra, we can do this over a weekend without any downtime. It’s reached a point where adding 15 million new customers is now a standard day-to-day operation.
Lastly, because of the great real-time insights we’ve gained from the modern architecture, we were able to identify gaps in processes and technology components and compensate for them, driving down the number of times that people contact customer service. Ultimately, our investment created a better customer experience for the long run and improved our cost profile.
Specifically for our customer data platform, there are two metrics that we’ve actively tracked: recovery point and recovery time objectives. The recovery point objective is the maximum amount of data loss we can tolerate, measured as a window of time before a failure, while the recovery time objective is the maximum time it should take to restore service after a failure.
Previously, our RDBMS implementations had a tough time meeting our recovery point objectives, which are typically less than five minutes for a regional failure. Additionally, with those implementations being active-passive rather than multi-master, we experienced additional latency. This made us question the value of running two systems if we always have to write back to a single region. Now I’m really proud of the teams and the uptime they have achieved. We aspire to five nines of availability and we are often meeting our existing SLAs. Our customer team has also taken on a great level of ownership of the platform, which is super awesome.
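For a sense of scale, an availability target translates directly into a yearly downtime budget. The short Python calculation below is a worked example of that arithmetic. The function name is illustrative, and the year is taken as 365 days.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability):
    """Maximum downtime per year, in minutes, for an availability fraction."""
    return (1 - availability) * MINUTES_PER_YEAR


for label, availability in [
    ("three nines", 0.999),
    ("four nines", 0.9999),
    ("five nines", 0.99999),
]:
    budget = downtime_budget_minutes(availability)
    print(f"{label}: {budget:.2f} minutes per year")
```

Five nines works out to roughly 5.26 minutes of downtime per year, which is why a recovery objective measured in minutes for a regional failure leaves very little room for manual failover.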
Within the customer platform, the vast majority of our traffic that goes to Cassandra is real-time. Adding Apache Spark [an open source data analytics engine] into the Cassandra ecosystem helps us validate that our data is consistent across the ecosystem and gain additional insights into service and system gaps. We’ve now built a real-time data center and an analytical data center to support all our banking systems, including additional machine learning models.
Migrating functions off the mainframe is a notoriously challenging operation. How did you cope with this change?
Moving to the cloud can be a very scary conversation. There’s a risk to making almost any change and you need to be thoughtful and careful to avoid making the wrong choices. The biggest thing we did was data testing. It was a significant level of overhead for us, but we were able to migrate our customers safely. It’s this level of data testing that made our migration to DataStax very successful.
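The interview does not describe how the data testing was done, so the following is only a minimal sketch of one common approach: fingerprint each record in the source and target systems and report anything missing, unexpected, or different. The function names and sample records are hypothetical, and in practice the two mappings would be loaded from the legacy and migrated systems rather than defined inline.

```python
import hashlib
import json


def record_fingerprint(record):
    """Stable hash of a record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source, target):
    """Compare two {key: record} mappings and report the differences."""
    missing = sorted(k for k in source if k not in target)
    unexpected = sorted(k for k in target if k not in source)
    mismatched = sorted(
        k for k in source
        if k in target
        and record_fingerprint(source[k]) != record_fingerprint(target[k])
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


# Hypothetical sample data standing in for the legacy and migrated systems.
legacy = {
    "c1": {"name": "Ada", "tier": "gold"},
    "c2": {"name": "Bo", "tier": "basic"},
    "c3": {"name": "Cy", "tier": "basic"},
}
migrated = {
    "c1": {"tier": "gold", "name": "Ada"},  # same data, different key order
    "c2": {"name": "Bo", "tier": "gold"},   # value drifted during migration
}
print(reconcile(legacy, migrated))
```

Here `c1` matches despite the different key order, `c2` is flagged as mismatched, and `c3` is flagged as missing from the target. A check like this can be rerun after each migration batch before traffic is cut over.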
Another important thing is to put a lot of thought into your data model, especially within Cassandra. Think hard about your data models and make sure that you feel good about them. Also, there’s no perfect system and you need to be prepared for failures. Try to understand beforehand how you’re going to compensate for them and how you will correct them when the failures do arrive.
Last but not least, you absolutely have to invest in your people on the teams. They’re very talented and they’re the ones who will drive innovation in your application ecosystem.
With DataStax 100% invested in where we are, and with our solid relationship within Cassandra, I feel like we are in a good place. I’ve been super pleased with the performance and availability that is now provided on our platforms.
Listen to the full conversation with David Harmony to learn more about how DataStax helps Capital One leverage the seamless scalability of Cassandra to drive faster innovation and improve customer experience.
About David Andrzejek:
David has spent 25 years helping companies adopt technology to achieve outsized business transformation results.