If there’s one key component for AI success, it’s data. But even organizations steeped in data and well-versed in the use of analytics can struggle to establish reliable, automated data pipelines to fuel machine learning strategies.
Discover Financial Services found itself at this crossroads in 2019, when its developers and data engineers were coping with complex manual processes that devoured time and hampered the company’s agility. To fully leverage machine learning and real-time data insights, Discover needed to transform how it acquired, enriched, and used its data. Its answer? The Cloud Data Fabric, a homegrown platform that weaves together a variety of services to provide metadata-driven automation, real-time ingestion/loading, and built-in governance in the cloud.
“We assembled our best technical leaders to think through the problem, lay down some initial must-haves, and created architectural ideas on how we could meet our goals,” says Amir Arooni, executive vice president and CIO of Discover Financial Services. “We would take the ideas and go on a tour to various engineering product squads or leaders to seek feedback and adjust along the way.”
Up to then, Discover’s process for building data pipelines involved lengthy conversations between application developers and engineers to decide which data to send to analytics. Developers would then manually code scripts to extract data from operational databases and schedule those scripts to send the raw data to a landing zone in the analytical environment. Data engineers would then build specialized data applications to accept the raw data files and perform actions such as validating the schema. Data engineers also had to capture data sensitivity information so they could program logic to tokenize the correct fields.
Ultimately, this meant data engineers spent hours manually coding logic and working out where to send the analytical data and which formats to store it in.
Arooni and his team huddled with Discover’s security and file transmissions teams, its cloud infrastructure groups, its DBA and data governance teams, and its data engineers and scientists on its ideas for rectifying this situation. The resulting Cloud Data Fabric, for which Discover Financial Services recently won a CIO 100 Award in IT Excellence, knits together services that stream data from Discover’s operational application databases, capture metadata, tokenize sensitive data fields, and track dataset lineage.
“The products in the fabric strive to improve data engineering efficiency through metadata-driven automation and frictionless user experiences,” Arooni says. “For example, our fabric consists of products that handle our ingestion of data at much faster speeds, and we can direct data to multiple destinations in real-time with a few button presses.”
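To make the idea of metadata-driven automation concrete, here is a minimal Python sketch of the steps described above: validating a record against a declared schema, tokenizing the fields marked sensitive, and routing the result to several destinations. It is an illustration only, not Discover's implementation. Every name in it (`FieldSpec`, `DatasetSpec`, `process`, the destination labels) is hypothetical, and the keyed hash is a simple stand-in for whatever tokenization service a real platform would use.

```python
import hashlib
import hmac
from dataclasses import dataclass


# Hypothetical metadata model: the pipeline is driven by these declarations
# instead of hand-written, per-dataset scripts.
@dataclass(frozen=True)
class FieldSpec:
    name: str
    py_type: type
    sensitive: bool = False  # sensitivity is captured as metadata


@dataclass(frozen=True)
class DatasetSpec:
    name: str
    fields: tuple
    destinations: tuple


def tokenize(value: str, key: bytes) -> str:
    """Stand-in tokenizer (assumption): HMAC-SHA256 of the value, hex-encoded."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def validate(record: dict, spec: DatasetSpec) -> None:
    """Check that the record has exactly the declared fields and types."""
    expected = {f.name for f in spec.fields}
    if set(record) != expected:
        raise ValueError(
            f"{spec.name}: got fields {sorted(record)}, expected {sorted(expected)}"
        )
    for f in spec.fields:
        if not isinstance(record[f.name], f.py_type):
            raise TypeError(f"{spec.name}.{f.name}: expected {f.py_type.__name__}")


def process(record: dict, spec: DatasetSpec, key: bytes) -> dict:
    """Validate, tokenize sensitive fields, and fan out to each destination."""
    validate(record, spec)
    clean = {
        f.name: tokenize(str(record[f.name]), key) if f.sensitive else record[f.name]
        for f in spec.fields
    }
    # One independent copy of the cleaned record per destination.
    return {dest: dict(clean) for dest in spec.destinations}


# Example usage with made-up data and destination names.
spec = DatasetSpec(
    name="payments",
    fields=(FieldSpec("account_id", str, sensitive=True), FieldSpec("amount", float)),
    destinations=("warehouse", "feature_store"),
)
routed = process({"account_id": "A-123", "amount": 12.5}, spec, key=b"demo-key")
```

In this sketch, onboarding a new dataset means writing a new `DatasetSpec` rather than new extraction, validation, and tokenization code, which is the kind of saving in engineering time the article describes.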
The runway to success
The project was not without challenges, the biggest of which was ensuring that everyone at all levels was aligned on goals and vision, Arooni says.
“It took many rounds of communication from our engineers to first-line management up to senior leadership to get everyone aligned … all the time,” he says. “We are a big organization, and there are lots of thoughts, opinions, and varying degrees of understanding. You must honor all these aspects in your listening, goal-setting, and problem-solving to lead while executing.”
To make that work, Discover introduced a new initiative called “The Runway,” which consists of five work streams: engineering workforce, extreme automation, agile practices, reliability and technology organization, and discipline and employee experience. The Runway brings together smaller, self-empowered engineering teams that focus on developing a single agile approach and automating manual functions with an emphasis on simplification.
Forming autonomous teams with the ability to implement technologies used for the project took some time, Arooni says. It required communication and the establishment of trust with various technology owners, and the teams had to work cross-functionally with file transmission developers, DBAs, data management gurus, security experts, and various groups of full-stack developers. Architects, product managers, Scrum masters, and management teams coordinated their efforts. A number of engineers had to learn how to develop on cloud software for the first time.
“As part of trying to create more autonomous teams, we mixed skillsets to product squads,” Arooni says. “This means everyone can participate in technology they might not have had a chance to previously due to artificial ownership barriers.”
With hindsight, Arooni says he would have made a bigger push for more autonomous teams at the beginning to establish efficiency and morale gains sooner. He says the team also would have organized its architecture design model a little differently.
“Both things we are now doing for our newer product developments,” he says.
Arooni says the Cloud Data Fabric’s impact on Discover has been invaluable. The project has reduced the engineering development and support time of its data pipelines, and its data scientists, AI/ML engineers, and modelers can get more meaningful data at faster speeds. The project has led to millions of dollars in savings and cost avoidance in retroactive data.
“The time, capacity, and finances saved due to this innovation initiative is a massive win for Discover,” Arooni says.
He also notes that, over the course of developing and deploying the Cloud Data Fabric, Discover’s engineers have come up with “a ton of ideas” for further enhancements.
To his peers, Arooni says: “Automate as much as possible; user experience is paramount; and love your engineers through empowerment.”