Discover transforms data pipeline for AI success

Seeking to streamline the processes required to fully leverage machine learning and real-time data insights, Discover Financial Services has created a cloud-native data fabric that automates much of the work.

If there’s one key component for AI success, it’s data. But even organizations steeped in data and well-versed in the use of analytics can struggle to establish reliable, automated data pipelines to fuel machine learning strategies.

Discover Financial Services found itself at this crossroads in 2019, when its developers and data engineers were coping with complex manual processes that devoured time and hampered the company’s agility. To fully leverage machine learning and real-time data insights, Discover needed to transform how it acquired, enriched, and used its data. Its answer? The Cloud Data Fabric, a homegrown platform that weaves together a variety of services to provide metadata-driven automation, real-time ingestion and loading, and built-in governance in the cloud.

“We assembled our best technical leaders to think through the problem, lay down some initial must-haves, and create architectural ideas on how we could meet our goals,” says Amir Arooni, executive vice president and CIO of Discover Financial Services. “We would take the ideas and go on a tour to various engineering product squads or leaders to seek feedback and adjust along the way.”

Up to then, Discover’s process for building data pipelines involved lengthy conversations between application developers and engineers to decide which data to send to analytics. Developers would then manually code scripts to extract data from operational databases and schedule those scripts to send the raw data to a landing zone in the analytical environment. Data engineers would build specialized data applications to accept the raw data files and perform actions such as schema validation. They also had to capture data sensitivity information so they could program the logic to tokenize the correct fields.
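To make the old workflow concrete, here is a minimal sketch of the kind of hand-written extract script described above. The table, columns, database driver, and output path are hypothetical stand-ins for illustration, not Discover’s actual code.

```python
# Illustrative only: the sort of hand-coded nightly extract the old process
# required. The transactions table, columns, and output directory are
# hypothetical examples, not Discover's actual code.
import csv
import sqlite3  # stand-in for an operational database driver
from datetime import date

def extract_to_landing_zone(db_path: str, out_dir: str) -> str:
    """Dump the transactions table to a raw CSV bound for the landing zone."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT id, account_id, amount, merchant, created_at FROM transactions"
    ).fetchall()
    conn.close()

    out_path = f"{out_dir}/transactions_{date.today().isoformat()}.csv"
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "account_id", "amount", "merchant", "created_at"])
        writer.writerows(rows)
    return out_path
```

Every script of this sort had to be written, scheduled, and maintained by hand, one per data source.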

Ultimately, this meant data engineers spent hours hand-coding logic and working out where to send the analytical data and in what formats to store it.
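The metadata-driven approach of the Cloud Data Fabric removes that per-pipeline coding. As a rough illustration only (the spec format, field names, and tokenizer below are assumptions for this example, not Discover’s platform), a single declarative definition can carry the schema and sensitivity metadata a platform needs to validate and tokenize data automatically:

```python
# Illustrative sketch of metadata-driven automation: one declarative spec
# replaces hand-written extraction, validation, and tokenization logic.
# The spec format, field names, and tokenizer are assumptions for the example.
import hashlib
from dataclasses import dataclass

@dataclass
class Field:
    name: str
    dtype: str
    sensitive: bool = False  # sensitive fields are tokenized automatically

@dataclass
class PipelineSpec:
    source_table: str
    destination: str
    fields: list[Field]

def tokenize(value: str) -> str:
    """Placeholder tokenizer; a production platform would call a token vault."""
    return hashlib.sha256(value.encode()).hexdigest()[:16]

def apply_spec(record: dict, spec: PipelineSpec) -> dict:
    """Validate a record against the spec and tokenize its sensitive fields."""
    out = {}
    for fld in spec.fields:
        if fld.name not in record:
            raise ValueError(f"schema violation: missing field {fld.name!r}")
        value = record[fld.name]
        out[fld.name] = tokenize(str(value)) if fld.sensitive else value
    return out

spec = PipelineSpec(
    source_table="transactions",
    destination="s3://analytics-landing/transactions/",
    fields=[
        Field("id", "string"),
        Field("account_id", "string", sensitive=True),  # never lands in raw form
        Field("amount", "decimal"),
        Field("merchant", "string"),
    ],
)

print(apply_spec(
    {"id": "t-1", "account_id": "4111-0000", "amount": "12.50", "merchant": "Acme"},
    spec,
))
```

Because sensitivity is declared once in the spec, the platform rather than an engineer decides which fields to tokenize, eliminating exactly the kind of manual logic the old process demanded.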
