by Thor Olavsrud

Fidelity unlocks the power of data by transforming its pipeline

Sep 10, 2021
8 mins
Data Management

Fidelity Investments is creating a 'next-generation data pipeline' to make data available throughout the enterprise and its ecosystem.

Nearly four years ago, Mihir Shah had a vision for fundamentally changing Fidelity Investments’ data strategy. Then CTO, Shah pushed to become the company’s first-ever enterprise head of data architecture and engineering to enact that vision, which he calls a “next-generation data pipeline.”

At the center of that vision was one key tenet: “No matter what your role is, we want to make data available to you to make your job easier and to make your decision-making better,” he says.

Becoming a data-driven company has been a key goal of most organizations for years now, but many have failed to truly drive decisions with data at speed for a variety of reasons. For Shah, transforming the data operating model is the foundation of success.

“Everybody knows that data is valuable, and that integrated data is more valuable than siloed data,” he says. “We’ve been talking about this for years, but nobody has been willing to make those operating model changes to actually make that happen. The technology is there. It’s getting better. But I think the key secret sauce is the operating model.”

From silos to neighborhoods

Boston-based multinational financial services firm Fidelity Investments is one of the world’s largest asset managers. In 2014, Abigail Johnson succeeded her father to become CEO of the company, which her grandfather founded in 1946. She brought with her an organizational structure based on the idea of neighborhoods, where related business units were clustered together with a senior executive responsible for that “neighborhood.”

Mihir Shah, enterprise head of data architecture, Fidelity

“Data is horizontal in nature,” Shah says. “There are no boundaries to data. Why not use the same construct in some of the technology areas?”

The result was a set of new IT “neighborhoods”: cloud, data, cyber, and APIs. Shah took charge of the data neighborhood.

Shah’s vision was to change the way Fidelity managed data and made it available to its 41,000-plus employees. In 1994, when Shah first joined Fidelity as a principal technical advisor, the company was relatively advanced in its use of data. For example, it had a marketing database with customer profiles and customer modeling. But the data wasn’t integrated across its businesses.

“The siloed approach has multiple side effects,” Shah says. “For somebody to leverage a data asset, you have to gather those data sets from everywhere for every new use case. The time-to-market and the friction that comes when you have siloed data is very high.”

A simple data set in a traditional data architecture might include an account with positions, balances, and transactions, Shah says by way of example. The marketing department wants that data to calculate total assets and analyze customer behavior. The risk department wants that same data to detect fraudulent activity and to support KYC (know your customer). The finance department needs that same data to create cost units and to understand the profitability of the customer.

“We had a marketing data warehouse, we had a risk data warehouse, we had a finance data warehouse, all operating separately, all with feeds going from one to the other,” Shah says. “They all worked fine, but at the back end the cost of actually managing all that was tremendous. We could be spending that money on more value-added activities than just moving data around.”

Shah says he asked the CIO, “Do you know how many people in your database team are chartered with just moving data from A to B? Also, when you move data, you create all kinds of problems with bugs and quality issues. How many people are engaged with fixing those problems because you’re moving data constantly?”

Instead, Shah sought to use new technologies and the scalability of the cloud to create a single data warehouse that made data independent of its users.
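To make the account example concrete, the following is a minimal Python sketch of one shared record serving three departmental views without being copied between separate warehouses. It is an illustration only, not a description of Fidelity's platform: the `Account` class, its field names, the three view functions, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """One shared account record. All field names are hypothetical."""
    account_id: str
    positions: dict[str, float]       # symbol -> market value
    cash_balance: float
    transactions: tuple[float, ...]   # signed amounts, deposits positive


def total_assets(account: Account) -> float:
    """Marketing-style view: cash plus the value of all positions."""
    return account.cash_balance + sum(account.positions.values())


def large_transactions(account: Account, threshold: float) -> list[float]:
    """Risk-style view: transactions larger than a review threshold."""
    return [t for t in account.transactions if abs(t) > threshold]


def net_flow(account: Account) -> float:
    """Finance-style view: net money moved into or out of the account."""
    return sum(account.transactions)


# A single record, read by all three views; no department keeps its own copy.
account = Account(
    account_id="A-001",
    positions={"FUND_X": 1500.0, "FUND_Y": 500.0},
    cash_balance=250.0,
    transactions=(1000.0, -200.0, 12000.0),
)

print(total_assets(account))                 # marketing
print(large_transactions(account, 10000.0))  # risk
print(net_flow(account))                     # finance
```

Because every view reads the same record, a correction to the underlying data reaches all three consumers at once, which is the cost the separate warehouse-to-warehouse feeds were paying for.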

Transforming the data pipeline

Shah’s transformation vision hinged on three steps. First, Fidelity had to migrate its data into that single cloud data warehouse and create processes by which new data would feed into the data warehouse as it was created. Second, it needed to make the warehouse easy to access by everyone in the company. Third, everything had to be secure within the boundaries of Fidelity’s ethical, privacy, and contractual standards.

For organizations implementing similar data lake-like concepts, a major stumbling block had been stakeholders balking at giving up ownership of their data.

“We had to rethink how we actually create a business model where people don’t have to give up what they own, yet we actually are able to integrate everything,” says Shah, whose team handles the engineering of the platform’s cloud database, ingestion pipelines, monitoring, and data catalogs.

Procedures also had to be put in place for getting newly created data into the data lake, a process that must be highly coordinated given Fidelity’s taxonomy, which includes 15 data categories at the top level and 3,000 categories at the lowest level. Shah’s team provides the sequence for seeding data into the platform and orchestrates the process, but the data owners execute the process.

“The data is best owned and managed by the people who currently manage and own it,” Shah says. “We said that each data owner needs to push data into the platform and seed the platform but continue to own it.”
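The division of labor described here, in which the platform team defines the taxonomy and the seeding process while each owner pushes and keeps its own data, can be sketched as a small catalog. This is a hedged illustration under stated assumptions, not Fidelity's implementation: the `Catalog` and `DataSet` classes, the `seed` method, and the category and team names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """A data set registered in the shared platform (hypothetical model)."""
    name: str
    owner: str                       # team that pushes the data and keeps owning it
    category_path: tuple[str, ...]   # taxonomy path, top level first


class Catalog:
    """Platform-side registry: validates the taxonomy, never takes ownership."""

    def __init__(self, top_level_categories: set[str]) -> None:
        self._top_level = set(top_level_categories)
        self._entries: dict[str, DataSet] = {}

    def seed(self, data_set: DataSet) -> None:
        """Called by the data owner to push a data set into the platform."""
        path = data_set.category_path
        if not path or path[0] not in self._top_level:
            raise ValueError(f"unknown top-level category for {data_set.name!r}")
        existing = self._entries.get(data_set.name)
        if existing is not None and existing.owner != data_set.owner:
            # Only the current owner may re-seed its own data set.
            raise PermissionError(f"{data_set.name!r} is owned by {existing.owner}")
        self._entries[data_set.name] = data_set

    def owner_of(self, name: str) -> str:
        """Consumers can always find out whom to talk to about a data set."""
        return self._entries[name].owner


# Hypothetical taxonomy and teams, for illustration only.
catalog = Catalog(top_level_categories={"accounts", "customers"})
catalog.seed(DataSet("daily_balances", "brokerage-team", ("accounts", "balances")))
print(catalog.owner_of("daily_balances"))
```

Recording the owner alongside each entry keeps the point Shah makes in the quote: the data is integrated in one place, yet responsibility for it stays with the team that already manages it.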

Getting people to use the platform, build products, and migrate their existing applications, dashboards, reporting, and so on has been another key element of Shah’s strategy.

“People will gravitate toward the best source of data, the clean source of data, data that is managed, data where they can talk to someone if they need to,” Shah says. “Now it’s almost a network effect. Everybody knows this is the best place to get data. They come on board, they find a couple of data sets missing, so they go to their business partners and say, ‘I need this data set.’ So, more data comes in, the usage goes up, and the more the usage goes up, the more there’s a demand for more data sets.”

Fidelity, which worked closely with customer data platform vendor Simon Data, started implementing the new strategy in January 2020. By June 2020, it was in production. Getting engineers throughout Fidelity excited about the vision was key to selling the rest of the company on the transformation, Shah says.

“If you get the engineers onboard and get them excited, they convince the managers and business partners,” he adds. “They do the selling up in their vertical business units and they are trusted within the business units. Even if other people don’t believe in me, they will believe in their own engineer.”

Today, there are nearly two petabytes of data in the platform. Meanwhile, Shah says his perspective and focus have shifted as the true power of the strategy has become apparent.

“If we can do this inside of Fidelity, why not expand the data architecture to our ecosystem?” he says. “Now, when we think about data architecture, we are thinking not just about Fidelity business units; we’re thinking about the entire ecosystem and saying, ‘How do we make data an asset, not just for us but also for our institutional customers? How can we make our suppliers’ lives easier by not having them send us a hundred thousand FTP files every night?’ Your thinking about a data architecture should not be limited to your business units. I think everybody knew that it should be the entire enterprise, but I think you need to think beyond the enterprise to your entire domain or ecosystem.”
