by John Gallant

Why data quality is key to successful digital transformation

Apr 18, 2017
Big Data, Database Administration, Innovation

Informatica CEO Anil Chakravarthy says mastering data integration and integrity are critical to both innovation efforts and securing the enterprise.

If you want to be successful in your digital transformation initiatives, clean up your data first. That’s the message from Anil Chakravarthy, CEO of Informatica, whose master data management products help companies get a 360-degree view of customers, suppliers and other key assets. In this installment of the IDG CEO Interview Series, Chakravarthy spoke with Chief Content Officer John Gallant about the data quality and integration issues that hamstring innovation and digital transformation efforts. He also discussed how the nearly 25-year-old company’s decision to go private in 2015 was spurred by its own digital transformation strategy.

Chakravarthy has a long background in the security industry, having held senior roles at Symantec and VeriSign, and he also explained why our security focus needs to shift from the perimeter to the data layer. Until we do, security will remain, in his words, ‘an unsolved problem.’

What problems do you solve for customers?

Informatica CEO Anil Chakravarthy: Informatica delivers what we call enterprise cloud data management, which means we provide data management products and services for companies going through digital transformation. Everybody is focused on some kind of digital transformation [effort] and virtually every digital transformation is driven by some kind of data-driven change.

We help our customers build the data foundation for digital transformation. We build what we call an Intelligent Cloud Data Platform. The Intelligent Cloud Data Platform is hybrid, both on-premises and in the cloud, and it helps customers build a variety of digital transformation applications and processes.

Recently, I chaired a roundtable on digital transformation with CIOs and their functional counterparts from around the company. A big part of that discussion was about becoming analytics-led, or data-led, companies, but everyone in the room recognized that one of the challenges to that is problems with data. What are the issues companies have to deal with to truly become analytics-driven companies?

Chakravarthy: I think it goes to the heart of how you do a digital transformation. First, it needs to be led by the CEO and the board because this is going to impact across the company. It probably means some change in the business model. For example, if you look at the digital transformation that big companies like GE are going through, they’re changing their business model from selling very high-end equipment and a service plan to a new business model where, based on real-time data, they will share risk with their customer. They will offer their high-end equipment as a service. That’s a change in the business model.

The second big change that we see is new processes. For instance, companies are now providing a lot more capabilities as some form of service to their customers, whether it’s mobile banking or mobile self-service, etc. All of that means new processes.

The third big area we see for digital transformation is new users or new applications to support. In the past, for example, analytics applications were primarily for reporting or for after-the-fact analysis. For instance, you might close a quarter. Now, you want to do the analysis on how your quarter went, which segments did well, which segments did not do well and so on. It was after the fact and done in a batch manner. Now, the implication for these types of new users and new processes is you need the analytics to be built into the flow of how they do their work. That is a big change.

The data requirements are that, first, you must be able to access data from your existing systems as well as new systems. You must be able to access data from any source, whether it’s on premise, in the cloud or on new data platforms like Hadoop. You also must be able to make sense of that data. You might have customer data coming from multiple sources and, if you want to get a full perspective, you must be able to put it together using metadata.

The last big thing is that you need to be able to do all this with clear governance in mind. That cannot be an afterthought. In this new world, that data governance must be built in from the get-go.

In talking with this group of CIOs, people were citing all kinds of issues with data quality, data standardization. Are these the kinds of things that you help people with?

Chakravarthy: First, we help people get data together from multiple sources into a common data repository that they can then use for data analysis and reporting and to support processes. Consider the number of data sources that are available, from the mainframe age to the internet of things [IoT] and everything in between, there is a huge variety of data sources, lots of different types of databases, lots of APIs; in many cases, no APIs because they were built some time ago, etc. Ingesting that data in a manner that doesn’t impact the source systems, doesn’t impact any ongoing operational processes but still gets you the data at the right time to the right users; that is the first problem that we solve. That’s the data ingestion problem.

Second, now assume you’ve solved this problem with getting the raw data in its own format, that’s where the first step of these quality problems that you talked about starts. You may not have the same customer ID in different databases. You may have a customer ID in one place and you may not have any ID at all in a different place. You may have the feeds but the feeds may be incomplete or incorrect. There is a whole host of problems because many of these databases and data sources were not built with the assumption that the data would be taken off those systems and repurposed. That’s the second set of big problems that we solve. Data quality is a broad topic, but [it involves] the use of metadata or data quality products that can have prebuilt rules or use new technologies like machine learning to figure out the rules.
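To make the idea of prebuilt data quality rules concrete, here is a minimal sketch in Python. It is not Informatica's product or API; the `CustomerRecord` type, the source names and the two rules are all hypothetical, chosen only to mirror the problems described above (a missing customer ID in one system, an incorrect value in another).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CustomerRecord:
    source: str                 # hypothetical source system name
    customer_id: Optional[str]  # some systems may carry no ID at all
    email: Optional[str]


def check_record(record: CustomerRecord) -> List[str]:
    """Return the names of the (illustrative) quality rules this record violates."""
    problems = []
    if not record.customer_id:
        problems.append("missing_customer_id")
    if not record.email or "@" not in record.email:
        problems.append("invalid_email")
    return problems


# Made-up records standing in for feeds from three different systems.
records = [
    CustomerRecord("crm", "C-1001", "ana@example.com"),
    CustomerRecord("billing", None, "ana@example.com"),
    CustomerRecord("web", "1001", "not-an-email"),
]

# One entry per source system, listing the rules that failed.
report = {r.source: check_record(r) for r in records}
```

In this toy data the `billing` record fails the ID rule and the `web` record fails the email rule. A real deployment would presumably need many more rules, and the sketch does nothing to reconcile the two different ID formats.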

The next set of problems related to data is how do I make sense of the data related to a specific business entity? As an example, if you are an oil company, you care about, let’s say, an oil pipeline and you want data from a variety of systems. Some might be ERP-type systems, others might be new IoT systems that are providing real-time feeds. You want to be able to get all of that and have a 360-degree view of an oil pipeline. Is it working well? Is it meeting its targets? Is it profitable? That business entity view of data is the third big problem that we solve, putting the data into a format so you can have a 360-degree view of the data.

The last one is the governance of the data. Where did this data come from? Do we have the lineage on that data? Is that the authoritative source of data? Is the data secure? All of that we put under the rubric of governance. That’s the fourth big problem that we solve for data.

What pain point encourages a customer to say: I’ve got to start talking to these folks from Informatica?

Chakravarthy: Usually, it’s when they start one of these big [digital transformation] initiatives. Then they realize that either they have incorrect data or incomplete data or they have a problem getting the right data in the right place and doing the right transformation on the data so it is usable. Those are the typical problems that we find that they call us in for.

I want to go into some depth on your key products. Let’s start with the Customer 360 product.

Chakravarthy: The Customer 360 product is the easiest product to pitch because it does what it says. Nordstrom is an example of a customer that’s been using this product at least since 2008. You walk into Nordstrom and the associate has some type of tablet device and they want to pull up your entire profile as you walk in. Then they have a number of things built into the profile where they personalize the service for you. You might be a shopper at Nordstrom stores, at multiple Nordstrom stores, who knows? You might be a shopper at Nordstrom Rack as well. You may have called in customer service, you may have returned some products online, etc. [That data] comes from a number of applications and databases and that’s what Customer 360 does. We pull that together and then the customer or one of our partners builds custom applications on top of that data. We don’t build the apps. We pull it together into a foundation that makes it easy to build the app.
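The general idea of pulling records from several applications into one customer view can be sketched in a few lines of Python. This is an illustration only, not how Customer 360 works internally: the three source lists, their field names and the choice of a normalized email address as the match key are all assumptions.

```python
from collections import defaultdict

# Hypothetical records from three separate systems; field names are assumptions.
store_purchases = [{"email": "Ana@Example.com", "store": "Downtown", "amount": 120.0}]
online_returns = [{"email": "ana@example.com", "item": "boots"}]
service_calls = [{"email": " ana@example.com ", "topic": "delivery"}]


def match_key(record):
    """Normalize the email so the same person matches across systems."""
    return record["email"].strip().lower()


def build_profiles(**sources):
    """Group records from every source under one profile per match key."""
    profiles = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            profiles[match_key(record)][source_name].append(record)
    return profiles


profiles = build_profiles(
    store_purchases=store_purchases,
    online_returns=online_returns,
    service_calls=service_calls,
)

# One unified profile, with the records still grouped by originating system.
profile = profiles["ana@example.com"]
```

Keeping each record under the name of its source system is a deliberate choice in this sketch: an application built on top of the profile can still tell a store purchase from an online return.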

In the case of Nordstrom, for example, it may be a personalization app. This customer is one of our most profitable customers. They come once every couple of months but then once they decide to buy, they never return [anything]. Or, this is a person who makes a quick decision but they return half the things they buy. That helps provide personalized service. Those are the types of applications that are built on top of the data.

We see this in every industry, this kind of customer. In the insurance industry, for example, MetLife is a great example. You were not really a customer to them; you were a policy number. But now they want to know, who is the customer, who else is in their household, what’s going on in their life? What are their life events? Are they getting married? Are they having children? Are they getting divorced, etc., deaths in the family? They want to have a picture of you as a household because that helps them get a better sense of what other products might be useful to you, how they can build a better relationship with you. That’s exactly what the Customer 360 product does.

I assume Supplier 360 does the exact same thing for your business partners.

Chakravarthy: That’s correct. The most common things that we see across industries are Customer 360, Supplier 360, Product 360 and what we call Asset 360, which is what are your business assets, like that oil pipeline I talked about. How do you get a 360 view of that? But the product itself, which is what we call our master data management product, is very extensible. In fact, we now have 140 specific domains of data.

The FDA is a good example of a customer. For the FDA, a drug is a domain and they want to have a 360-degree view of the drug. We have a number of healthcare and life sciences customers who do that. Whatever is the critical business entity for them, they build a 360 view around that. For the most common ones, like Customer, Product, etc., we have prebuilt solutions that they can customize.

What do you currently offer in the cloud and where are you taking that for the future?

Chakravarthy: Our cloud strategy is threefold. First, we are strong believers in building products for the cloud that are cloud-native. We have a lot of capabilities from our portfolio from the past. We’re a nearly 25-year-old company. We take a lot of those capabilities but build cloud-native products. In other words, we don’t try to forklift an existing on-premise product into the cloud. We have cloud data integration products, cloud application and process integration products, master data management for the cloud as well as other new products that we released: cloud data quality and cloud data security, etc. That’s the first leg of our cloud strategy.

The second part of our strategy is to have a hybrid strategy. A number of our on-premise products support clouds as either data sources or data targets. For instance, you may be using your analytics data warehouse on premise but you want to bring data from Salesforce or Workday or any other cloud application you are using. That’s a very frequent use case we support. In fact, for Salesforce-to-SAP integration, for example, we support over 40 percent of the Salesforce-to-enterprise integrations across the world. That’s an example of hybrid where the data is being brought in, where the cloud is being used as a data source. We also do a lot of work where the cloud is being used as a data target, the most common one being something like Amazon Redshift or Azure SQL Server. The data warehouse is in the cloud and we are taking on-premises data from source systems and doing the data integration, the data quality, governance, etc., and the data is then processed within Amazon Redshift or Azure SQL Server for the analytical applications.

The last piece of our cloud strategy is to make sure that we embed ourselves with the key ecosystems in the cloud. The four biggest ones are Salesforce, Amazon, Azure and Tableau. We also are working on, and you will shortly see announcements of, several others as well. We want to be ecosystem-native, which means, for example, within Salesforce we have applications which are embedded within Salesforce. We’ve been working very closely with Amazon as part of their entire data layer and we would like to be called one of the Amazon all-in partners for doing that. They called us out at their re:Invent Conference. We provide an entire data management layer within Amazon. We’re doing the same thing now with Azure as well with the Azure data management layer.

Can you talk about why the company went private and why that benefits customers? What’s in it for them?

Chakravarthy: We went private primarily to go through a transformation of our own just like customers are going through big digital transformations. Our transformation is from being primarily an on-premises, licensed software company to an enterprise cloud company, which means the first transformation is around the business model, from a license-based software company to a subscription software company. That [business] is growing extremely rapidly and being private, as you know, helps with that because it’s hard to do it in the public markets with the impact it has on the financial side of the company.

The transformation from a customer perspective is extremely beneficial. One of the things cloud companies have done really well, and which we are doing too, is they have approached everything from the viewpoint of customer success. Once you have a subscription business model, it means the customer could turn you off. If they have a better solution they could turn you off after a year or six months. It’s critical to keep adding value. A, make sure the customers adopt the software; B, that they are happy with the use of the software, it’s well integrated, it’s achieving their goals. As we transform our own company we are putting a lot of effort around customer success because that’s what drives the increase in subscription software as well as the renewals and the ongoing success of our company.

From a customer perspective, this change is extremely beneficial because we’re very focused on that. The second reason it’s beneficial to our customers is that even as we went private we kept the same level of investment in R&D that we had as a public company. We spend 15 percent of our revenue on R&D and we kept that level of investment so that we could continue to innovate in these new areas.

You have a long background in the security marketplace. Why did you move into the data management space?

Chakravarthy: Every company is starting to become a technology-centric company or a data-centric company. That’s a big opportunity for us. I could see the opportunity. That was one reason. The second reason is I believe very fervently that the right layer at which to solve the security problem is the data layer. I was in the infrastructure security world for a long time and that layer, protecting machines, you’re not close enough to the data, you don’t know what is important to the business.

Companies have made a lot of investments in infrastructure security and yet they’re still not secure, there are tons of breaches. I believe the fundamental answer, the right way to do security is at the data layer. That’s the layer you control whether it’s on premise or in the cloud and that’s the layer the hackers are after. They don’t care about hacking into your machine. They want your data. Therefore, security needs to go to the data level. That’s what we’ve been doing. We’ve been investing in that. We just got 11 awards at the RSA Conference for our data security products. Coming to Informatica helped me kill two birds with one stone.

You were quoted as saying that data security is ‘an unsolved problem’. I wonder if you could explore that a little bit more and then talk a bit about the launch of Secure@Source.

Chakravarthy: If it was a solved problem then breaches would not be happening at the rate they are. I’m on a board myself. Virtually every board has these reviews of cybersecurity, data security, etc. Most board members have no idea or understanding of what’s going on. They do these reviews and they’re trying to stay awake for the most part. It’s technical and it’s hard. My view is that it’s an unsolved problem primarily because a lot of investment has gone into network and infrastructure security, which worked great at the time when companies had clear boundaries, there was a clear perimeter, there was a clear data center and the amount of data that was outside the company’s control was extremely limited.

Now, as you know, the perimeter is not there at all anymore and most companies are jumping quickly to the cloud. Virtually any company is using tens if not hundreds of cloud-based applications, which all have their own data. We believe that the first step to [security] is you need to know where the sensitive data is. It’s not a trick question, but I ask virtually every CIO I meet: How many databases do you have? It seems like such a simple question and you can’t get a straight answer. You can tell me how many routers you have. You can tell me how many laptops you have. You can’t tell me how many databases you have? How will you know where you have sensitive data if you can’t tell me how many databases you have? That’s what I mean by unsolved problem.
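As a rough illustration of what "knowing where the sensitive data is" can mean in practice, the Python sketch below scans sampled column values for patterns that look sensitive. It is not Secure@Source and does not reflect how that product works; the inventory of databases, tables and columns is invented, and the two regular expressions are deliberately simplistic assumptions.

```python
import re

# Simple illustrative patterns; real detection needs far more care.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical inventory: (database, table, column) -> sampled values.
samples = {
    ("crm", "contacts", "email"): ["ana@example.com", "bo@example.org"],
    ("hr", "employees", "tax_id"): ["123-45-6789"],
    ("web", "pageviews", "path"): ["/home", "/pricing"],
}


def find_sensitive_columns(samples):
    """Map each column to the sorted pattern names found in its samples."""
    findings = {}
    for column, values in samples.items():
        hits = sorted(
            name
            for name, pattern in PATTERNS.items()
            if any(pattern.search(value) for value in values)
        )
        if hits:
            findings[column] = hits
    return findings


# Only columns with at least one match appear in the result.
findings = find_sensitive_columns(samples)
```

Even a sketch like this depends on having the inventory in the first place, which is the point of the "how many databases do you have?" question.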

We do believe that the first step to solve the problem is to get a real-time, clear view of where the sensitive data resides and who within your organization has access to it and controls it. That’s step one and that’s the problem Secure@Source solves. It’s almost like it gives you that map. It gives you an internal Google for telling you where your sensitive data resides and then you’ve got to prioritize who should have access to it, how to secure it, etc., that follows from having that view.

Could you talk a little more about your strategy for helping customers with their big data initiatives?

Chakravarthy: We view big data as a new platform for both storing and processing data. Hadoop, for example, it does both. Just as we have provided cloud-native capabilities for the new cloud platforms, we are providing the same types of capabilities as big data native capabilities running on Hadoop, running on NoSQL, etc. In other words, you have data integration, data quality, master data management, data security, all running in big data native technologies like Spark, for example, on Hadoop.

The real benefit to customers is we also offer the translation. In other words, let’s say you used our technology to build a data pipeline or a data supply chain 15 years ago on traditional systems. Let’s say you built the data but planned to take it off mainframe systems and move it to a Unix-based data warehouse and you were using all of that. We would take all that business logic and run that natively on big data without you having to rewrite all that logic. That’s the real advantage that we provide, being able to reuse business logic, being able to reuse the skill set that customers already have with our products, but the underlying platform that they get is a big data native platform.

Who do you view as your top competitors?

Chakravarthy: The competitive landscape for us consists of three types of companies. You have very large companies for whom data management is one of the product lines in their portfolio. These will be companies like IBM, for example, that have data management. It’s not necessarily their core focus area because they are so large but they do have data management in their portfolio. We have companies which are competitive to us that are specifically either providing a piece of the functionality or are focusing on a specific platform.

We have companies that we compete with on Hadoop, for example, in the big data space. We compete with companies in cloud, for example, that focus on one specific ecosystem or one specific platform and offer a subset of the functionality that we provide for that platform. The third set of companies [focuses on] new vertical applications that get built out. Somebody might, for example, build out an entire application for the life sciences industry and they will provide data management as a subset of that. That’s obviously not 100 percent overlap with what we do end-to-end, but they do manage data for the application they are providing.

In a discussion you had with one of the reporters for our IDG News Service, you talked about Dell Boomi as the biggest competitor in the cloud space. Is that where you’re positioning them as a competitor, only in the cloud piece?

Chakravarthy: That’s correct. They are in the second bucket; they focus on the cloud and we compete with them for cloud integration. We provide cloud data management, like master data management for the cloud, data quality, data security, etc., and they do not provide that. But in the cloud integration space we compete with them.

I want to dig in a little bit more on them since I happened to have the same discussion recently with Chris McNabb over at Dell Boomi. What do you see as your key differentiators with them?

Chakravarthy: First, we are truly hybrid. If you are solving data management for a big problem like digital transformation, it’s not enough to be cloud-only. You want to be truly hybrid: on-premise, cloud, big data platforms, any combination of the above. That’s what you need if you are an enterprise company looking for a data management partner. They solve a small piece of that puzzle. If a customer went with them they would have to buy similar tools for the other pieces of their puzzle and then the customer becomes the systems integrator putting all that together. That’s the big differentiator for us when a customer is looking at an enterprise provider for data management.

The second big differentiator is the portfolio of what we offer in the cloud. They provide cloud integration and, as I just mentioned, even within the cloud, just take one ecosystem that they’re working with like Salesforce or Amazon: do you want different tools for integration and data quality and data security and so on, or do you want to have an integrated set of tools for data management for that ecosystem?

What’s ahead for Informatica?

Chakravarthy: We are super excited about the opportunity. Our target market is CIOs and digital transformation is top of mind for them. I think a lot of CIOs are still connecting the dots on what is required for digital transformation. We believe the CIO will become the chief digital officer. Increasingly, a lot of the things that the CIOs were doing to keep the lights on, running data centers and running infrastructure, etc., those will go to the cloud in some form or fashion or become commoditized enough that they will not be the ones doing it. Increasingly, their focus will be on the digital and the data aspects and we are well positioned to serve them for that.