by Soumik Ghosh

Metadata can fix your dark data problems: Gerard Francis, Bloomberg

Interview
May 10, 2017
Analytics | Budgeting | Business

Gerard Francis, the man at the helm of Bloomberg’s global enterprise solutions, sheds light on why metadata is the new Holy Grail in the enterprise today.

At a time when the enterprise is grappling with immense volumes of dark data, the key to finding the right insights lies in tapping into the metadata.

A dialogue with Gerard Francis, global head of Bloomberg Enterprise Solutions, shows how metadata is the answer to the cost challenges posed by dark data.

How do CIOs zero in on the right vendor to handle their data?

We’ve always had really great underlying high-quality data on the terminal, simply because we’re so widely used by traders and portfolio managers around the world. And they make big investment decisions based on our real-time news and data.

We broaden that in the context of enterprise data: we don’t just focus on the underlying data, we also focus on giving people very accurate metadata. The metadata is extremely important, as that’s ultimately what programmers use. That’s how data is ingested into the organization.
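To make that concrete, here is a minimal sketch of what programming against published metadata can look like. The field names, types, and sample record are hypothetical, invented purely for illustration; this is not Bloomberg’s schema or API.

```python
# Illustrative only: the metadata below is a made-up schema, not Bloomberg's.
from datetime import date

METADATA = {
    "ticker":     {"type": str,   "required": True},
    "last_price": {"type": float, "required": True},
    "trade_date": {"type": date,  "required": True},
    "currency":   {"type": str,   "required": False},
}

def validate(record: dict) -> list[str]:
    """Return every mismatch between a record and the published metadata."""
    problems = []
    for field, spec in METADATA.items():
        if field not in record:
            if spec["required"]:
                problems.append(f"missing required field: {field}")
            continue
        if not isinstance(record[field], spec["type"]):
            problems.append(
                f"{field}: expected {spec['type'].__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return problems

record = {"ticker": "ABC", "last_price": "101.5", "trade_date": date(2017, 5, 10)}
print(validate(record))  # ['last_price: expected float, got str']
```

When the metadata is accurate, a check like this surfaces mismatches at the point of ingestion rather than later, inside applications.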

In terms of CIOs and CTOs, what we see happening across the world is that in the past, when data quality from different vendors wasn’t that high, people opted for a strategy of using many vendors. We place a very high focus on making sure that the data quality matches the metadata, and that ensures there are very few mismatches for our clients. We actually track our metadata at better than 99.9 percent accuracy.

Now, as data quality has gotten better, CIOs feel that they do not need as many vendors anymore. With fewer vendors, they have to do less reconciliation across datasets, and they end up with fewer mismatches.
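As a toy illustration of that reconciliation work, the sketch below compares the same field across two vendor feeds; the vendor names, instruments, and prices are invented for the example. Every additional vendor adds another pairwise comparison of this kind.

```python
# Hypothetical sketch: reconciling one field across two made-up vendor feeds.
vendor_a = {"ABC": 101.50, "XYZ": 47.20, "QRS": 12.05}
vendor_b = {"ABC": 101.50, "XYZ": 47.25, "QRS": 12.05}

# Flag any instrument where the two feeds disagree beyond a small tolerance.
mismatches = {
    instrument: (vendor_a[instrument], vendor_b[instrument])
    for instrument in vendor_a.keys() & vendor_b.keys()
    if abs(vendor_a[instrument] - vendor_b[instrument]) > 0.001
}

print(mismatches)  # {'XYZ': (47.2, 47.25)}
```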

So, the vendor you ought to pick is the one that’s really accurate with your data.

What’s your takeaway for CIOs to ensure data quality is maintained at the highest levels?

It all comes down to the way you use your metadata, and how you program against the metadata. Metadata hasn’t always received as much focus when people talk about data.

Now when you see data managers across the world, they’re focusing very intently on the quality, accuracy, and descriptive power of the metadata.

What are the current challenges with respect to Governance, Risk, and Compliance (GRC)?

I think currently there’s a much larger emphasis on governance. For the first time, we see Chief Data Officers being appointed across many organizations, and their primary role is defining data policies and the governance around those policies.

The chief purpose of that is twofold. Internally, it ensures that when data is used in applications, there is a clear reason why that data is there. It is also very important for regulators, who don’t feel comfortable when an organization lacks the right data governance policies and are less likely to trust that organization’s analyses.

A Bloomberg survey reveals that 38 percent of organizations say legacy systems do not meet their requirements, and that budgetary constraints are a major problem. What’s your take on this?

People are very cost-sensitive, and that’s a big driver. They know they need to upgrade their technology, and that’s a challenge as technology comes at a high cost initially.

There are two things people need to do. The first is to ensure that, at the end stage, they’re actually lowering their overall operating costs, and that’s not just the money they spend on the vendor, but the money that’s actually spent internally. Based on that, they have to design strategies that manage any overlapping costs. As long as they’re focused on lowering overall costs, they normally end up in the right place.

The second thing they need to manage is operating risk. They need to ask themselves whether they’re partnering with a high-quality vendor that actually helps them lower that risk.

But ultimately, firms have to make bold decisions. If you continue running many duplicated systems, you’ll never really get big savings. Consolidating those systems is also how organizations overcome the challenge posed by legacy systems.

Numerous studies have shown the added cost burden posed by dark data. What’s Bloomberg’s solution to minimize the costs incurred by hoarding data?

Organizations do hoard a lot of dark data, some of which they do not even understand. Some of this data is valuable, but some can expose the organization to risks.

Bloomberg has invested in a very innovative solution called Bloomberg Vault, which allows our clients to index all of their internal data and zero in on where their dark data sits. It gives them metadata around that data and allows them to identify duplicated data.
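The interview does not describe how Bloomberg Vault works internally, but the general idea of indexing content and flagging duplicates can be sketched generically. The directory path and the hash-based grouping below are assumptions for illustration only, not Vault’s implementation or API.

```python
# Generic illustration of duplicate detection; not Bloomberg Vault's internals.
import hashlib
from collections import defaultdict
from pathlib import Path

def index_files(root: str) -> dict[str, list[Path]]:
    """Map content hash -> all files under root with identical bytes."""
    index = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            index[digest].append(path)
    return index

# "/data/archive" is a placeholder path for the example.
duplicates = {h: paths for h, paths in index_files("/data/archive").items()
              if len(paths) > 1}
for digest, paths in duplicates.items():
    print(digest[:12], [str(p) for p in paths])
```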

Bloomberg Vault is a secure, hosted compliance solution that enables governance of communications data, file analytics, and trade reconstruction across the enterprise. It allows clients to run a very powerful compliance rules engine on their emails, chats, and social media platforms. We at Bloomberg strongly believe that dark data management is the next frontier of information governance.

A lot of organizations are now deploying artificial intelligence and machine learning. What data management challenges do organizations face in these cases?

If I’m building a deep-learning system, I need to have the tools that I can apply to the data. But just because I have the tools, it doesn’t mean I’m going to get anything valuable out of it. The machine needs to be able to ingest lots of data and make sense out of it.

It’s the metadata that actually makes the data readable to the machine. If you have good metadata, your data can be read by the machine. Without it, it becomes very hard for your machine to understand and derive insights.

The second most important factor is tagging. When you get data, especially unstructured data, you need to tag that well for the machine to understand it.

We have a very advanced tagging process that makes the data readable to the machine. So, machine learning is a very important focus area for us as the world moves towards adopting a cognitive approach.
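As a toy illustration of tagging, the sketch below annotates unstructured text with labels drawn from a small, hand-made term dictionary; the terms, tags, and headline are invented for the example and are far simpler than a production tagging pipeline.

```python
# Toy tagging example with an invented vocabulary; production pipelines are far richer.
import re

TAG_DICTIONARY = {
    "acquisition": "M&A",
    "earnings": "EARNINGS",
    "dividend": "CORPORATE_ACTION",
}

def tag(text: str) -> list[tuple[str, str]]:
    """Return (term, tag) pairs for every dictionary term found in the text."""
    found = []
    for term, label in TAG_DICTIONARY.items():
        if re.search(rf"\b{term}\b", text, flags=re.IGNORECASE):
            found.append((term, label))
    return found

headline = "Company ABC beats earnings estimates, raises dividend"
print(tag(headline))  # [('earnings', 'EARNINGS'), ('dividend', 'CORPORATE_ACTION')]
```

The point of the structure is the same as in the interview: the tags, not the raw text, are what the machine can reliably consume.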