3 Steps to Delivering Quality Data and How AI Helps

BrandPost By Manish Jain
Oct 28, 2021
Artificial Intelligence, Data Quality

An enterprise data catalog is key to enabling timely, self-service access to data.


When you hear the words “data quality,” do you think about data cleansing tasks such as deduplication or reconciliation? While traditional data hygiene still has value, data quality today means much more.

Accurate data, of course, is essential. But you also need to provide the data that users are looking for, make it accessible to the right data consumers, and make it available when it’s needed. Those requirements may seem daunting, but they are achievable.

There are three steps to creating and delivering quality data:

  1. Data ingestion: onboarding data from different sources and transforming it to make it readable, searchable, and curatable
  2. Data curation: cataloging and auto-tagging the data, as well as assuring compliance with privacy regulations
  3. Data publishing: contextualizing the data through semantic enrichment that adds information to make the data relevant for self-service publishing by data users

An AI-enabled enterprise data catalog will help with each of these steps by enabling the creation of a data fabric, which abstracts the physical location of the data, whether in any of several clouds or on-premises. A data fabric gives users a single view of data, allowing them to request – and receive – the data they need no matter where it resides.
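To make the idea of abstracting physical location concrete, here is a minimal Python sketch of a catalog lookup. It is an illustration only, not how any particular product works: the class names, dataset names, platforms, and URIs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetLocation:
    """Where a dataset physically lives (all values here are hypothetical)."""
    platform: str  # e.g. "cloud-a" or "on-prem"
    uri: str


class DataCatalog:
    """Toy catalog: maps a logical dataset name to its physical location."""

    def __init__(self):
        self._entries = {}

    def register(self, name, location):
        self._entries[name] = location

    def resolve(self, name):
        # The caller asks by logical name only; the location is looked up here.
        if name not in self._entries:
            raise KeyError(f"dataset not in catalog: {name}")
        return self._entries[name]


catalog = DataCatalog()
catalog.register(
    "customer_orders",
    DatasetLocation("cloud-a", "s3://example-bucket/orders/"),
)
catalog.register(
    "claims_history",
    DatasetLocation("on-prem", "hdfs://example-cluster/claims/"),
)

# The user requests data by name and never needs to know where it resides.
location = catalog.resolve("claims_history")
print(location.platform, location.uri)
```

The user-facing call is the same whether the data sits in a cloud bucket or an on-premises cluster, which is the single view a data fabric aims to provide.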

AI-based data fingerprinting automates data access and enables self-service. First, you develop a fingerprint for all the data you are looking for, then create a tag for the fingerprint. A user applies the tag to search not only for data but also for the underlying information associated with it. By using tags with an enterprise data catalog, data scientists can automatically run jobs to capture the information they need. The result is a system that provides data on a self-service basis to those who need it, when they need it.
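The fingerprint-then-tag flow can be sketched in a few lines of Python. This sketch swaps in simple regular-expression rules for the AI-based fingerprinting described above, which would be far more sophisticated in practice. The patterns, the 80% match threshold, and the sample datasets are all assumptions made for illustration.

```python
import re

# Hypothetical fingerprints: tag name -> pattern a value must fully match.
FINGERPRINTS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "us_zip": re.compile(r"\d{5}(-\d{4})?"),
}


def tag_column(values, threshold=0.8):
    """Return the tags whose fingerprint matches at least `threshold` of values."""
    tags = set()
    if not values:
        return tags
    for tag, pattern in FINGERPRINTS.items():
        matches = sum(1 for v in values if pattern.fullmatch(v))
        if matches / len(values) >= threshold:
            tags.add(tag)
    return tags


def build_tag_index(datasets):
    """Map each tag to the (dataset, column) pairs that carry it."""
    index = {}
    for dataset, columns in datasets.items():
        for column, values in columns.items():
            for tag in tag_column(values):
                index.setdefault(tag, []).append((dataset, column))
    return index


# Made-up sample data for illustration only.
datasets = {
    "crm_contacts": {
        "contact": ["ana@example.com", "li@example.org", "sam@example.net"],
        "postcode": ["94105", "10001-1234", "60601"],
    },
    "web_signups": {
        "login": ["kim@example.com", "raj@example.com"],
        "plan": ["free", "pro"],
    },
}

tag_index = build_tag_index(datasets)

# Searching by tag finds matching columns regardless of what they are named.
print(tag_index.get("email", []))
```

Note that the search finds the "contact" and "login" columns even though neither is named "email". The tag follows the content, not the column label.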

An AI-enabled enterprise data catalog also tracks data lineage, which is important for determining whether data contains personally identifiable information (PII) or protected health information (PHI). Protecting such sensitive data is critical to compliance with laws such as the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), not to mention the Health Insurance Portability and Accountability Act (HIPAA) for U.S. healthcare organizations.

Tracking data lineage can enable you to gain analytical insights from data while protecting personal data. For example, you can mask PII in a data lake through tokenization. Substituting similar values, or tokens, for real personal data lets analytical processes run on the tokens while keeping the actual data from being exposed.
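As a rough illustration of tokenization, the Python sketch below replaces an email column with random tokens and then aggregates on the tokens. It assumes a simple in-memory vault; the `TokenVault` class, the `tok_` prefix, and the sample records are hypothetical, and a production system would store the mapping in a secured, access-controlled service.

```python
import secrets
from collections import Counter


class TokenVault:
    """Toy in-memory vault: swaps each real value for a random token."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value):
        # Reuse the same token for a repeated value so grouping still works.
        if value not in self._token_by_value:
            token = "tok_" + secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token):
        # Only callers with access to the vault can recover the real value.
        return self._value_by_token[token]


vault = TokenVault()

# Made-up records; "email" is the PII column to mask.
records = [
    {"email": "ana@example.com", "amount": 40},
    {"email": "li@example.org", "amount": 15},
    {"email": "ana@example.com", "amount": 25},
]

masked = [{**r, "email": vault.tokenize(r["email"])} for r in records]

# Analytics run on tokens: total spend per (masked) customer.
totals = Counter()
for row in masked:
    totals[row["email"]] += row["amount"]

print(sorted(totals.values()))
```

Because a repeated value always maps to the same token, per-customer totals come out the same as they would on the real data, yet the masked records contain no email addresses.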

Data quality today is a demanding discipline that far exceeds legacy data-scrubbing practices. Hitachi Vantara’s Lumada data management software includes an enterprise data catalog that performs all the tasks I have discussed. It puts relevant, actionable data in the hands of those who need it, when they need it. Result: An agile business guided by informed and timely decisions.

For more information, visit: https://www.hitachivantara.com/intelligent-dataops