The unified, grand theory of metadata governance – and why IoT success hinges on it
Finding value in IoT-generated data will hinge on the enterprise’s ability to define, capture and curate a metadata fabric that enables efficient, fast and accurate data analysis and data-driven decisions. The metadata fabric becomes the core of the strategy to use IoT-delivered value for the benefit of the enterprise and its customers. The metadata fabric is the machine- and human-readable translation of an enterprise’s business intellectual property. It enables employees to quickly apply their business lens to data collected from the IoT and generate insights that are relevant and critical.
The metadata fabric delivers a single, unified interpretation of data and analytics in a business-consumable format and interface.
In complex environments, such as an IoT ecosystem, the number of data producers far exceeds the number of processes consuming data. Typically, the number of unique sets of data collected (that end up as mutually exclusive data sets) is much smaller than the number of applications that need to consume the data. This leads to an interesting governance problem, where it can make sense to introduce an intermediate curation layer that reduces the friction and onboarding cost for both the data producers (to register and deliver data) and the data consumers (to read and process data).
The first goal of the governance system should be to drive homogeneity in the format of the collected data, which reduces onboarding cost and increases the likelihood of onboarding success. This is needed to scale out data collection from millions of data producers.
The second goal of the governance system should be to enable easy search and discovery of relevant data sets and of relevant subsets within a data set. In essence, this becomes a search problem: quickly identifying the records required to service an analytical application requires both recall (searching across the largest set of accessible records so that relevant ones are not missed) and precision (identifying only the records and attributes the analytical application actually requires).
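To make the recall and precision trade-off concrete, here is a minimal sketch that scores one search result against the set of data sets an application actually needs. It uses the standard information-retrieval definitions of the two measures; the data set names and the `precision_recall` helper are hypothetical and purely illustrative.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a collection of returned ids."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    # Precision: share of returned items that were actually needed.
    precision = len(hits) / len(returned) if returned else 0.0
    # Recall: share of needed items that the search managed to return.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical catalog search: what came back vs. what the app needed.
returned_ids = ["temp_sensor", "hvac_log", "door_events", "billing"]
relevant_ids = ["temp_sensor", "hvac_log", "humidity_sensor"]

precision, recall = precision_recall(returned_ids, relevant_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case, half of the returned data sets are useful, and one needed data set was never found. A governance system would want to push both numbers up at once.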
It is in the interest of the governance system to facilitate a great search experience that learns from its producers and consumers and generates a metadata fabric that supports diverse data analysis and promotes easy data creation.
Competitive advantage derived from data hinges on the presence of a living, healthy metadata fabric that is designed to reduce friction across all stages of data governance: from the creation, delivery, selection and transformation of the data to the analysis of the data and the consumption of insights. This metadata layer takes on several shapes and comes in a variety of sizes. The goal of the metadata layer is to capture and incorporate the business context, logic, models and rules as machine-readable, programmable concepts that can be used to mimic how humans process data, analytics and information.
The most common and important types of metadata are:
Maps act exactly as maps should. They map data elements to business-understandable and relevant terms. This can be as simple as a 1:1 map from a payload attribute to a business-friendly term, or it can extend to additional information about the payload and attribute that makes the data more usable and understandable, such as type information, lineage, provenance, acceptable use, applied filters, transformations or referential connections.
Derivations are the results of functions applied to data sets and their attributes that produce higher-level data sets, metrics or dimensions. A filtered, aggregated, sampled or split version of a data set is an example of a data-set-level derivation. A metric that is bucketed, binned, filtered or aggregated is an example of a derived metric or derived dimension. A dimension that is labeled, renamed, split, transformed, bucketed or binned is an example of a derived dimension. Derivations can be simple or complex. In the complex case, multiple metrics and dimensional attributes might be combined in complex logical expressions to produce a new metric or dimension.
Complex events are a derivation of raw records in a data set. An event is defined by the presence of specific factors across multiple records of potentially different business relevance or context (that is, records of different types). In other words, multiple different types of data producers produce disparate records that can be correlated over universal dimensions such as time or location, or over other categories such as channels, and are included together in the definition of a new type of complex event. Complex events can be further processed in the same way as raw events.
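The three metadata types above can be sketched in a few lines of code. The example below is an illustration under stated assumptions, not a reference design: the class names (`AttributeMap`, `Derivation`), the payload attributes (`t_c`, `ts`, `dev`), the 30-degree threshold and the 60-second correlation window are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AttributeMap:
    """Map: payload attribute -> business-friendly term plus context."""
    payload_attribute: str
    business_term: str
    data_type: str
    lineage: str = "unknown"


@dataclass
class Derivation:
    """Derivation: a named function producing a new metric or dimension."""
    name: str
    inputs: list[str]
    func: Callable[..., object]

    def apply(self, record: dict) -> object:
        return self.func(*(record[i] for i in self.inputs))


maps = [
    AttributeMap("t_c", "temperature_celsius", "float", lineage="thermostat firmware"),
    AttributeMap("ts", "event_time", "int", lineage="device clock"),
    AttributeMap("dev", "device_type", "str"),
]


def apply_maps(raw: dict, maps: list[AttributeMap]) -> dict:
    """Rename payload attributes to business terms (a simple 1:1 map)."""
    lookup = {m.payload_attribute: m.business_term for m in maps}
    return {lookup.get(key, key): value for key, value in raw.items()}


# Derived dimension: bucket a temperature metric into labeled bins.
temperature_band = Derivation(
    name="temperature_band",
    inputs=["temperature_celsius"],
    func=lambda t: "hot" if t >= 30 else "normal",
)


def detect_complex_events(records: list[dict], window_seconds: int) -> list[dict]:
    """Correlate records of different types over time into one complex event."""
    hot = [r for r in records if r.get("temperature_band") == "hot"]
    doors = [r for r in records
             if r.get("device_type") == "door" and r.get("state") == "open"]
    events = []
    for h in hot:
        for d in doors:
            if abs(h["event_time"] - d["event_time"]) <= window_seconds:
                events.append({
                    "event_type": "hot_room_door_open",
                    "event_time": max(h["event_time"], d["event_time"]),
                    "sources": [h, d],
                })
    return events


raw_records = [
    {"dev": "thermostat", "ts": 100, "t_c": 31.5},
    {"dev": "door", "ts": 130, "state": "open"},
    {"dev": "thermostat", "ts": 900, "t_c": 22.0},
]

records = []
for raw in raw_records:
    rec = apply_maps(raw, maps)
    # Apply the derivation only when all of its inputs are present.
    if all(i in rec for i in temperature_band.inputs):
        rec[temperature_band.name] = temperature_band.apply(rec)
    records.append(rec)

events = detect_complex_events(records, window_seconds=60)
print(events[0]["event_type"], events[0]["event_time"])
```

Because the complex event is itself a plain record with an event type and a time, it can be fed back through the same maps and derivations as a raw event.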
Enterprises should focus on building a robust metadata fabric that the entire enterprise can consume and contribute to. There are several types of challenges that can make this hard to achieve.
The number one problem in building a metadata fabric is the ability to create and store it through a business-friendly interface. In addition to being able to create metadata, the system needs to enable the exploration of metadata, the use of metadata to enable data analysis, and the editing and updating of metadata to keep up with changing business context, logic and models. The metadata also needs to be accessible, verifiable and traceable to ensure that it is of the highest quality and that imperfections in logic or models do not percolate into all downstream analysis.
Maintaining a metadata fabric
The maintenance of the metadata fabric is mandated by three key factors:
The arriving data can change how, when, where and what it delivers, i.e. anything from the schema to the ranges of values of various attributes to the definitions of acceptable values can change over time. In addition, the data can start arriving in a fragmented or non-sequential manner, or can be encrypted. Because such change can be unexpected (when the data is arriving from a system that is managed by a vendor, third party or partner) and unavoidable, the metadata fabric needs to be able to adapt to this change.
As organizations change, grow and shrink, personnel changes are unavoidable. This means that the metadata layer needs to learn from what employees do with it, how they access the metadata and how they consume the data. This information itself needs to become part of the metadata layer, available for successive employees to consume and learn from.
In addition, personnel changes (and other changes) also lead to changing or differing perceptions of the data. This forces multiple, sometimes conflicting interpretations on the data, which by itself is not necessarily a bad thing. What is needed is the ability for the metadata layer to capture all perceptions and opinions and offer them to the data consumer (human or application) so that the consumer can determine the best use of the data.
As businesses change, as business models are adapted to the changing realities of markets, users or competition, and as new business models are created, the rules, logic and models change, and this requires the metadata layer to be flexible and resilient to such change. The metadata fabric should be able to adapt to this change while ensuring that consuming applications and producing systems are not impacted.
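One way to picture the first factor, arriving data that changes its schema or value ranges, is a small drift check that compares each incoming record with what the metadata fabric has registered. This is a minimal sketch under assumptions: the `registered_schema` layout, the attribute names and the ranges are hypothetical, and a real system would also need to decide how to react to each finding.

```python
# Registered expectations for one data set (hypothetical names and ranges).
registered_schema = {
    "temperature_celsius": {"type": float, "min": -40.0, "max": 85.0},
    "device_id": {"type": str},
}


def detect_drift(record: dict, schema: dict) -> list[str]:
    """Return human-readable drift findings for one incoming record."""
    findings = []
    for name, rule in schema.items():
        if name not in record:
            findings.append(f"missing attribute: {name}")
            continue
        value = record[name]
        if not isinstance(value, rule["type"]):
            findings.append(f"type change: {name} is {type(value).__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            findings.append(f"out of range: {name}={value}")
        if "max" in rule and value > rule["max"]:
            findings.append(f"out of range: {name}={value}")
    # Attributes the fabric has never seen are reported, not rejected.
    for name in record:
        if name not in schema:
            findings.append(f"new attribute: {name}")
    return findings


# A vendor update starts sending temperature as text and adds a field.
incoming = {"temperature_celsius": "21.5", "device_id": "th-7", "firmware": "2.1"}
findings = detect_drift(incoming, registered_schema)
for finding in findings:
    print(finding)
```

Reporting unknown attributes rather than failing on them is a deliberate choice in this sketch: it lets the fabric learn about change without breaking producers or consumers.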
Consuming a metadata fabric
A metadata fabric can and should be consumed by both systems and humans that are either designing the systems or consuming the information delivered from these systems. Typical challenges arise when the metadata fabric is not accessible at the right time, from the right place and in the right form. An inadequate interface can either misdeliver data or lead to its misinterpretation, which can cause flawed system development, flawed analytics and flawed decisions. Care needs to be taken to ensure that consumers have full visibility and insight into the metadata fabric at all points in the information creation and delivery workflow.
Three key signs of a broken metadata fabric are:
If your users and systems are not able to find, search through or use the metadata in the metadata fabric, it is probably broken. If users and systems are not able to update the metadata, it is probably broken.
Data exists in disconnected islands
If your data exists in disconnected islands and the metadata fabric is not being used to bridge the gaps during data discovery and curation or during data analysis and insight delivery, it is probably broken and does not offer enough value.
If your metadata is out of date and stale, either because users and systems cannot update it and maintain freshness or because users have no motivation to keep it up to date (which can often be a sign of broken analytics and a suboptimal decision-making system), it can be a sign of a broken metadata fabric.
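Staleness is the easiest of these signs to check mechanically. The sketch below flags metadata entries that have not been updated within a chosen window. The entry names, the dates and the 90-day threshold are assumptions made up for illustration; a suitable window depends on how quickly the business context changes.

```python
from datetime import datetime, timedelta


def stale_entries(entries: dict, now: datetime, max_age: timedelta) -> list[str]:
    """Return the names of entries last updated more than max_age ago."""
    return sorted(
        name for name, updated in entries.items() if now - updated > max_age
    )


# Hypothetical last-updated timestamps for three metadata entries.
entries = {
    "temperature_celsius": datetime(2024, 1, 10),
    "device_type": datetime(2023, 3, 1),
    "event_time": datetime(2023, 12, 20),
}

# A fixed "now" keeps the example reproducible.
now = datetime(2024, 1, 15)
stale = stale_entries(entries, now, max_age=timedelta(days=90))
print(stale)
```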
A well-designed, interactive, fresh and comprehensive metadata fabric can enable far greater results to be delivered from higher-value analytics and insights. Building a data analysis system that has a strong metadata fabric component is important, and ensuring that this metadata fabric is consumed and incorporated into analytics design and consumption is key to deriving the most value from your data. In IoT ecosystems, which have exponentially increased the variety and volume of data available for analysis, the absence of a metadata fabric in an organization can be prohibitively expensive. Enterprises need to ensure that building such a fabric is core to their IoT and big data plans.