by Kumar Srivastava

To be AI-first, move beyond managing data warehouses

Oct 24, 2017
Artificial Intelligence, Data Warehousing, IT Leadership

Enterprises need to define a path to data maturity, and that requires reshaping and reorganizing their data storage, description, maintenance and value-generation processes, procedures, tooling and functions.

Success with AI is heavily influenced by the data maturity of an organization, i.e., its ability to procure, clean, curate, store and analyze data to power value-generating applications. A data-mature enterprise knows what data it has, knows what the data means and can ensure that the data is accessible to whoever needs it.

Unfortunately, over the past few years the big data hype has encouraged enterprises to focus on updating their data infrastructure to leverage new big data technologies. With far more data now available, enterprises already stuck with massive data storage costs are being forced to choose between storing data that might eventually be useful and stabilizing, if not reducing, their storage costs.

Cost-driven data strategy

The problem with taking a cost-driven approach to data infrastructure modernization is that it misses the whole point of storing data in the first place. A cost-driven data strategy almost always leads an enterprise to keep its decade-old data management procedures, processes and tools as they are, merely running them on modern, distributed, often in-memory storage solutions. This strategy has two major drawbacks:

Missed value from modern data technologies

Sticking with procedures, processes and tools from a previous generation of technology and adapting them for the latest infrastructure and platforms almost always leads enterprises to miss the tangible benefits of a modern architecture. Keeping the old processes and tools caps the new value that can be extracted from the upgrade, and adapting older processes, organizational structures and tooling to the updated platforms can often cost far more in resources. On top of the upgrade costs, enterprises then pay higher operational and maintenance costs with no net increase in business value.

Status-quo-driven inefficiencies

Often, a cost-driven data strategy gets mired in maintaining the organizational status quo. In the previous generation of data infrastructure, specialists in ETL, data management, database administration, report building and similar functions were required to deliver value. Modern data technologies, especially NoSQL, in-memory approaches and managed data warehouses, remove the need for entire teams and functions, offering not only cost benefits but also faster onboarding and easier generation of value. Instead of capturing those benefits, enterprises can end up replicating functional roles that are no longer required, even on modern technologies.

Data strategy that prepares for the AI era

A key requirement for being AI-first is data maturity. Enterprises looking to become AI-first must take a long, hard look at their data infrastructure operations and team organization and determine whether continuing down the same path makes sense, even after upgrading to modern technologies. Ideally, enterprises should focus all investment and effort on generating user and business value through application modernization and development, and offload non-revenue-generating activities such as infrastructure and platform management and development to technology vendors proficient in delivering these capabilities as a service.

Data PaaS

Several vendor offerings, such as Amazon Redshift, Google BigQuery and Panoply, provide modern data infrastructure as a PaaS, making it extremely easy not only to upgrade but also to match, if not exceed, the performance of custom data infrastructure at a fraction of the maintenance and operational cost. Panoply, for instance, not only manages the infrastructure but also promises to eliminate data engineering work. It offers a smart data warehouse that uses machine learning to automatically optimize and transform data to fit user requirements. Similarly, Google BigQuery offers a managed service focused on reducing the operational footprint. Amazon Redshift has a similar value proposition, with a deep focus on migration from legacy data warehouses to both hybrid and cloud-only deployments.

At the same time, selecting such a partner requires an understanding of the application development and application maintenance costs on any PaaS provider. Ultimately, it is the ability of the enterprise to convert data into value iteratively and constantly that differentiates it from its competitors and makes it agile and innovative. Modern data infrastructure, whether custom built or consumed as a PaaS, should reduce the cost of continuous app development and improvement and deliver a high degree of efficiency in bringing new ideas to market. If a modern PaaS does not provide this value, it is merely shifting the cost, not reducing it.

Beyond storage to organization

Having the ability to store data cheaply is only the first step in achieving data maturity. A key reason for the low ROI from the big data efforts of the past few years has been the lack of information describing the stored data and what it is useful for. This gap can severely limit the value extracted from data. The ability to understand and describe the data is key to building data maturity. There are several techniques for building this capability, ranging from crowdsourcing metadata to using machine learning to generate the metadata that describes the data. Regardless of the chosen technique, traditional database management and data warehouse design functions need to switch their focus to better describing data and governing its usage, so that the best data is used for each purpose and the value extracted from it is maximized.
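To make the idea of generated metadata concrete, here is a minimal sketch in Python of a profiler that derives a simple descriptive record for each column of a dataset. It is an illustration only, not any vendor's method: the function names (infer_type, profile_rows), the metadata fields and the sample rows are all assumptions made for this example, and it uses plain rule-based profiling rather than machine learning.

```python
from typing import Any


def infer_type(values: list[Any]) -> str:
    """Return a coarse type label for a column's non-null values."""
    kinds = {type(v).__name__ for v in values}
    if not kinds:
        return "unknown"
    if kinds <= {"int", "float"}:
        return "number"
    if len(kinds) == 1:
        return kinds.pop()
    return "mixed"


def profile_rows(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Build a per-column metadata record from a list of row dictionaries.

    Assumes cell values are hashable (strings, numbers, booleans).
    """
    columns = sorted({key for row in rows for key in row})
    catalog: dict[str, dict[str, Any]] = {}
    for col in columns:
        # Treat both missing keys and explicit None as nulls.
        present = [row[col] for row in rows if row.get(col) is not None]
        catalog[col] = {
            "type": infer_type(present),
            "null_fraction": 1 - len(present) / len(rows),
            "distinct_values": len(set(present)),
            "description": "",  # left blank for a human curator to fill in
        }
    return catalog


# Hypothetical sample data, invented for this illustration.
sample = [
    {"customer_id": 1, "region": "EMEA", "spend": 120.5},
    {"customer_id": 2, "region": None, "spend": 80},
    {"customer_id": 3, "region": "APAC"},
]
catalog = profile_rows(sample)
for column, info in catalog.items():
    print(column, info)
```

The empty description field is where crowdsourced input would go: the machine-generated profile tells a curator what the column looks like, and the curator adds what it means.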

To be AI-first, target data maturity

Being AI-first means being able to add predictive intelligence to a wide range of enterprise applications and scenarios. Achieving this requires several core competencies to be developed and enhanced, but the gating factor is data maturity. Enterprises need to define a path to data maturity, and that requires reshaping and reorganizing their data storage, description, maintenance and value-generation processes, procedures, tooling and functions. The key to success lies in spending the majority of investment on utilizing the data and reducing the fraction spent on data storage, organization and description.