Time to Bring a Machine First™ Approach for Enterprise Data Lakes on Cloud

BrandPost By Joydeep Samajder
Apr 17, 2020
Cloud Computing | IT Leadership

Image credit: metamorworks / iStock

It’s been 14 years since Amazon released its Elastic Compute Cloud and the phrase “cloud computing” entered everyday use, and the buzz around cloud is louder than ever. Everyone considers cloud to be of utmost importance for their modernization journey.

Confidence in cloud comes from experience and certainty: compelling business use cases, widespread participation of business stakeholders, and cloud adoption at scale.

With the explosion of data and the advent of data discovery and inference technologies (analytics, IoT, machine learning, and the like), focus has shifted toward unearthing value from massive amounts of information – both from within the enterprise and from external ecosystems and data sources – to generate a competitive edge in the market. Customers across industries are now looking beyond “digital” to business use cases that leverage AI to gain intelligent insights and incorporate them into day-to-day core business decision making.

Modern data estates are the need of the hour

Data-driven analytics and insights have now become pervasive across all aspects of business functions and growth. Customers are looking beyond traditional data warehouses and appliance-based systems to gear up for new-world needs: cognitive analytics and insights, data monetization, real-time processing, next-gen visualization, self-service, and guided analytics.

As a result, C-suite priorities have drastically transformed. Data has truly become the central character in the Business 4.0 economy. Enterprises across all industries and verticals realize that data is their biggest core asset and greatest source of differentiation.

Unless one has created the right foundation for data assets, it is impossible to derive their true value. This is why a modernized data estate, or enterprise data lake, is no longer an option but a requirement for any organization that wants to derive real value out of data.

But unless you build it rapidly, and build it right, you lose out on the competitive edge

If you take a deeper look at data, particularly in the context of modern business, there are multiple layers of complexity driving ever-increasing needs and demands for data. There are various stakeholders, a multitude of sources and data types, potential to derive exponential value, stringent regulatory and compliance requirements, and other considerations that become highly complex when taken together. Many customers have set forth a data lake initiative that ultimately turned out to be a data swamp or data graveyard.

In order to reduce the IT gap and gain competitive advantage, there are three things an enterprise should consider for a modern data estate:

  1. It should be built with a rapid time to market and return on value
  2. It should be built where data is holistic, accurate, and as near real-time as possible
  3. It should encourage democratization of data with Data-as-a-Service

None of the above is possible without a Machine First approach, in which automation takes the first role at every stage of the modernization journey. That is why pre-built, field-proven accelerators provide the best overall outcomes for these journeys. “TCS DEER”, under the umbrella of TCS’ Data and Analytics Estate Modernization suite, TCS DAEzMo™, can be a key enabler in the creation of a modernized data estate.

TCS DEER provides a Machine First™ approach

TCS DEER focuses on provisioning and processing data in the data lake. It is a high-performance data management tool with out-of-the-box capabilities for data ingestion, profiling, cleansing, transformation, and publishing. This accelerator has been implemented for many customers across the globe and has been granted two patents.

When preparing a data lake with TCS DEER, data is extracted from source systems and placed in storage tiers such as Amazon S3, with dedicated buckets created for each source. This layer represents the customer’s data ingestion layer, where extracted raw data is stored. The unprocessed data is then pushed into the data management system for storage based on functional needs, and accessed for preparation and processing into a data analysis layer – ideally a cloud-based enterprise data warehouse (EDW) such as Amazon Redshift. Data in the analysis layer is accessed and consumed for reporting and analytical purposes through a data connector layer, which provides a number of methods and interfaces for human and programmatic consumption of outputs.
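The layered flow above can be sketched in outline. The Python fragment below models only the handoffs between layers – the bucket names, key prefixes, table names, and IAM role are illustrative assumptions, not part of TCS DEER’s actual interface:

```python
# Illustrative model of the ingestion -> curated -> analysis layer flow.
# All names here are placeholders for the sake of the sketch.

def raw_zone_key(source_system: str, table: str, load_date: str) -> str:
    """S3 key for unprocessed data landing in the ingestion layer."""
    return f"raw/{source_system}/{table}/load_date={load_date}/data.csv"

def curated_zone_key(domain: str, table: str) -> str:
    """S3 key for cleansed/transformed data in the curated layer."""
    return f"curated/{domain}/{table}/data.parquet"

def redshift_copy_sql(table: str, bucket: str, key: str, iam_role: str) -> str:
    """COPY statement that loads curated data into the analysis layer (EDW)."""
    return (
        f"COPY {table} FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS PARQUET;"
    )

if __name__ == "__main__":
    key = curated_zone_key("sales", "orders")
    print(raw_zone_key("crm", "orders", "2020-04-17"))
    print(redshift_copy_sql("analytics.orders", "example-data-lake", key,
                            "arn:aws:iam::123456789012:role/RedshiftCopyRole"))
```

In a real deployment the uploads and the COPY would be executed against AWS (for example via boto3 and a Redshift connection) rather than printed; the sketch only shows how data moves from one layer’s addressing scheme to the next.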

Why TCS DEER?

In the world of big data and analytics, TCS DEER brings a Machine First™ approach from data ingestion to data archival covering data transformation/processing, semantic analytic layer, data migration, and data validation.

Below are some compelling points why one should choose TCS DEER for any modernization journey:

Rapid time to market: With a custom-built solution, the focus typically falls on developing a framework rather than addressing the problem statement. A typical implementation cycle runs 8 to 12 months and faces the usual challenge of obtaining resources with the right skill sets. TCS DEER can reduce this timeline by approximately 30%.

No need for specialized skill sets for custom framework development: Customers avoid the need to set up a dedicated team with specific skill sets to create a data ingestion/processing or preparation/transformation framework. TCS DEER provides a rich, user-friendly interface and does not require any specific skill set to operate.

Production readiness: Built-in connectors are available to extract data from multiple sources, including structured, semi-structured, and unstructured data, all at no additional cost to customers.

Automated schema adaptation: Automatic adaptation to source schema changes and synchronization with the target data lake reduces manual intervention and operational overhead.
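The idea behind schema adaptation can be illustrated with a small sketch: compare a source table’s columns against the target data lake table and generate the statements needed to bring the target in sync. This is a simplified model of the concept, not TCS DEER’s actual implementation:

```python
# Hedged sketch of automated schema adaptation: columns present in the
# source but missing from the target produce ALTER TABLE statements.

def schema_sync_statements(table, source_cols, target_cols):
    """Return ALTER TABLE statements for columns the target lacks."""
    stmts = []
    for name, dtype in source_cols.items():
        if name not in target_cols:
            stmts.append(f"ALTER TABLE {table} ADD COLUMN {name} {dtype};")
    return stmts

# Illustrative schemas: the source has gained a "channel" column.
source = {"id": "BIGINT", "amount": "DECIMAL(18,2)", "channel": "VARCHAR(32)"}
target = {"id": "BIGINT", "amount": "DECIMAL(18,2)"}

print(schema_sync_statements("lake.orders", source, target))
# → ['ALTER TABLE lake.orders ADD COLUMN channel VARCHAR(32);']
```

A production tool would also handle type changes, dropped columns, and incremental-load metadata, but the core loop – diff the schemas, emit synchronization DDL – is what removes the manual intervention.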

Job restorability: The ability to resume failed migration jobs from the point of failure reduces the execution time for migrating the data, while providing an efficient failure tracking and handling solution.
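Resume-from-failure behavior like this is commonly built on a checkpoint that records the last batch to complete. The sketch below assumes that design; it is a generic illustration of restartable migration, not TCS DEER’s internals:

```python
# Minimal sketch of a restartable batch migration. The checkpoint records
# the index of the last completed batch; on failure it survives, so a rerun
# resumes at the point of failure rather than the beginning. A real system
# would persist the checkpoint durably (e.g. in a metadata store).

def migrate(batches, copy_fn, checkpoint):
    """Copy each batch in order, advancing the checkpoint after each success."""
    start = checkpoint.get("done", 0)     # resume point: first unfinished batch
    for i in range(start, len(batches)):
        copy_fn(batches[i])               # may raise on a transient failure
        checkpoint["done"] = i + 1        # only completed batches are recorded
    return checkpoint["done"]

batches = ["b1", "b2", "b3", "b4"]
copied = []
ckpt = {}
failed_once = {"flag": False}

def copy_fn(batch):
    # Simulate one transient failure on batch b3.
    if batch == "b3" and not failed_once["flag"]:
        failed_once["flag"] = True
        raise RuntimeError("transient failure")
    copied.append(batch)

try:
    migrate(batches, copy_fn, ckpt)       # first run fails mid-way at b3
except RuntimeError:
    pass
migrate(batches, copy_fn, ckpt)           # rerun resumes at b3, not b1
print(copied)                             # → ['b1', 'b2', 'b3', 'b4']
```

Because the checkpoint advances only after a successful copy, the rerun never re-copies completed batches – each batch is migrated exactly once, which is what cuts the execution time on restart.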

Security: Enterprise-grade security is built-in with capabilities including LDAP, Kerberos, encryption, masking, Sentry/Ranger, etc.

TCS DEER on AWS

TCS DEER may be leveraged on the AWS platform. Here is a short list of observed use cases:

  • Archival platform on AWS leveraging S3 as storage
    • Legacy application decommissioning
    • Ongoing archival requirements
  • Provisioning data lake on AWS
    • Enable various layers of data lake – raw layer, curated layer
    • Data provisioning layer – enablement/creation of semantic layers
  • Migration of EDW to Redshift
    • Data migration support – bulk, incremental
    • Data processing support – data type conversions
  • Information lifecycle management (ILM) on cloud storage
    • Regulatory requirements – legal hold/retention policies at record level
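Record-level retention and legal hold boil down to a simple rule: a record may be purged only when no legal hold is active and its retention period has expired. The sketch below assumes an illustrative record shape – it is not a TCS DEER or AWS API:

```python
# Hedged sketch of record-level ILM enforcement. The record fields
# ("retain_until", "legal_hold") are illustrative assumptions.

from datetime import date

def can_purge(record, today):
    """Purge is allowed only if no legal hold is active and the
    retention period has expired."""
    if record.get("legal_hold"):
        return False
    return today >= record["retain_until"]

invoice = {"id": "INV-001", "retain_until": date(2027, 1, 1), "legal_hold": False}
held    = {"id": "INV-002", "retain_until": date(2019, 1, 1), "legal_hold": True}
expired = {"id": "INV-003", "retain_until": date(2019, 1, 1), "legal_hold": False}

today = date(2020, 4, 17)
print([r["id"] for r in (invoice, held, expired) if can_purge(r, today)])
# → ['INV-003']  (held back by retention, legal hold, and neither, respectively)
```

On AWS, S3 Object Lock offers comparable retention-period and legal-hold controls at the object level, which is one way such policies can be enforced on cloud storage.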

Customer stories

TCS DEER has been implemented for multiple customers across industry verticals and geographies. Here is a sample selection of successes delivered by TCS DEER on AWS:

A major UK media firm leveraged TCS DEER to modernize their appliance-based data warehouse to Snowflake on AWS under a cloud-first strategy. Their key goals were built-in validation of migrated data, automatic adaptation to schema changes during incremental migration, increased flexibility for servicing varied workloads, rapid time to market, and on-demand deployment. With TCS DEER, time to market was greatly reduced while handily beating the required performance benchmark by extracting and loading 1TB of data in less than 4 hours.

A North American technology provider leveraged TCS DEER on AWS for data extraction in line with decommissioning and regulatory compliance in a cost-effective manner, with additional requirements for handling unstructured data. This highly scalable architecture hosted on AWS reduced the overall infrastructure spend while processing 15 million unstructured files (faxes, emails) and managing archival of approximately 2.5 million invoices. Migrating data from 40 legacy applications with TCS DEER resulted in $10 million in savings for the customer, inclusive of the $2 million project cost.

A leading financial institution in the Nordics, which provides tailored investment solutions across a variety of asset classes to retail and institutional clients, partnered with TCS for the digital transformation of their wealth management area – with a focus on extracting data from legacy systems, transforming data in real time, publishing to downstream systems, and enabling visualization and analytics. TCS DEER on AWS was leveraged to speed up digitization of ingestion and transformation, coupled with the efficient data governance required by corporate banking and due consideration for security and compliance needs. The initial results showed quicker analysis of customer portfolio behavior and enabled improved wealth management strategies, leading to improved business results.

Conclusion

If you are looking to accelerate your journey to a modernized data estate or state-of-the-art data lake with holistic, accurate, and real-time data, embrace the Machine First™ approach with TCS DEER.