It’s not your data. It’s how you use it: unlock the power of data and build the foundations of a data-driven organisation

BrandPost By Cloudera
Jul 29, 2021
Big Data | Data Integration | Data Visualization


Data has always been fundamental to business, but as organisations continue to move to cloud-based environments, coupled with advances in technologies like streaming and real-time analytics, building a data-driven business is one of the keys to success.

There are many attributes a data-driven organisation possesses. Deloitte lists these as:

  • Creating and shaping a common data foundation.
  • Defining and using single data points for multiple purposes.
  • Building a semantic layer describing unified business and reporting definitions.
  • Unlocking the value of data with in-depth advanced analytics, focusing on providing drill-through business insights.
  • Providing a platform for fact-based and actionable management reporting, algorithmic forecasting and digital dashboarding.

Australian research and advisory firm Adapt identifies an organisation’s ability to execute a data-driven strategy as one of 12 core competencies, drawn from 30,000 conversations with IT and business leaders over three years.

IBM’s 2021 Global C-suite Study agrees, finding strong evidence that data-driven organisations outperform their peers financially, in innovation and in driving cultural change. They are also 91 percent more likely to be trusted by customers.

But there are many challenges to becoming a successful data-driven organisation. Organisations have to contend with legacy data and increasing volumes of data spread across multiple silos. They have to effectively ingest, store and manage the huge volumes of ‘new’ data generated in a hyper-connected environment, and they have to be able to apply data analytics to extract real value from this data in near real time, while ensuring it is kept secure and compliant with governance requirements.

To meet these demands, many IT teams find themselves acting as systems integrators, having to find ways to access and manipulate large volumes of data for multiple business functions and use cases. It is not enough to move some workloads to the cloud. Without a clear data strategy aligned to their business requirements, being truly data-driven will be a challenge.

This is the first post in a series of three on data-driven organisations. The second will focus on the growth in volume and type of data required to be stored and managed, and the ways in which value can be extracted from data. The third will examine the challenges of realising that value, the attributes of a successful data-driven organisation, and the benefits that can be gained.


According to an IDG MarketPulse survey, organisations’ data volumes are growing by 63 percent per month, on average, and at 100 percent or more per month in 10 percent of organisations. Today transactional data, which includes streaming data and data flows, is the largest contributor to these data volumes.
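To put the survey’s headline figure in perspective, a 63 percent month-on-month growth rate compounds dramatically over a year. A quick back-of-the-envelope calculation (illustrative only; the function name and starting volume are not from the survey) shows the scale of the storage challenge:

```python
# Illustrative only: compound the survey's average 63% month-on-month
# growth rate over a year to see the scale of the storage challenge.
MONTHLY_GROWTH = 0.63  # 63% per month (IDG MarketPulse average)

def annual_multiplier(monthly_rate: float, months: int = 12) -> float:
    """Return how many times larger a data estate becomes after `months`."""
    return (1 + monthly_rate) ** months

print(f"After one year: ~{annual_multiplier(MONTHLY_GROWTH):.0f}x the data")
# → After one year: ~352x the data
```

At that average rate, a 10 TB data estate would exceed 3.5 PB within twelve months, which is why the ability to scale storage and processing is so central to the challenges listed below.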

The survey found the mean number of data sources per organisation to be 400, and more than 20 percent of companies surveyed to be drawing from 1,000 or more data sources to feed business intelligence and analytics systems.

It also revealed that only 37 percent of organisational data is stored in cloud data warehouses, with 35 percent still in on-premises data warehouses. However, more than 99 percent of respondents said they would migrate data to the cloud over the next two years.

The Internet of Things (IoT) is a huge contributor to this growing volume. iotaComm estimates there are 35 billion IoT devices worldwide and that in 2025 all IoT devices combined will generate 79.4 zettabytes of data.


One of the biggest challenges presented by having massive volumes of disparate unstructured data is extracting useable information and insights. Data analytics, applied effectively, can provide extremely valuable guidance to identify trends and inform business decision making, but the data has to be accessible to these data analytics tools if they are to deliver actionable insights.

Also, there is an increasing need for near real-time analysis to support decision making using machine learning and artificial intelligence, which demands near real-time ingesting and processing of data.

These challenges can be summarised as follows:

  • Ensuring all relevant data needed for decision support is collected and made available for analysis.
  • Ensuring that all data feeding analysis is accurate and complete (a significant omission can seriously skew the results of any analysis).
  • Pressure to deliver results and insights from analysis that may be beyond the scope of what the available data can provide.
  • Reliance on human intervention to provide the data required for analysis.
  • Having systems able to scale to handle the volumes of data to be analysed.
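The near real-time processing described above typically means analysing events as they arrive rather than in overnight batches. A minimal sketch of windowed stream processing (the event shape, window size and function names here are illustrative assumptions, not Cloudera APIs):

```python
from collections import deque

# A minimal sketch of near real-time ingestion: events are processed as they
# arrive, and a rolling 60-second window feeds downstream analytics.
WINDOW_SECONDS = 60

def ingest(event_stream):
    """Maintain a sliding window of recent events and emit a running metric."""
    window = deque()
    for event in event_stream:
        window.append(event)
        # Evict events that have fallen out of the analysis window.
        cutoff = event["ts"] - WINDOW_SECONDS
        while window and window[0]["ts"] < cutoff:
            window.popleft()
        # A downstream model or dashboard would consume this in real time.
        yield {"ts": event["ts"], "events_per_window": len(window)}

# Usage: simulate a trickle of sensor readings arriving every 10 seconds.
events = [{"ts": t, "value": t * 0.1} for t in range(0, 120, 10)]
for metric in ingest(iter(events)):
    pass  # in production, push to an alerting or ML scoring service
```

The key design point is that insight is produced per event, with bounded memory, rather than waiting for a complete data set, which is what makes near real-time decision support feasible at high data volumes.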


The foundation that enables an organisation to display all these attributes has traditionally been an effective data warehouse. However, this concept has evolved in line with the increasing demands of mature and sophisticated data-driven organisations, and with the increased use and sophistication of cloud computing services.

451 Research says it has identified the emergence of a new product category in the analytics sector: the Enterprise Intelligence Platform, which “combines data integration, data storage and processing, and analytics functionality in a single offering designed to meet the needs of both data operators and data consumers.”

It argues that executing analytics effectively requires a three-step process that has traditionally demanded three distinct products (historically from three separate vendors):

  • Ingest and integrate data from enterprise applications, typically using extract, transform and load (ETL) tools.
  • Store and process the data, typically in a data warehouse, where the data is modelled and schema applied.
  • Analyse the data, using business intelligence, visualisation or data science tools.
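The three steps map naturally onto a simple pipeline. The sketch below uses Python’s built-in sqlite3 as a stand-in for a data warehouse; the source records, table and column names are illustrative, not from any particular product:

```python
import sqlite3

# Step 1 — ingest/integrate: extract records from a source application and
# transform them (a stand-in for an ETL tool).
raw_orders = [("ord-1", "100.50"), ("ord-2", "75.00"), ("ord-3", "220.25")]
transformed = [(oid, float(amount)) for oid, amount in raw_orders]

# Step 2 — store/process: load into a warehouse, where schema is applied.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (order_id TEXT, amount REAL)")
warehouse.executemany("INSERT INTO orders VALUES (?, ?)", transformed)

# Step 3 — analyse: a BI tool or data scientist queries the modelled data.
(total, avg) = warehouse.execute(
    "SELECT SUM(amount), AVG(amount) FROM orders"
).fetchone()
print(f"Total revenue: {total:.2f}, average order: {avg:.2f}")
# → Total revenue: 395.75, average order: 131.92
```

The point 451 Research makes is that these three stages have historically lived in three separate products; a unified platform collapses them into one governed environment.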

An example of a modern unified data management technology is the Cloudera Data Platform (CDP). It supports data-driven decision making by easily, quickly, and safely connecting the entire data lifecycle within a secure environment.

It addresses the challenges organisations increasingly face in managing and extracting maximum value from their data by ensuring sufficient real-time processing capacity for large data volumes, facilitating self-service analytics for more cross-functional collaboration and enabling organisations to scale up or scale down workloads accordingly.

CDP is the industry’s first enterprise data cloud. It enables organisations to manage, analyse and experiment with data across hybrid and multi-cloud environments for faster business insights. It applies real-time stream processing, data warehousing, data science and iterative machine learning across shared data to support the most complex business use cases. At the same time, it enables organisations to comply with data privacy and compliance requirements with a common security model spanning public, private and hybrid cloud.


Organisations across various industries have benefited from faster, data-driven business decisions since implementing CDP in their organisations. Here are some real-world examples of how CDP helps solve real data challenges.

Pharmaceutical research

Life science organisations gather and analyse data from multiple and diverse sources and apply machine learning in their search for new treatments. These sources can include data from labs and clinical trials, doctors’ notes, prescriptions, MRI scans and surgeries. Much of this is highly sensitive personal data and is subject to strict regulations covering privacy and security.

One pharmaceutical company deployed CDP in combination with its own artificial intelligence technology to increase the speed and quality of its drug discovery and vaccine pipeline, accelerating safe medicine delivery to the market. In one instance, the time required for analysis was reduced from 80 years to a few weeks. Furthermore, all research data was made more easily available to a wider group of researchers, giving scientists the ability to dig deeper into pharma analytics.


Insurance

A global insurance company used CDP to deliver machine learning, creating a consistent user experience for self-service analytics while scaling to any type of workload. Cloudera’s machine learning operations capabilities allowed the company to automate the deployment, monitoring and management of machine learning models into production in a scalable and governed way. All this runs in a secure environment with centralised data governance across on-premises and public cloud, safeguarding the personal data of over 10 million customers.

Besides being able to handle far bigger computing workloads, the company has cut costs and built an “AI factory” that can be used by all teams. New data scientists can be onboarded more easily and efficiently.

Oil and Gas

A multinational oil and gas corporation wanted to build a manufacturing data lake to hold refinery, historical and sensor data and gain a holistic view of its operations. This data lake was meant to support its log analytics application, used to ingest data from multiple environments and generate real-time alerts on events throughout the organisation. However, data was being generated at a rate greater than relational databases could handle, and the initial data lake was built for only one application. The company needed to reduce costs by moving some data into a less costly data lake for storage while avoiding vendor lock-in. It also needed a data flow pipeline to collect, process and distribute data across applications. In addition, the sensitivity of the customer data it handles meant the operational data set had to be kept secure.

By deploying CDP Public Cloud in a hybrid, multi-cloud environment, the company was able to ingest log data from 130,000 PCs located around the world and across platforms in real time, providing unified data downstream to a multitude of analytics applications. The company realised a 55 percent increase in search performance, a $2 million licence cost reduction over five years and a 30 percent reduction in infrastructure cost. A critical result of the project is a faster response time to detect cybersecurity threats, bringing it down from 70 minutes to seven minutes.

Find out more about the Cloudera Data Platform here.

Register for The Foundations of a Modern Data-Driven Organisation webinar with Cloudera Field CTO Daniel Hand.