Data is growing in size, with IDC predicting 163 zettabytes of data by 2025, and also in complexity. The rise of open-source software, DevOps, and NoSQL databases has created a new generation of developers empowered to choose the right tool for the job across an ever more heterogeneous technology stack. Companies are not just migrating to the cloud, but are embracing multiple public and private clouds, carefully choosing where their applications run in order to best utilize those cloud services.
This landscape creates new challenges for data security. Successful security relies on a thorough understanding of our systems and a clear notion of potential attack vectors from adversaries both known and unknown. As Donald Rumsfeld, then the U.S. Secretary of Defense, put it during a press conference in the months leading up to the U.S. invasion of Iraq:
"As we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know."
Sensitive data is increasingly an "unknown unknown." New data is created continuously, and the structure and form of data constantly evolve, so it becomes difficult even to know where to look, let alone to fully understand and manage the risk that data creates throughout a company.
It's like looking at the tip of an iceberg: while we may be confident in our ability to protect what we can see from unknown assailants, there's a massive amount of invisible risk lurking beneath the surface. For example, even when you do manage to secure critical production data, that data is often copied and moved to other environments for development, testing, analytics, and more.
Traditionally, this has been addressed through centralization.
Using classic techniques, such as standardizing on a single data platform or forcing all changes through a single architectural and security review process, is increasingly difficult in today's world. As data spans more places and formats, it becomes fragmented, and so do the teams of people who consume and manage it.
This fragmentation is critical for enabling speed and agility in the business, but it creates significant data friction when it comes to identifying and understanding data risk within the business.
Identifying and tracking assets is a well-understood practice, but data friction demands a new approach, one that complements human systems by leveraging software to identify and qualify risk in data across the enterprise. The basic elements are straightforward:

Establish a platform that can connect to your critical systems.
Use data profiling techniques to identify sensitive information such as names, addresses, and social security numbers.
Rigorously mitigate the risk identified by these processes.

But even this approach has limits. It requires that you know how to connect to the relevant systems, and it requires someone who knows which types of information present the highest security risk.
This sounds like a job for machine learning and artificial intelligence, and some cloud service providers already offer it.
Amazon Macie is an AWS service that uses machine learning to automatically discover, classify, and protect sensitive data, such as PII and intellectual property, stored in Amazon S3. It also offers a dashboard for tracking how the data is accessed, used, and moved.
Google Cloud provides a similar API for applications to use, known as the Data Loss Prevention API. Not only does it provide the ability to search text for sensitive information, it can also process images and redact information automatically.
But these are just early examples of how data security practices must evolve.
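To make the data-profiling step listed earlier concrete, here is a minimal sketch in Python using simple pattern matching. The patterns and type names are purely illustrative assumptions, far cruder than the machine-learning classifiers services like Macie employ, but they show the basic shape of scanning content for sensitive-data types:

```python
import re

# Illustrative patterns for a few common sensitive-data types; real
# profilers combine many more patterns with checksums and context scoring.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def profile_text(text):
    """Return the set of sensitive-data types detected in a block of text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}

sample = "Contact jane.doe@example.com; SSN on file: 123-45-6789."
print(sorted(profile_text(sample)))  # ['email', 'ssn']
```

In practice this scan would run behind a connector platform (step one of the list), and each hit would feed a risk-mitigation workflow (step three) rather than a simple print.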
We must move beyond just unstructured data in a single cloud, toward structured and unstructured data across increasingly heterogeneous environments.
And as we'll explore later in this series, simply identifying data is not enough. We need to understand how systems connect to data workflows in the enterprise. Are they production sources? Non-production sources? How are copies of that data made? Where are they going, and where have they been?
DataOps practices help solve these problems by bringing together the people and technology responsible for creating and changing data sources, and by building an understanding of how that data flows throughout the enterprise. By leveraging software to complement process, you can free up human capital to invest in building the right collaboration and visibility into the data pipeline.
Techniques such as machine learning will shine a light on things that no human can detect. And as we'll see later in this series, addressing this from a systemic, data-centric point of view is the only way to adequately identify risk across a complex enterprise.
Read more about Delphix.