For decades, data scientists (née statisticians) have had sandboxes to explore data and find valuable insights. In what seemed like a happy compromise, analysts could quickly load, manipulate, and combine enterprise and industry data in search of new insights and predictions without worrying that they would compromise sensitive data or production workflows. While this accelerated the discovery of new insights, putting them into production was a nightmare. A bevy of custom code and data created in an ungoverned environment needed to be converted, quality-controlled, and optimized before deployment. It often took the better part of a year for a business to get value from an insight gleaned in a few weeks.
The specter of big data threatened to make the situation worse, in a big way. Now analysts were using data structures and programming languages foreign to IT, and the volume and complexity of external data sources were exploding. Without a new approach, insights found in a big data sandbox might never make it into production.
What has emerged is a new paradigm that brings data governance, a term that is anathema to most analysts, to big data. But instead of heavy-handed restrictions on data usage and documentation, big data governance is agile, collaborative, and efficient. It engages analysts, rather than separating them, in capturing their learnings to accelerate production readiness. Most important, it replaces the massive conversion of sandbox data with a “promotion” process that ensures analytics data is made production-ready on the big data platform.
Big data governance requires us to rethink governance from the ground up. Instead of physically separating sandbox and production data, big data governance logically controls access and usage as data matures from “raw” to “ready.” How can you tell if data is ready for production? Metadata.
Any big data platform supporting production usage must have metadata that tracks the lifecycle of data ingestion, validation, preparation, and use. The metadata needs to manage data access rights and capture both data profiling results and commentary from data developers and end users. Metadata stores the policies that define production readiness and is able to enforce them. Without metadata, a data lake becomes a data swamp.
But for this to be practical, metadata capture must be automated and relevant. A second tenet of big data governance contradicts current dogma: use schemas from the start to enrich metadata. Most business data is structured, whether it’s relational, log files, XML, or mainframe copybooks. That structure can be used to automatically assess the quality, completeness, and content of raw data. This not only gives analysts insight into the data, it establishes a metadata foundation to build on.
The third principle of big data governance is scorecard-driven prioritization. Not all data needs strict governance over quality and access. In fact, the assumption is that most raw data loaded won’t be used, so enriching its metadata is a waste of time. Instead, scorecards are created for the various uses of the data: compliance reporting, marketing analytics, supply chain analysis, and so on. Some policies apply to all scorecards (PII data must be masked), while others are very specific (data lineage is required for all compliance reports). With a metadata foundation, scorecards are easy to create for any data set. These scorecards are then used to identify and prioritize governance efforts so that the most important data is made production-ready.
Where to start? If you have a data lake with poor metadata, I recommend starting with an assessment of the quality and content of your existing assets.
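The scorecard idea above can be sketched in a few lines of Python. This is a minimal illustration only: the metadata fields, policy thresholds, and use-case names are all assumptions, not a real governance product’s model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class DatasetMetadata:
    """Illustrative metadata record tracking a dataset's maturity."""
    name: str
    stage: str                 # e.g. "raw" -> "validated" -> "prepared" -> "ready"
    pii_masked: bool           # has PII been masked?
    lineage_recorded: bool     # is data lineage captured?
    completeness: float        # fraction of non-null values, from profiling

# A policy inspects metadata and returns True if the dataset passes.
Policy = Callable[[DatasetMetadata], bool]

# Policies that apply to every scorecard (e.g. PII must always be masked).
COMMON_POLICIES: List[Policy] = [lambda md: md.pii_masked]

@dataclass
class Scorecard:
    """A per-use-case scorecard: common policies plus use-case-specific ones."""
    use_case: str
    policies: List[Policy] = field(default_factory=list)

    def production_ready(self, md: DatasetMetadata) -> bool:
        return all(p(md) for p in COMMON_POLICIES + self.policies)

# Hypothetical scorecards: compliance is strict, marketing is looser.
compliance = Scorecard("compliance reporting",
                       [lambda md: md.lineage_recorded,
                        lambda md: md.completeness >= 0.99])
marketing = Scorecard("marketing analytics",
                      [lambda md: md.completeness >= 0.90])

orders = DatasetMetadata("orders", stage="prepared", pii_masked=True,
                         lineage_recorded=False, completeness=0.95)

print(compliance.production_ready(orders))  # False: lineage missing, completeness < 0.99
print(marketing.production_ready(orders))   # True
```

The same dataset can be production-ready for one use and not another, which is the point of scorecard-driven prioritization: governance effort goes only where a scorecard demands it.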
Automated tools can populate a metadata repository as a foundation for creating scorecards. Making the content and quality of the lake transparent is the first step toward big data governance.
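Profiling raw data against a declared schema, as such tools do, can be sketched as follows. The schema, column names, and the two metrics (completeness and type conformance) are illustrative assumptions, not a specific tool’s output.

```python
# Hypothetical schema: column name -> expected Python type.
SCHEMA = {"customer_id": int, "email": str, "order_total": float}

def profile(records):
    """Profile raw records against SCHEMA: per-column completeness
    (fraction non-null) and type conformance (fraction of non-null
    values matching the expected type)."""
    stats = {col: {"non_null": 0, "type_ok": 0} for col in SCHEMA}
    for rec in records:
        for col, expected in SCHEMA.items():
            value = rec.get(col)
            if value is not None:
                stats[col]["non_null"] += 1
                if isinstance(value, expected):
                    stats[col]["type_ok"] += 1
    n = len(records)
    return {col: {"completeness": s["non_null"] / n,
                  "type_conformance": s["type_ok"] / max(s["non_null"], 1)}
            for col, s in stats.items()}

raw = [
    {"customer_id": 1, "email": "a@example.com", "order_total": 19.99},
    {"customer_id": 2, "email": None,            "order_total": "n/a"},
]
results = profile(raw)
print(results["email"]["completeness"])         # 0.5: one null email
print(results["order_total"]["type_conformance"])  # 0.5: "n/a" is not a float
```

Results like these, stored as metadata, make the content and quality of the lake transparent before anyone decides which datasets are worth promoting.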